Genetic screens in human cells using the CRISPR/Cas9 system
Abstract
The bacterial CRISPR/Cas9 system for genome editing has greatly expanded the toolbox for mammalian genetics, enabling the rapid generation of isogenic cell lines and mice with modified alleles. Here, we describe a pooled, loss-of-function genetic screening approach suitable for both positive and negative selection that uses a genome-scale lentiviral single guide RNA (sgRNA) library. sgRNA expression cassettes were stably integrated into the genome, which enabled a complex mutant pool to be tracked by massively parallel sequencing. We used a library containing 73,000 sgRNAs to generate knockout collections and performed screens in two human cell lines. A screen for resistance to the nucleotide analog 6-thioguanine identified all expected members of the DNA mismatch repair pathway, while another for the DNA topoisomerase II (TOP2A) poison etoposide identified TOP2A, as expected, and also cyclin-dependent kinase 6, CDK6. A negative selection screen for essential genes identified numerous gene sets corresponding to fundamental processes. Finally, we show that sgRNA efficiency is associated with specific sequence motifs, enabling the prediction of more effective sgRNAs. Collectively, these results establish Cas9/sgRNA screens as a powerful tool for systematic genetic analysis in mammalian cells.
A critical need in biology is the ability to efficiently identify the set of genes underlying a cellular process. In microorganisms, powerful methods allow systematic loss-of-function genetic screening (1, 2). In mammalian cells, however, current screening methods fall short – primarily because of the difficulty of inactivating both copies of a gene in a diploid mammalian cell. Insertional mutagenesis screens in cell lines that are near-haploid or carry Blm mutations, which cause frequent somatic crossing-over, have proven powerful but are not applicable to most cell lines and suffer from integration biases of the insertion vectors (3, 4). The primary solution has been to target mRNAs with RNA interference (RNAi) (5–9). However, this approach is also imperfect as it only partially suppresses target gene levels and can have off-target effects on other mRNAs – resulting in false negative and false positive results (10–12). Thus, there remains an unmet need for an efficient, large-scale, loss-of-function screening method in mammalian cells.
Recently, the Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) pathway, which functions as an adaptive immune system in bacteria (13), has been co-opted to engineer mammalian genomes in an efficient manner (14–16). In this two-component system, a single guide RNA (sgRNA) directs the Cas9 nuclease to cause double-stranded cleavage of matching target DNA sequences (17). In contrast to previous genome-editing techniques, such as zinc-finger nucleases and TALENs, the target specificity of CRISPR/Cas9 is dictated by a 20-base pair sequence at the 5′-end of the sgRNA, allowing for much greater ease of construction of knockout reagents. Mutant cell lines and mice bearing multiple modified alleles can be generated with this technology (18, 19).
We set out to explore the feasibility of using the CRISPR/Cas9 system to perform large-scale, loss-of-function screens in mammalian cells. The idea was to use a pool of sgRNA-expressing lentivirus to generate a library of knockout cells that could be screened under both positive and negative selection. Each sgRNA would serve as a distinct DNA barcode that can be used to count the number of cells carrying it using high-throughput sequencing (Fig. 1A). Pooled screening requires that single-copy sgRNA integrants are sufficient to induce efficient cleavage of both copies of a targeted locus. This contrasts with the high expression of sgRNAs achieved by transfection that is typically used to engineer a specific genomic change using the CRISPR/Cas9 system.
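The barcode-counting idea can be sketched in a few lines. This is a minimal illustration, not the pipeline used in the study: the function name, the exact-match rule, and the assumption that the 20-nt guide sits at a fixed offset in each read are all hypothetical.

```python
def count_sgrna_barcodes(reads, library, start=0, length=20):
    """Count reads whose guide window exactly matches a library sgRNA.

    reads   : iterable of read sequences (strings)
    library : dict mapping guide sequence -> sgRNA name (hypothetical layout)
    start   : assumed fixed offset of the guide within each read
    """
    counts = {name: 0 for name in library.values()}
    unmatched = 0
    for read in reads:
        barcode = read[start:start + length].upper()
        name = library.get(barcode)
        if name is None:
            unmatched += 1  # sequencing error, or a guide not in the library
        else:
            counts[name] += 1
    return counts, unmatched
```

Exact matching keeps the sketch simple; a real pipeline would likely need to handle variable offsets and sequencing errors.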
We first tested the concept in the near-haploid, human KBM7 CML cell line, by creating a clonal derivative expressing the Cas9 nuclease (with a FLAG-tag at its N-terminus) under a doxycycline-inducible promoter (Fig. 1B). Transduction of these cells at low multiplicity of infection (MOI) with a lentivirus expressing a sgRNA targeting the endogenous AAVS1 locus revealed substantial cleavage at the AAVS1 locus 48 hours after infection (Fig. 1C). Moreover, because the sgRNA was stably expressed, genomic cleavage continued to increase over the course of the experiment. Deep sequencing of the locus revealed that repair of Cas9-induced double-strand breaks resulted in small deletions (<20 bp) in the target sequence, with tiny insertions or substitutions (<3 bp) occurring at a lower frequency (Fig. 1D). The vast majority of the lesions, occurring in a protein-coding region, would be predicted to give rise to a nonfunctional protein product, indicating that CRISPR/Cas9 is an efficient means of generating loss-of-function alleles.
We also analyzed off-target activity of CRISPR/Cas9. Although the specificity of CRISPR/Cas9 has been extensively characterized in transfection-based settings (20–22), we wanted to examine its off-target behavior in our system, where Cas9 and a single guide RNA targeting AAVS1 (sgAAVS1) were stably expressed for two weeks. We compared the level of cleavage observed at the target locus (97%) to levels at 13 potential off-target cleavage sites in the genome (defined as sites differing by up to 3 bp from sgAAVS1). Minimal cleavage (<2.5%) was observed at all sites with one exception, which was the only site that had perfect complementarity in the ‘seed’ region (terminal 8 bp). On average, sgRNAs have ~2.2 such sites in the genome, almost always (as in this case) occurring in non-coding DNA and thus less likely to affect gene function (Note S1).
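The off-target criteria described here (up to 3 mismatches, and whether the 8-bp seed is perfectly matched) can be expressed as a small check. This sketch assumes that the "terminal 8 bp" are the last 8 positions of the guide as written 5′ to 3′; the function names are illustrative only.

```python
def mismatch_positions(guide, site):
    """Return 0-based positions at which guide and site differ."""
    if len(guide) != len(site):
        raise ValueError("guide and site must have the same length")
    return [i for i, (g, s) in enumerate(zip(guide.upper(), site.upper())) if g != s]


def classify_site(guide, site, max_mismatches=3, seed_length=8):
    """Summarize a genomic site relative to a guide sequence."""
    mismatches = mismatch_positions(guide, site)
    seed_start = len(guide) - seed_length  # assumption: seed = last 8 positions
    return {
        "mismatches": len(mismatches),
        # a candidate off-target site differs from the guide, but by few bases
        "candidate_off_target": 0 < len(mismatches) <= max_mismatches,
        # True when no mismatch falls inside the seed window
        "seed_intact": all(i < seed_start for i in mismatches),
    }
```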
To test the ability to simultaneously screen tens of thousands of sgRNAs, we designed a sgRNA library with 73,151 members, consisting of multiple sgRNAs targeting 7,114 genes and 100 non-targeting controls (Methods, Fig. 1E, Table S1). sgRNAs were designed against constitutive coding exons near the beginning of each gene and filtered for potential off-target effects based on sequence similarity to the rest of the human genome (Fig. 1, F and G). The library included 10 sgRNAs for each of 7,033 genes and all possible sgRNAs for each of the 84 genes encoding ribosomal proteins (Fig. 1H). To assess the effective representation of our microarray-synthesized library, we sequenced sgRNA barcodes from KBM7 cells 24 hours after infection with the entire lentiviral pool and were able to detect the overwhelming majority (>99%) of our sgRNAs, with high uniformity across constructs (only 6-fold increase in abundance between the 10th and 90th percentiles) (Fig. S2A).
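The two representation metrics quoted above (fraction of sgRNAs detected, and the fold difference between the 10th and 90th percentiles of abundance) could be computed along these lines. The function name and the choice of percentile interpolation are assumptions; the original analysis may have used a different convention.

```python
import statistics


def library_representation(counts):
    """Return (fraction of sgRNAs detected, 90th/10th percentile ratio).

    counts : dict mapping sgRNA name -> read count
    """
    values = sorted(counts.values())
    detected = sum(1 for v in values if v > 0) / len(values)
    # deciles[0] is the 10th percentile, deciles[-1] the 90th
    deciles = statistics.quantiles(values, n=10, method="inclusive")
    p10, p90 = deciles[0], deciles[-1]
    ratio = p90 / p10 if p10 > 0 else float("inf")
    return detected, ratio
```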
As an initial test of our approach, we screened the library for genes that function in DNA mismatch repair (MMR). In the presence of the nucleotide analog 6-thioguanine (6-TG), MMR-proficient cells are unable to repair 6-TG-induced lesions and arrest at the G2-M cell-cycle checkpoint, while MMR-defective cells do not recognize the lesions and continue to divide (23). We infected Cas9-KBM7 cells with the entire sgRNA library, cultured the cells in a concentration of 6-TG that is lethal to wild-type KBM7 cells, and sequenced the sgRNA barcodes in the final population. sgRNAs targeting the genes encoding the four components of the MMR pathway (MSH2, MSH6, MLH1 and PMS2) (24) were dramatically enriched in the 6-TG-treated cells. At least four independent sgRNAs for each gene showed very strong enrichment and barcodes corresponding to these genes made up >30% of all barcodes (Fig. 2, A and B). Strikingly, each of the twenty most abundant sgRNAs targeted one of these four genes. The fact that few of the other 73,000 sgRNAs scored highly in this assay suggests a low frequency of off-target effects.
We next addressed the challenge of loss-of-function screening in diploid cells, which require bi-allelic inactivation of a target gene. We therefore generated an inducible Cas9 derivative of the HL60 pseudo-diploid human leukemic cell line. In both HL60 and KBM7 cells, we screened for genes whose loss conferred resistance to etoposide, a chemotherapeutic agent that poisons DNA topoisomerase IIA (TOP2A). To identify hit genes, we calculated the difference in abundance between the treated and untreated populations for each sgRNA, calculated a score for each gene by using a Kolmogorov-Smirnov test to compare the sgRNAs targeting the gene against the non-targeting control sgRNAs, and corrected for multiple hypothesis testing (Fig. 2, C to E, Table S2). Identical genes were detected in both screens, with significance levels exceeding all other genes by more than 100-fold. As expected, loss of TOP2A itself conferred strong protection against etoposide (25). The screen also revealed a role for CDK6, a G1 cyclin-dependent kinase, in mediating etoposide-induced cytotoxicity. Notably, every one of the 20 sgRNAs in the library targeting TOP2A or CDK6 was strongly enriched (>90th percentile) in both screens, indicating that the effective coverage of our libraries is very high. We generated isogenic HL60 cell lines with individual sgRNAs against TOP2A and CDK6 and, consistent with the screen results, these lines were much more resistant to etoposide than parental or sgAAVS1-modified HL60 cells (Fig. 2, F and G). Thus, our Cas9/sgRNA system enables large-scale positive selection loss-of-function screens.
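The per-gene comparison against non-targeting controls can be illustrated with a plain two-sample Kolmogorov-Smirnov statistic. This sketch computes only the statistic D (the largest gap between the two empirical distribution functions); it omits the p-value and the multiple-testing correction, and the function names and input layout are assumptions rather than the code used in the study.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D."""
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    # Both empirical CDFs are step functions, so the largest gap
    # is always attained at one of the observed values.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def score_genes(diffs_by_gene, control_diffs):
    """Map each gene to D for its sgRNA abundance differences vs. controls.

    diffs_by_gene : dict gene -> list of per-sgRNA treated-minus-untreated values
    control_diffs : list of the same quantity for non-targeting sgRNAs
    """
    return {
        gene: ks_statistic(diffs, control_diffs)
        for gene, diffs in diffs_by_gene.items()
    }
```

D ranges from 0 (identical distributions) to 1 (no overlap), so a gene whose sgRNAs are all enriched beyond every control scores 1.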
To identify genes required for cellular proliferation, we screened for genes whose loss conferred a selective disadvantage on cells. Such a screen requires accurate identification of sgRNAs that are depleted from the final cell population. Importantly, a sgRNA will show depletion only if cleavage of the target gene occurs in the majority of cells carrying the construct.
As an initial test, we screened KBM7 cells with a small library containing sgRNAs targeting the BCR and ABL1 genes (Table S3). The survival of KBM7 cells depends on the fusion protein produced by the BCR-ABL translocation (26). As expected, depletion was seen only for sgRNAs targeting the exons of BCR and ABL1 that encode the fusion protein, but not for those targeting the other exons of BCR and ABL1 (Fig. 3A).
We then infected Cas9-HL60, Cas9-KBM7, and WT KBM7 cells with the entire 73,000-member sgRNA library and used deep sequencing of the sgRNA barcodes to monitor the change in abundance of each sgRNA between the initial seeding and a final population obtained after twelve cell doublings (Fig. S2, A and B).
We began by analyzing ribosomal protein genes, for which the library contained all possible sgRNAs. We observed strong Cas9-dependent depletion of sgRNAs targeting genes encoding ribosomal proteins, with good concordance between the sets of ribosomal protein genes essential for cell proliferation in the HL60 and KBM7 screens (the median sgRNA fold-change in abundance was used as a measure of gene essentiality) (Fig. 3, B and C). Interestingly, a few ribosomal protein genes were not found to be essential. These were two genes encoded on chromosome Y (RPS4Y2, which is testis-specific (27), and RPS4Y1, which is expressed at low levels compared to its homolog RPS4X on chromosome X (28)), and ‘ribosome-like’ proteins, which may be required only in select tissues (27) and are generally expressed at low levels in KBM7 cells (Fig. S3A).
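The essentiality measure mentioned above, the median sgRNA fold-change per gene, might be computed as follows. The normalization to total read depth, the log2 scale, and the pseudocount are assumptions added for the sketch; the text specifies only that the median fold-change was used.

```python
import math
import statistics
from collections import defaultdict


def gene_essentiality_scores(initial, final, sgrna_to_gene, pseudocount=1.0):
    """Median log2 fold-change in sgRNA abundance for each gene.

    initial, final : dict sgRNA name -> read count at seeding / at the end
    sgrna_to_gene  : dict sgRNA name -> gene symbol
    """
    total_initial = sum(initial.values())
    total_final = sum(final.values())
    per_gene = defaultdict(list)
    for sgrna, gene in sgrna_to_gene.items():
        # convert counts to frequencies so sequencing depth cancels out
        f0 = (initial.get(sgrna, 0) + pseudocount) / total_initial
        f1 = (final.get(sgrna, 0) + pseudocount) / total_final
        per_gene[gene].append(math.log2(f1 / f0))
    return {gene: statistics.median(lfcs) for gene, lfcs in per_gene.items()}
```

More negative scores indicate stronger depletion, that is, a gene more likely to be required for proliferation.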
We then turned our attention to other genes within our dataset, for which ten sgRNAs were designed. As for the ribosomal genes, the essentiality scores of these genes were also strongly correlated between the two cell lines (Fig. S3B, Table S4). For the twenty highest scoring genes, we found independent evidence for essentiality, based primarily on data from large-scale functional studies in model organisms (Table S5).
To evaluate the results at a global level, we tested 4722 gene sets to see if they showed strong signatures of essentiality, using Gene Set Enrichment Analysis (29). Gene sets related to fundamental biological processes – including DNA replication, gene transcription, and protein degradation – showed strong depletion, consistent with their essentiality (Fig. 3D, Table S6).
Finally, we sought to understand the features underlying sgRNA efficacy. Although the vast majority of sgRNAs against ribosomal protein genes showed depletion, detailed comparison of sgRNAs targeting the same gene revealed substantial variation in the precise amounts of depletion. These differences are unlikely to be caused by local accessibility to the Cas9/sgRNA complex inasmuch as comparable variability was observed even among sgRNAs targeting neighboring target sites of a given gene (Fig. S4A). Given that our library includes all possible sgRNAs against each of the 84 ribosomal genes, the data allowed us to search for factors that might explain the differential efficacy of sgRNAs. Because the majority of ribosomal protein genes are essential, we reasoned that the level of depletion of a given ribosomal protein-targeting sgRNA could serve as a proxy for its cleavage efficiency. Applying this approach, we found several trends related to sgRNA efficacy: (1) Single guide sequences with very high or low GC content were less effective against their targets. (2) sgRNAs targeting the last coding exon were less effective than those targeting earlier exons, consistent with the notion that disruption of the terminal exon would be expected to have less impact on gene function. (3) sgRNAs targeting the transcribed strand were less effective than those targeting the non-transcribed strand (Fig. 3E). Although these trends were statistically significant, they explained only a small proportion of differences in sgRNA efficacy (Table S7).
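The first trend, reduced efficacy at very high or very low GC content, suggests a simple filter on guide sequences. The sketch below is illustrative: the 20% and 80% cutoffs are placeholder values chosen for the example, not thresholds reported in the text.

```python
def gc_fraction(spacer):
    """Fraction of G and C bases in a guide (spacer) sequence."""
    spacer = spacer.upper()
    if not spacer:
        raise ValueError("spacer must not be empty")
    return (spacer.count("G") + spacer.count("C")) / len(spacer)


def has_extreme_gc(spacer, low=0.2, high=0.8):
    """Flag guides whose GC content falls outside [low, high].

    The default cutoffs are hypothetical and should be tuned on real data.
    """
    gc = gc_fraction(spacer)
    return gc < low or gc > high
```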
We hypothesized that differences in sgRNA efficacy might also result from sequence features governing interactions with Cas9. To test this, we developed a method to profile the sgRNAs directly bound to Cas9 in a highly parallel manner (Methods). By comparing the abundance of sgRNAs bound to Cas9 relative to the abundance of their corresponding genomic integrants, we found that the nucleotide composition near the 3′-end of the spacer sequence was the most important determinant of Cas9 loading (Fig. 3F). Specifically, Cas9 preferentially bound sgRNAs containing purines in the last 4 nucleotides of the spacer sequence whereas pyrimidines were disfavored. A similar pattern emerged when we examined depletion of ribosomal protein-targeting sgRNAs (r=0.81), suggesting that, in significant part, the cleavage efficiency of a sgRNA was determined by its affinity for Cas9 (Table S7).
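Two quantities from this analysis lend themselves to a short sketch: the relative abundance of a sgRNA in the Cas9-bound fraction versus its genomic integrants, and the number of purines in the last four spacer nucleotides. The log2 ratio of frequencies used here is one plausible way to express "relative abundance" and is an assumption, as are the function names.

```python
import math

PURINES = frozenset("AG")


def purines_in_tail(spacer, tail_length=4):
    """Count purines (A or G) in the last `tail_length` bases of the spacer."""
    return sum(1 for base in spacer.upper()[-tail_length:] if base in PURINES)


def loading_score(bound_count, integrant_count, bound_total, integrant_total):
    """log2 of Cas9-bound frequency over genomic-integrant frequency.

    Positive values mean the sgRNA is over-represented among Cas9-bound RNAs.
    """
    bound_freq = bound_count / bound_total
    integrant_freq = integrant_count / integrant_total
    return math.log2(bound_freq / integrant_freq)
```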
We then sought to build an algorithm to discriminate between strong and weak sgRNAs (Fig. 3G). We trained a support-vector-machine classifier based on the target sequences and depletion scores of ribosomal protein-targeting single guide RNAs. As an independent test, we used the classifier to predict the efficacy of sgRNAs targeting the 400 top scoring (i.e. essential) non-ribosomal genes. The top two-thirds of our predictions exhibited 3-fold higher efficacy than the remaining fraction, confirming the accuracy of the algorithm.
Using this algorithm, we designed a whole-genome sgRNA library consisting of sequences predicted to have higher efficacy (Table S8). As with the sgRNA pool used in our screens, this new collection was also filtered for potential off-target matches. This reference set of sgRNAs may be useful both for targeting single genes and for large-scale sgRNA screening.
Taken together, these results demonstrate the utility of CRISPR/Cas9 for conducting large-scale genetic screens in mammalian cells. Based on our initial experiments, this system appears to offer several powerful features that together provide significant advantages over current functional screening methods.
First, CRISPR/Cas9 inactivates genes at the DNA level, making it possible to study phenotypes that require a complete loss of gene function to be elicited. In addition, the system should also enable functional interrogation of non-transcribed elements, which are inaccessible by RNAi.
Second, a large proportion of sgRNAs successfully generate mutations at their target sites. While this parameter is difficult to directly assess in pooled screens, we can obtain an estimate by examining the ‘hit rate’ at known genes. Applying a z-score analysis of our positive selection screens, we find that over 75% (46/60) of sgRNAs score at a significance threshold that perfectly separates true and false positives on a gene level (Fig. S5, A to D). Together these results show that the effective coverage of our library is very high and that the rate of false negatives should be low even in a large-scale screen.
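The hit-rate estimate can be made concrete with a small z-score sketch. The text does not say which population the z-scores were computed against, so using a reference set of values (for example, all sgRNAs or the non-targeting controls) is an assumption here, and the function names are hypothetical.

```python
import statistics


def z_scores(values, reference):
    """Standardize values against the mean and sample SD of a reference set."""
    mu = statistics.mean(reference)
    sigma = statistics.stdev(reference)
    return [(v - mu) / sigma for v in values]


def hit_rate(scores, threshold):
    """Fraction of scores at or above the significance threshold."""
    return sum(1 for z in scores if z >= threshold) / len(scores)
```

For the figures quoted above, 46 of 60 sgRNAs corresponds to a hit rate of about 0.77.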
Third, off-target effects do not appear to seriously hamper our screens, based on several lines of evidence. Direct sequencing of potential off-target loci detected minimal cleavage at secondary sites, which typically reside in non-coding regions and do not impact gene function. Moreover, in the 6-TG screens, the twenty most abundant sgRNAs all targeted one of the four members of the MMR pathway. In total, they represented over 30% of the final pool, a fraction greater than that of the next 400 sgRNAs combined. In the etoposide screen, the two top genes scored far above background levels (p-values 100-fold smaller than the next best gene), enabling clear discrimination between true and false positive hits. Lastly, new versions of the CRISPR/Cas9 system have recently been developed that substantially decrease off-target activity (30, 31).
Although we limited our investigation to proliferation-based phenotypes, our approach can be applied to a much wider range of biological phenomena. With appropriate sgRNA libraries, the method should enable genetic analyses of mammalian cells to be conducted with a degree of rigor and completeness currently possible only in the study of microorganisms.
Acknowledgments
We thank all members of the Sabatini and Lander labs, especially J. Engreitz, S. Schwartz, A. Shishkin, and Z. Tsun for protocols, reagents and advice; T. Mikkelsen for assistance with oligonucleotide synthesis; and L. Gaffney for assistance with figures. This work was supported by the US National Institutes of Health (CA103866) (D.M.S.), National Human Genome Research Institute (2U54HG003067-10) (E.S.L.), the Broad Institute of MIT and Harvard (E.S.L.) and an award from the US National Science Foundation (T.W.). The composition of the sgRNA pools and screening data can be found in the Supporting Online Materials. T.W., D.M.S., and E.S.L. are inventors on a patent application from the Broad Institute for functional genomics using CRISPR-Cas systems. Inducible Cas9 and sgRNA backbone lentiviral vectors and the genome-scale sgRNA plasmid pool are deposited in Addgene.
Read article at publisher's site: https://doi.org/10.1126/science.1246981