Abstract
Free full text
Functional Annotation of Chemical Libraries across Diverse Biological Processes
Abstract
Chemical-genetic approaches offer the potential for unbiased functional annotation of chemical libraries. Mutations can alter the response of cells to a compound, revealing chemical-genetic interactions that can elucidate a compound’s mode of action. We developed a highly parallel and unbiased yeast chemical-genetic screening system involving three key components. First, in a drug-sensitive genetic background, we constructed an optimized, diagnostic mutant collection that is predictive for all major yeast biological processes. Second, we implemented a multiplexed (768-plex) barcode sequencing protocol, enabling assembly of thousands of chemical-genetic profiles. Finally, based on comparison of the chemical-genetic profiles with a compendium of genome-wide genetic interaction profiles, we predicted compound functionality. Applying this high-throughput approach, we screened 7 different compound libraries and annotated their functional diversity. We further validated biological process predictions, prioritized a diverse set of compounds, and identified compounds that appear to have dual modes of action.
Graphical abstract
INTRODUCTION
Discovery and development of novel compound libraries has outpaced the functional characterization of these compounds, leading to a growing knowledge gap1. Chemical probes that target specific cellular functions are valuable entities because they can provide insight into fundamental cellular functions and represent putative leads for new drug development. Despite a massive wealth of whole-genome sequence data that has identified hundreds of potential new druggable targets, in both humans and pathogens, we lack the chemical probes to take advantage of these insights2. Therefore, a major demand exists for large-scale functional annotation of bioactive compounds.
Whole-cell screening approaches are advantageous because they identify bioavailable molecules and provide readouts based on general bioactivity3, a particular phenotypic response4, or a specific reporter system5 while maintaining biological context. Chemical-genetics expands traditional whole cell screening because it has the potential to monitor all cellular pathways in an unbiased manner6. A typical chemical-genetic screen involves testing a collection of mutant strains with defined genetic perturbations for fitness defects or advantages when grown in the presence of a specific compound6–8. Quantifying the relative fitness of a collection of mutant strains in response to compound treatment generates a chemical-genetic interaction profile, which provides diagnostic functional information about a compound’s general mode-of-action7,9.
Saccharomyces cerevisiae represents a powerful eukaryotic model system for chemical-genetic analysis, due to its facile genetics and availability of functional genomic reagents and tools. For example, genome-wide gene deletion analysis10 identified ~1000 essential genes and enabled the generation of a set of ~5000 viable haploid deletion mutants. The essential genes can be exploited for chemical genetic studies as heterozygous diploid mutants, whereas the nonessential genes can be studied as viable haploid deletion mutants, such that each mutant is examined for hypersensitivity or resistance to a compound7,8. Each strain is uniquely barcoded allowing the responses of hundreds of pooled mutants to be measured simultaneously to generate a chemical-genetic interaction profile6,7.
A comprehensive genetic interaction network, in which the majority of all possible double mutants are scored for genetic interactions quantitatively, has been mapped for yeast11. A genetic interaction occurs when mutations in two or more genes combine to generate an unexpected phenotype. Given the single mutant phenotypes, a negative genetic interaction occurs when two mutations combine to produce a double mutant fitness defect that is more severe than expected, whereas a positive genetic interaction reflects a double mutant fitness defect that is less severe than expected. The set of negative and positive genetic interactions for a particular query gene represents a genetic interaction profile, which provides a quantitative description of gene function. A global network of genetic interaction profile similarities groups genes with similar roles into dense gene clusters that represent major biological processes and thus highlights the functional organization of a cell11,12. Importantly, a global compendium of genetic interaction profiles can be used to functionally interpret chemical-genetic interaction profiles9,12. If a bioactive compound inhibits a specific target protein, then loss-of-function mutations in the corresponding target gene should mimic the bioactivity of the compound9,12. Moreover, the genetic interaction profile of the target gene should resemble the chemical-genetic interaction profile of the inhibitory compound that modulates the target pathway9,12. For example, the genetic interaction profile associated with a partial loss-of-function mutation in ERG11, which encodes the target of fluconazole, closely resembles the chemical-genetic interaction profile of fluconazole9. Thus, the global genetic interaction network provides a general key for interpreting the target pathways of bioactive compounds, enabling compounds to be annotated to specific biological processes and possibly specific pathways.
We developed a high-throughput chemical-genetic screening platform to functionally annotate large compound collections in a rapid and systematic manner. To do so, we constructed a diagnostic set of viable yeast gene deletion mutants, each carrying a unique DNA barcode identifier, which span all major biological processes, within a drug-sensitized, genetic background. We also developed a highly multiplexed (768-plex) barcode sequencing protocol, allowing us to generate rich chemical-genetic profiles for hundreds of compounds simultaneously. Finally, we assembled a computational platform for functionally annotating compounds to specific biological processes and pathways. Ultimately, we applied this chemical-genetic pipeline to annotate seven diverse libraries containing 13,524 compounds in an unbiased and systematic manner.
RESULTS
To design a pipeline for high-throughput chemical-genetic profiling and functional annotation of chemical libraries (Fig. 1a), we first selected an optimal set of diagnostic genes and constructed a mutant strain collection in which each diagnostic gene was individually deleted in a drug-hypersensitive genetic background. Second, we developed a highly multiplexed barcode sequencing13 system for chemical-genetic profiling with optimized signal detection. Third, we implemented computational approaches to integrate chemical-genetic profiles with the global yeast genetic interaction network to predict biological processes targeted by specific compounds. Finally, we assembled a database of chemical structures, chemical-genetic profiles, and functional predictions for each library investigated in this study.
Developing a diagnostic gene set for chemical-genetic profiling
To increase the potential for detecting bioactive compounds, we constructed a drug-sensitized yeast genetic background by combining deletions of PDR1 and PDR3, both of which encode transcription factors known to regulate the yeast pleiotropic drug response14, with a deletion of SNQ2, which encodes a multidrug transporter (Supplementary Results, Supplementary Fig. 1). We tested growth of the resultant pdr1 pdr3 snq2 (3) drug-sensitized strain in the presence of 440 different control compounds (see Methods, Supplementary Dataset 1) and observed a ~5-fold increase in the number of compounds that inhibited growth of the drug-sensitized strain compared to wild-type cells via a halo assay, indicating that these deletion mutations sensitized yeast to diverse classes of compounds (Fig. 1b)15. When considering the complete set of 13,524 compounds tested in this study, the average “hit rate”, corresponding to the fraction of bioactive compounds within a collection that causes at least 20% growth inhibition in the drug-sensitized strain in liquid medium, was ~35% across all compounds tested, which is ~5X greater than the hit rate found using the equivalent wild-type strain background in previous studies8 (Supplementary Dataset 1). Specific chemical-genetic interactions were also detected more readily in the drug-sensitized background. For example, at a concentration of 34.4 µM, the microtubule-binding compound benomyl showed a specific chemical-genetic interaction with TUB3, which encodes α-tubulin, only in our drug-sensitized background (Fig. 1c). Similarly, we analyzed the response to a cell wall glucan synthase inhibitor, micafungin, at 25 nM, and we detected a specific chemical-genetic interaction with BCK1, which encodes a component of the PKC cell wall integrity-signaling pathway (Fig. 1d). In both cases, only the known sensitive mutant showed an exaggerated chemical-genetic interaction, suggesting that, like wild-type cells, the drug-sensitive background identifies functionally relevant signals (Fig. 1c-d).
Because genes within the same pathway and the same biological process tend to share similar genetic interaction profiles9,12, only a subset of genes are required to capture functionally informative genetic interaction signatures for a given gene. Leveraging this property, we developed a computational approach for optimal selection of mutants for chemical-genetic screens, identifying a set of 157 functionally diagnostic strains (Fig. 1e) (see Methods). Independently, we also manually selected 236 strains mutated for genes that span major yeast biological processes that belong to highly-connected clusters in the global genetic interaction profile similarity network12, 83 of which overlapped with the computationally selected set. Thus, the final diagnostic pool consisted of 310 deletion mutant strains (~6% of all nonessential genes) that spans a similar functional space as the entire non-essential deletion mutant collection (Supplementary Fig. 2, Supplementary Dataset 2). While members of our diagnostic subset are not distributed proportionally across the 17 major bioprocesses, these were selected not only for bioprocess representation, but also their predictive power (see Methods). Even though we are using a subset of strains, this diagnostic collection has been optimized for gene similarity-based target prediction across the entire set of genetic interaction query strains (Fig. 1e).
Furthermore, we compared the individual fitness of each sensitized deletion strain to the original deletion collection (Supplementary Dataset 2), and used this fitness score to select pool members with near-equivalent fitness. We observed that ~20% of the mutants in diagnostic pool version 2.0 could not be scored by our standard SGA scoring method because of irregularities in colony shape in the pdr1 pdr3 snq2 genetic background (Supplementary Dataset 2), and we verified that these mutants had appropriate fitness values based on barcode representation after pooled liquid growth. Reducing the complexity of the ~5000 viable yeast deletion mutant collection to a smaller diagnostic set allowed us to maximize the dynamic range for detecting chemical-genetic interactions in a micro-culture and increased the degree of multiplexing for our barcode sequencing read-out.
Optimizing signal detection and high-throughput screening
Detecting drug-gene interactions requires a clear separation of sensitive/resistant mutants relative to the unaffected mutants in the pooled assay. To optimize signal detection, we tested the effects of three factors on detection of drug-gene interactions using the well-characterized compounds benomyl and micafungin. These included inoculum size, incubation time, and the number of PCR cycles used for barcode DNA amplification (see Methods). Incubation time had the most pronounced effect on the signal to noise ratio of the chemical-genetic profiles, with the optimal outcome observed after 48 h incubation (Fig. 1f, Supplementary Fig. 3a). For example, gene deletion mutants defective in microtubule functions, including CIN1, CIN4, GIM3, and TUB3, were depleted efficiently from the culture after 48 h growth in the presence of benomyl. The assay was relatively robust to inoculum density and number of PCR amplification cycles (Supplementary Fig. 3a). Ultimately, the screening conditions we selected included 200 µL micro-cultures, 48 h growth, at an inoculum of 250 cells/strain and 30 PCR cycles for barcode amplification. These parameters resulted in high correlation between biological replicates (Supplementary Fig. 3b).
Multiplexing of chemical-genetic samples is critical for screening large chemical libraries composed of thousands of compounds. Employing a custom-designed set of 768 multiplex primers, each containing a unique 10bp multiplex tag (Supplementary Dataset 3, see Methods), we found that combining the barcode DNA samples from 768 different chemical-genetic experiments produced profiles of similar quality to profiles for the same set of compounds generated at 96-plex (Fig. 1g). Thus, we adopted a screening strategy of 768 samples per Illumina HiSeq sequencing lane, or 6144 samples per flow cell. Under these conditions, biological replicates (independently grown cultures of the same strain pool) from different sequencing lanes exhibited highly reproducible chemical-genetic profiles (Fig. 1h). In pilot experiments, we sequenced barcodes using two separate reads, one for the multiplex tag and another for the deletion barcode; this methodology was thought to improve sequencing accuracy because it reduces the read length16. However, we achieved a more uniform distribution of sequence counts across conditions and barcoded mutants by using a single sequencing reaction designed to read through the entire PCR amplicon (Supplementary Fig. 3c).
Chemical-genetic profiling of diverse compound libraries
Applying our optimized pipeline, we generated chemical-genetic interaction profiles for 13,524 compounds by screening seven diverse compound collections: the RIKEN Natural Product Depository (NPDepo), which is composed largely of purified natural products or natural product derivatives, four collections from the National Cancer Institute’s Open Chemical Repository (natural products: NCI-NP, approved oncology drugs: NCI-ONC, structural and mechanistic diversity sets: NCI-STRUCT-DIV and NCI-MECH-DIV, respectively), a library of compounds from the National Institutes of Health Small Molecule Repository with a history of use in human clinical trials (NIH Clinical Collection or NIHCC), and the Glaxo-Smith-Kline kinase inhibitor collection (GSK-KI). A complete description of these collections, all compounds screened, their structures, basic physical properties, and chemical-genetic data is provided (Supplementary Datasets 4, 5, 6).
Chemical-genetic interactions were identified and scored by comparing the individual mutant barcode read counts to those from a set of solvent control conditions. A negative chemical genetic (CG) interaction score represents hypersensitivity to a compound whereas a positive CG score represents resistance (see Methods). At a relatively strict CG score threshold of ± –2.5 (z-score for enrichment/depletion in the presence of the compound relative to DMSO control), we observed positive chemical-genetic interactions between 0.5% of all compound-deletion mutant pairs, and negative chemical-genetic interactions between 1.1% of all compound-deletion mutant pairs. The set of highly bioactive compounds, which inhibited growth of the pooled collection by more than 20% (~4700 compounds), exhibited a substantially higher frequency of chemical-genetic interactions, with positive and negative interactions occurring between 1.3% and 2.3% of compound-mutant pairs respectively. Each deletion mutant displayed, on average, ~46 positive interactions and ~79 negative interactions across the entire collection of screened compounds. The number of chemical-genetic interactions for each strain (CG score ≥ 2.5 or ≤ –2.5) across all screened compounds is presented in Supplementary Dataset 7. Importantly, compounds screened both in our study, using the diagnostic set and in previous studies using the entire nonessential deletion mutant collection showed positive correlations, despite differences in strain backgrounds and methods used to measure mutant strain abundance (microarray vs. sequencing) (Supplementary Table 1).8,17
Hierarchical clustering analysis7,9 provides a visual representation of the diversity of the resultant chemical-genetic profiles. We focused on the most responsive subset of 173 gene deletion mutants, whose chemical-genetic profiles consisted of at least three extreme negative interactions (CG score ≤ –5), and 1380 compounds, which were derived from all seven collections (Fig. 2, See methods). The clustered matrix highlighted chemical-genetic interactions involving sets of functionally related genes participating in different biological processes, including DNA replication & repair (i), mitosis and chromosome segregation (ii), glycosylation, protein folding/targeting, cell wall biogenesis (iii), transcription and chromatin organization (iv), vesicle traffic (v), cell polarity and morphogenesis (vi), and other biological functions (Fig. 2). For example, a cluster of compounds, including benomyl and the tubulin-binding compound nocodazole, showed specific chemical-genetic interactions with TUB3 and CIN1, suggesting these compounds may target microtubule function or, more generally, target pathways with roles in mitosis and chromosome segregation. Indeed, this cluster includes a previously uncharacterized compound from the RIKEN NPDepo collection, NPD2784, which we found strongly inhibits polymerization of mammalian tubulin in vitro (Supplementary Fig. 4).
Integrating genetic and chemical-genetic profiles
The chemical-genetic interaction profile of a compound that targets a specific biological process should overlap the genetic interaction profiles of genes that function as part of that biological process9,12. To identify biological processes targeted by compounds, we compared the chemical-genetic profile of each compound to our comprehensive set of genetic interaction profiles (Supplementary Fig. 5), allowing us to score each compound-gene pair for profile similarity (see Methods). This analysis generated a set of gene-level similarity scores identifying a set of potential target genes for each compound. Although prediction of the precise gene target requires deeper experimental analysis, our approach readily predicted the biological process targeted by a particular compound based on Gene Ontology (GO) annotations shared among the target gene set (see Methods). To focus on high-confidence predictions, we estimated false discovery rates (FDR) for biological process-level predictions based on both resampled and DMSO control profiles and applied specific FDR thresholds (RIKEN NPDepo screen: FDR ≤ 25%; NCI/NIH/GSK screen: FDR ≤ 27%, see Methods). This analysis yielded 1522 high-confidence compound profiles that we refer to as our high confidence set (HCS) (Supplementary Dataset 8). We found that strains with many chemical-genetic interactions were important for bioprocess-level predictions (Supplementary Dataset 9). Interestingly, and in accordance with recent findings regarding the differences in functional information encoded by negative vs. positive genetic interactions11, we found that negative chemical-genetic interactions were the primary driver of genetic interaction-based target predictions, and without them, the quality of the predictions was reduced substantially (See Methods, Supplementary Dataset 10).
In general, we found that compound bioactivity was correlated with our ability to make high-confidence predictions, as ~82% of compounds in our high confidence set inhibited growth >20% (Supplementary Fig. 6). However, the remaining ~18% of HCS compounds were associated with a more modest bioactivity (<20% growth inhibition), suggesting that even weakly bioactive compounds can yield functionally informative chemical-genetic profiles and that pre-screening for bioactivity may exclude some predictive profiles. A set of 296 compounds displayed extremely high bioactivity, with >90% growth inhibition, and nearly 60% (122) of these compounds were excluded from the final dataset because their interaction profiles did not meet thresholds for strain representation. Interestingly, chemical-genetic profiles for these highly bioactive compounds showed that mutants defective for two genes involved in amino acid transport, GTR1 and AVT5, were highly resistant and accounted for a majority of the read counts from these compound conditions. This suggests that these genes may play general roles in small molecule transport and that their deletions may confer general resistance to highly bioactive compounds (Supplementary Fig. 7), and we confirmed this finding for gtr1 cells in an independent experiment involving 23 different compounds (Supplementary Dataset 11).
Defining the functional landscape of compound collections
To view the functional diversity of entire compound collections, each HCS compound was mapped onto the global genetic interaction profile similarity network at the location of the gene with the most similar genetic interaction profile to the compound’s top predicted biological process target12,18. The global network of genetic interaction profile similarities consists of 17 densely connected gene clusters, each representing a distinct biological process11 (Fig. 3a). The integration of the set of chemical-genetic profiles from a particular compound collection into the global genetic interaction profile similarity network allowed visualization of functional space covered by the compound collection (Fig. 3b) and enabled quantification of the diversity of targeted biological processes (Fig. 4a).
Every major functional cluster in the genetic network appeared to be targeted by at least one compound screened in this study (Fig. 3b). However, glycosylation, mitosis, cell polarity, and vesicle traffic related functions were the most frequently targeted, suggesting that these bioprocesses are more susceptible to chemical perturbation in yeast (Fig. 4a). When corrected for the number of compounds, the RIKEN NPDepo collection was the most functionally diverse collection, whereas the NCI natural products collection (Supplementary Fig. 8) was the least diverse. The NPDepo library can be partitioned chemically and mechanistically into different subset collections, including natural products (NP), natural product derivatives (NPD), and anti-cancer compounds (a manually curated list of RIKEN compounds with known anticancer activity, Supplementary Dataset 4), all of which showed distinct functional signatures in terms of their targeted bioprocess predictions. Each compound collection targets a unique set of biological processes (Fig. 3b and and4b),4b), suggesting that this global view of collection functionality can aid prioritization of screening efforts based on specific bioprocess targets of interest.
For the larger collections, we observed compounds targeting all 17 biological processes represented in the global genetic interaction similarity map. For example, the RIKEN NPDepo library was large and diverse enough to target all the major biological processes (Fig. 4a-b). Interestingly, the rate at which compounds targeted different biological processes differed from the distribution of genes across bioprocesses, suggesting a biased chemical target space (Fig. 4b i-iv). While each chemical library displayed a unique set of predicted bioprocess targets, common signatures emerged across several of the collections. For example, we observed a ~4-fold enrichment of compounds targeting glycosylation & protein folding related processes for most compound collections, including the NPDepo and NCI mechanistic diversity collections, the latter of which was designed to be relatively unbiased in terms of structure and functional annotations (Fig. 4b). Conversely, we saw a common depletion for compounds targeting DNA replication & repair and chromatin/transcription related processes, suggesting these processes are perturbed by compounds less frequently than expected, which could be an important consideration if, for example, targeting these biological processes for cancer therapeutics is a major goal.
While enrichment for cytosolic and cell surface targets and depletion for nuclear targets appeared as a general trend across several compound collections, exceptions were observed within specific libraries. In particular, for the NCI oncology collection, which is made up of anti-cancer agents largely directed towards the inhibition of cell division cycle functions and DNA replication/repair, we observed a strong enrichment for compounds targeting DNA replication & repair and transcription & chromatin organization relative to the expected background (Fig. 4b, p<0.001). The NCI oncology collection, along with the anti-cancer subset of the RIKEN collection, differed the most from the general trend observed for larger, less biased collections, reflecting the fact that these compounds have been selected for very specific purposes, which is largely confined to inhibiting growth of replicating cells.
The NIH-CC had a unique enrichment for compounds targeting metabolism and fatty acid biosynthesis, driven by GO predictions for sterol metabolic processes (Fig 4b, Supplementary Dataset 12). The majority of the compounds supporting this interact with cytochrome P450 enzymes19–23. In humans, compounds that inhibit or interact with cytochrome P450 have a high degree of drug-drug interactions 24. In yeast, cytochrome P450 homologs are ergosterol biosynthesis genes (ERG11, ERG5, NCP1). Thus, the yeast system provides a means of predicting compounds that interact with human cytochrome P450 enzymes, which could indicate compounds with a high degree of drug interactions.
The GSK kinase inhibitor (KI) library contains a characterized set of inhibitors of human kinases25. Three compounds from this collection were previously identified to bind human mitogen and stress activated kinases (MSK)26, and in yeast these had significant (p<0.05) enrichment predictions to the GO process of intracellular protein kinase cascade. This signaling pathway in yeast is mediated by the yeast mitogen activated protein kinase encoded by SLT2, the top single-gene target prediction for all 3 compounds (Supplementary Dataset 13), and has high homology to human ERK1, 2, and 4. Further, 5 compounds known to target human Polo-like kinase (PLK), were predicted in yeast to target the GO process nuclear import (Supplementary Table 13). The yeast homolog of PLK is CDC5, which is involved in regulating nuclear shape. These examples again suggest our yeast assay could be used to predict potential chemical bioprobes in human cells.
In general, the chemical-genetic functional signatures we observed appeared to be related to cellular localization because cytoplasmic or cell surface related bioprocesses were more readily perturbed and thus enriched across diverse chemical libraries (p<0.0001), whereas nuclear processes were less susceptible to chemical perturbation, and compounds predicted to target these processes were depleted among many of the libraries tested (p<0.0001) (Supplementary Dataset 14). This may suggest that, in general, bioactive compounds are less likely to reach the nucleus, while cell surface and cytosolic targets may be more druggable. This is consistent with a previous study27, which reports that out of 1362 annotated drug targets with orthologs across 4 mammalian species, only 8.4% of these targets localized to the nucleus whereas 56% of targeted proteins localized to either membranes or the cytosol.
Integrating structural and functional data
Because the RIKEN NPDepo contains sets of compound derivatives based upon variations of core scaffolds, we tested if compounds predicted to target similar functions were enriched for specific structural classes (Fig. 4b v-vi). Indeed, we found several instances where a large class of structural derivatives had similar predicted modes of action (Supplementary Dataset 15). For example, chemical-genetic profile similarity grouped a coherent set of artemisinin derivatives (Fig. 4b v) together within a broader subset of 358 compounds annotated to the “mitosis and chromosome segregation” biological process. While artemisinin is an effective anti-malarial drug, the cellular target(s) of this compound remain unclear28. In yeast, artemisinin is known to affect the cell cycle as well as mitochondrial function29,30. Furthermore, artemisinin has well-established effects on cancer cell cycle progression31–33. Our functional annotation supports both of these diverse roles for artemisinin because our artemisinin-related natural product (NP266) was annotated with 2 different biological process predictions: mitochondria cristae formation and microtubule cytoskeleton organization (Supplementary Dataset 8); however, the artemisinin derivatives that contain a relatively long side chain, extending from the three-ring core, have stronger predictions to a mitosis-related rather than a mitochondrial bioprocess-level target.
In another example, the furanocoumarin tricycle (psoralen) structural class is represented by multiple derivatives within the NPDepo library (Fig. 4b vi). Psoralen and its derivatives have been used to treat cutaneous T-cell carcinoma and dermatological conditions such as psoriasis and eczema34. The RIKEN NPDepo psoralen derivatives were frequently predicted to affect vesicle trafficking and membrane associated processes, and it is possible that other RIKEN NPDepo compounds with overlapping functional annotation could have a similar therapeutic potential.
Targeted process validations and assessing predictive power
In a previous study, the DNA content of yeast mutant strains harboring conditional alleles of essential genes were analyzed by flow cytometry, showing how each essential gene affects cell cycle progression and mapping specific cell cycle progression defects to different biological process35. For example, inhibiting the function of essential genes involved in translation causes an accumulation of cells in G1 phase (“G1” phenotype), reflecting insufficient protein synthetic capacity to transit the restriction point in G1 (referred to as Start in yeast), whereas inhibiting genes involved in DNA synthesis causes an accumulation of cells in S phase (“S” phenotype), and inhibiting mitosis genes results a G2 phase accumulation (“G2” phenotype). We performed high-throughput flow cytometry analysis on cell populations exposed to a set 67 different HCS compounds from the RIKEN NPDepo (Supplementary Fig. 9) that were predicted to cause specific cell cycle arrest phenotypes (Fig. 5a-b). In total, 27/67 (40%) of these compounds resulted in a cell cycle perturbation, and overall, 19/27 (70%) of compounds affecting cell cycle progression induced a phenotype consistent with our chemical-genetic predictions (Supplementary Dataset 16). For example, NPE94 was predicted to affect regulation of mitosis, and indeed, cells treated with this compound accumulated in G2 phase (Fig. 5a). Compounds displaying a cell cycle phenotype showed significant enrichment for each of the compounds’ predicted phenotypes over a background with permuted compound labels (G1: ~12-fold enrichment over background, p<0.001; G2: ~3-fold enrichment, p<0.01; S: 4-fold enrichment, p<0.001) (Fig. 5a-b).
As a second validation, we examined the activity of 25 compounds annotated to cell wall-related biological processes, utilizing several different cell biological readouts. To serve as controls, we selected 24 high-confidence compounds with equivalent growth inhibition and diverse bioprocess-level predictions but excluding “Cell Polarity and Morphogenesis” or “Glycosylation, Protein folding and Cell Wall Biosynthesis” bioprocesses predictions (Supplementary Dataset 17). Microscopic examination of fluorescent staining of two different cell wall polymers, β-1,3-glucan and chitin, revealed that 8/25 (32%) cell wall predicted compounds induced abnormal cell wall composition (Fig. 5c-d), and 10/25 (40%) caused increased bud neck width (Fig. 5e), a common phenotype of cell-wall-targeting agents36,37. Furthermore, 7 of these compounds caused hypersensitivity to zymolyase (Supplementary Fig. 10a), which degrades yeast cell wall β-1,3-glucan. In addition, 3/25 compounds caused rapid cell leakage similar to echinocandin B (Supplementary Fig. 10b), an antifungal drug that inhibits β-1,3-glucan biosynthesis. Among these compounds, we found a set compounds structurally similar to pseudojervines (Supplementary Fig. 10c). Based on this, we predicted, and confirmed that the poorly characterized parent compound jervine caused similar, abnormal glucan localization (Supplementary Fig. 10d). The proportion of compounds that showed cell wall phenotypes in the cell wall-predicted set of compounds was significantly greater than that in the control compounds, even when all pseudojervines were treated as one compound. Overall, 48% (12/25) of the compounds predicted to target cell wall biosynthesis exhibited at least one cell wall defect associated phenotype, and 36% (9/25) of the compounds exhibited at least two phenotypes (Supplementary Dataset 16). In contrast, only 4% (1/24) of the control compounds showed any cell wall phenotypes (p < 0.05), (Supplementary Dataset 17).
Predicting compounds with dual targets
Our database of biological-process level annotation also offers the potential to screen for compounds that have multiple targets. Many pharmaceuticals perturb multiple cellular functions38, and identifying multifunctional compounds provides opportunities for drug repurposing and addressing potential side-effects of clinical agents39. We mined our HCS set of predictions to identify compounds that were associated with two, distinct biological processes (Fig 6a, Supplementary Table 2). One of the top ranked compounds predicted to have multiple targets was NP214, a bleomycin A2 derivative. NP214 was predicted to target two different processes: (1) DNA replication (p<0.001) and (2) cellular proton transport (p<0.0001). The primary target of bleomycin and related compounds is DNA40; however, there is also evidence suggesting that these compounds perturb cellular membranes41–43, a secondary mode of action that could underlie bleomycin-induced side effect of lung fibrosis41. In mammalian cells, apart from its DNA activity, bleomycin has been shown to affect membrane redox potential and proton movement44. Moreover, bleomycin-iron complexes generate singlet oxygen and cause lipid peroxidation45,46. Thus, our chemical-genetic biological process predictions captured both the primary role of bleomycin (DNA damage) and secondary mechanisms that are consistent with known bleomycin side-effects.
From a ranked list of dual target predictions (Supplementary Table 2) for HCS compounds, we observed a common coupling of DNA related processes and cell wall biogenesis (Fig. 6a). For example, when we exposed yeast cells to NPD5925, a novel RIKEN NPDepo compound that was predicted to affect both DNA catabolism (p<0.001) and cell wall biogenesis (p<0.001), they displayed cell surface defects, such as zymolyase sensitivity (Supplementary Fig. 10), and a cell leakage phenotype resembling that of echinocandin B (Fig. 6b). Because NPD5925 is fluorescent, we imaged its staining pattern and found that it localized to the nucleus, similar to DAPI (4',6-diamidino-2-phenylindole) (Fig. 6c, Supplementary Fig. 11); it also induced a G1/early S phase cell cycle arrest, similar to the arrest observed with high levels of hydroxyurea (Fig. 6d).47 While a compound that targets a pleiotropic gene could appear to perturb multiple, unique processes, we scanned the global yeast genetic interaction network for examples of genes displaying this type of genetic interaction profile; however, we were unable to find a single gene that could explain the dual bioprocess-level predictions of NPD5925 (Supplementary Fig. 12), providing further evidence that it perturbs both DNA catabolism and cell wall biogenesis processes independently.
Despite the clear dual target signal of these compounds, the effect of dose likely plays a significant role in separating the multiple modes of action of a compound. Indeed, a dose curve would likely help further dissect primary from secondary mechanisms of action of compounds. For example, in the case of NP214 and NPD5925, it is possible that DNA may be the primary target, and thus the DNA binding CG score signal would likely be apparent at lower doses, whereas the cellular proton transport or cell wall signals may only be detectable at higher doses. However, as we have screened dozens of DNA damaging agents that have not yielded these specific dual target signals, it is not likely that these findings are a consequence of general effects on DNA. While we still do not know the exact mechanism of NPD5925, we are able to deconstruct complex phenotypic consequences of a compound.
New chemical genomic resources and analytical tools
We generated an active database named MOSIAC (http://mosaic.cs.umn.edu/), housing all our chemical-genetic screens. The MOSAIC database catalogs the structural and basic physical properties of all compounds tested, including their bioactivity, chemical-genetic profiles, as well as the biological process and gene-level target predictions. We also developed novel software tools, called BEAN-counter (Barcoded Experiment Analysis from Next-generation sequencing), for processing raw sequencing data into chemical-genetic interaction profiles, and CG-TARGET (Chemical-Genetic Translation via A Reference Genetic interaction nETwork) for predicting biological process-level targets from chemical-genetic interaction profiles. These new software tools are available at http://github.com/csbio/.
The Bioprocess Diversity Set
We distilled the most functionally diverse compounds from all 7 libraries analyzed in this study into a new “Bioprocess Diversity Set” (Supplementary Dataset 18), which represents a selected collection of our HCS bioactive compounds whose targets span the functional landscape of the cell. We also selected “Bioprocess Specific Sets”, each consisting of a set of compounds predicted to target one of the 17 different biological processes represented in the global genetic interaction profile similarity network (Supplementary Dataset 19). We anticipate that these new compound collections should provide a powerful new resource for modulating cellular physiology through diverse perturbations and streamlining the chemical-genetic discovery pipeline, enabling a focused analysis on specific biological processes of interest. The Bioprocess Diversity Set and the Bioprocess Specific Sets can easily be sorted to focus on individual compound libraries, including the NCI, NIH, GSK and RIKEN NPDepo libraries.
DISCUSSION
Our high-throughput chemical-genetics platform addresses a need for an unbiased, whole cell method that provides rapid, functional annotation of compound libraries. We used this system to screen 13,524 compounds across 7 different libraries, yielding rich chemical-genetic profiles and high-confidence functional predictions for a set of 1522 compounds. We cataloged the complete dataset as an open chemical-genetics resource (http://mosaic.cs.umn.edu/).
Our functional annotation of chemical libraries offers a strategy for prioritizing compounds that display bioactivity directed towards particular biological processes. The scale of functional annotation also provides a global view of the chemical activity within a library, which should allow testing of general hypotheses relevant to chemical biology. Importantly, the high-throughput nature of this assay provides opportunities for systematic, large-scale functional analysis of natural extract collections. Natural extract collections are often far more expansive than pure compound libraries and may contain broader mechanistic diversity. Functional annotation of these collections would help identify and prioritize promising extracts for detailed fractionation7.
Our approach highlights the use of drug hypersensitive and diagnostic mutant sets for compound characterization, which allowed us to interrogate more compounds and use smaller quantities. Certain drug efflux transporters can be dedicated to certain classes of drugs, such as PDR5 which has a documented specificity to steroid drugs48. Thus, although we only explored one genetic background for sensitization, it is possible to construct new yeast mutant collections using different genetic backgrounds tailored specifically for hypersensitivity to particular drug classes. In addition, while we selected a diagnostic pool of mutants specifically for genome-wide functional annotation, diagnostic pools with specific functional biases could be designed to investigate particular cellular processes or targets. Moreover, the diagnostic pool may be further reduced in size for greater multiplexing, as we found as few as 157 strains had equivalent predictive power as the entire non-essential collection of ~4900 strains.
One advantage of our approach is that we can functionally characterize compounds that do not show strong bioactivity. While bioactivity was predictive of our ability to make high-confidence predictions, it was not absolutely necessary. Pre-screening for bioactivity, which is a common approach8,17 can potentially exclude compounds with specific but possibly nonessential modes of action. For example, ~18% (270 of 1518) of the HCS compounds we identified for which we had bioactivity measures inhibited growth <20%. Indeed, weakly acting compounds targeting specific functions represent a starting point for chemical modifications to improve bioactivity.
Biological process target predictions derived from the global yeast genetic interaction network provides a roadmap, not only for other microorganisms (e.g. S. pombe, E. coli), but also for mammalian systems. Importantly, the construction of genetic interaction maps in human cell lines is possible, as is the mapping of chemical-genetic interactions49. Thus, the same approaches and predictive tools we implemented in yeast can be adapted and applied as a general strategy to map analogous chemical-genetic networks for human cells. More generally, combinatorial genetic and chemical-genetic approaches can be used to identify new drug leads that work synergistically to expand our understanding of druggable target space.
Online Methods
Constructing a genome-wide drug sensitive yeast deletion collection
Construction of the pdr1 pdr3 snq2 triple mutant is described in Andrusiak 201215. Briefly, PDR1 was deleted in the SGA query strain (Y7092) by replacement with the natMX antibiotic resistance marker, which provides resistance to the drug nourseothricin (NAT). To construct the pdr1 pdr3 double mutant, PDR3 was deleted in the pdr1 mutant by replacement with the K. lactis URA3 autotrophic marker, which permits cells to grow on synthetic media lacking uracil. The pdr1, pdr3, and snq2 single or double mutants were constructed by replacing the wild type gene with the natMX, K. lactis URA3, and K. lactis LEU2 markers, respectively. The natMX, Kl.URA3 and Kl.LEU2 markers were amplified from plasmids using primers designed with 50 base pairs of sequence homologous to regions upstream and downstream of the genes. PCR amplicons were transformed into the appropriate strains using lithium acetate and polyethylene glycol-based transformations. Deletion of the native gene and integration of the marker at the correct locus was confirmed using a series of PCR-based confirmations. Confirmation primers were designed specific to regions both flanking the integration site and internal to the inserted marker to interrogate both the full length of the inserted marker and the 5’ and 3’ boundaries.
The MATα pdr1::natMX pdr3::KI.URA3 snq2::KI.LEU2 (y13206) query strain carried the can1::STEpr-SP_his5 and lyp SGA reporters. STEpr-SP_his5 is an auxotrophic marker that allows only MATa cells to grow in the absence of histidine, while the can1 and lyp deletions allow haploid cells to grow in the presence of the drugs canavanine and thialysine, respectively. The MATα query strain was crossed to an ordered array of MATa xxx::kanMX deletion mutants and the resulting heterozygous diploids were transferred to media with reduced carbon and nitrogen to induce sporulation and the formation of haploid meiotic progeny. The resulting spores were transferred to synthetic media lacking histidine and containing canavanine and thialysine to select for the MATa meiotic progeny. Cells were then transferred to synthetic media lacking uracil and containing NAT to select for growth of cells carrying both the pdr3::KI.URA3 and pdr1::natMX deletions. Finally, these cells were transferred to synthetic media lacking uracil & leucine and containing G418 & NAT to select for the desired pdr1 pdr3 snq2 xxx mutants.
Assessing compound hit rate of sensitized yeast strains
The chemical sensitivity of deletion mutants was assessed using a high-throughput chemical growth inhibition halo assay. After growing WT, pdr1 pdr3 and pdr1 pdr3 snq1 mutant yeast strains overnight to saturation, cultures were standardized to an OD600 = 4.0 and 2 mL was added to a 50 mL stock of 2% YP (10 g/L yeast extract, 20 g/L peptone) + 2% galactose + 1% agar (YPGal). Seeded plates were prepared by pouring 10 mL of culture into NUNC square plates and drying for 10 minutes to facilitate compound absorption. Robotic pinning with the Biotec ADS384 was used to transfer 0.2 μL of each natural product to the seeded plates at a density of 88 compounds per plate; 440 diverse compounds (Supplementary Dataset 1) from the RIKEN NPDepo were evaluated in total. After incubating for 24 hours at 30 °C, plates were imaged and the visible areas of growth inhibition were measured using JMicrovision (Version 1.2.2. http://www.jmicrovision.com). A compound was deemed toxic if it generated an area of growth inhibition with a diameter greater than 1 mm. Thus, we assessed the number of compounds that perturbed growth (e.g. compound hits) of WT, pdr1 pdr3 and pdr1 pdr3 snq1 mutant strains.
The chemical-sensitivities of the top drug-sensitive deletion mutants identified from the adapted assay were confirmed by growing deletion strains in the presence of the tested drug (34.4 µM benomyl, 25 nM micafungin, or 1% DMSO) for 24 hours and recording the resulting optical density at 600 nm. Strains tested harbored deletions either in a wild-type background or in the drug-hypersensitive pdr1 pdr3 snq2 background. Values plotted are percentages calculated by dividing the OD600 measured after growth in DMSO by the OD600 measured after growth in the specific concentration of compound and multiplying by 100 (Fig 1. c–d). Y7092 was used as the WT control and the pdr1 pdr3 snq2 mutant was used as the drug hypersensitive control. (n = 3).
Defining the diagnostic gene set for optimized chemical genomic screens
A diagnostic set of 310 genes was selected by combining the output from two methods: a computational strategy and a manual selection. A set of 157 genes was selected by identifying functionally relevant genes using a computational approach called COMPRESS-GI (Deshpande et al. in preparation). Because genetic interaction profile similarity can be accurately measured using only a subset of the genome-wide profile, the COMPRESS-GI method selects genes to be included in a genetic interaction (and chemical-genetic) profile to maximize the agreement between pairwise gene similarities computed from the compressed profile and gene co-annotation information from the Gene Ontology. Selection of such a subset of genes is useful for our chemical genomics study because the reduced chemical-genetic profile for each compound is directly compared with the corresponding reduced genetic interaction profiles, which generates accurate compound-gene similarities based on a small set of mutants. The COMPRESS-GI algorithm is described and evaluated in depth elsewhere (Deshpande et al. in preparation).
In addition to the 157 genes selected with the computational approach, we also manually selected 236 genes. The logic for the manual method was to pick any single member of the same pathway/complex because members of the same pathway/complex possess similar genetic interactions. Hence, picking one gene from each pathway/complex should be sufficient to cover the genetic network space associated with all the genes in that pathway/complex. We applied 2-dimensional hierarchical clustering to cluster gene deletion mutants based on their genetic interaction profiles, and then manually selected strains that displayed rich genetic interaction profiles representative of each of the 17 functionally enriched cluster from the global genetic interaction profile similarity network (Costanzo et al., 201611) to generate a minimal subset of yeast deletion mutants that re-capitulated the majority of functional profiles observed in our reference map.
Both the COMPRESS-GI and manual gene selection methods were applied using a filtered, non-essential yeast genetic interaction dataset12 where strains observed to exhibit extreme read counts in barcode sequence (top/bottom 10%) were removed. Also, in cases where multiple different mutant alleles were available for the same gene, the allele with the highest number of genetic interactions (highest interaction degree) in its genetic interaction profile was chosen. We found 83 genes in common between the computational and manually-derived lists, suggesting that the two methods had good agreement with respect to which genes were informative. The union of genes from the two selection methods comprised the initial diagnostic strain set (Supplementary Dataset 2).
Pilot experiments using this diagnostic set (Supplementary Dataset 2, diagnostic pool version 1) revealed a number of mutants that still exhibited abnormally high or low barcode counts in all experiments. These were removed to generate a collection of 310 strains for the final version of the diagnostic strain set (Supplementary Dataset 2, diagnostic pool version 2).
Optimization of signal detection/sequencing parameters
Initial optimizations were conducted using a preliminary diagnostic pool of 491 strains. This pool of deletion mutants was constructed by pinning frozen 96-well glycerol stocks of each strain onto Nunc Omni Tray plates containing YPD + G418 solid media and incubating for 2 days at 30°C. Each plate was then flooded with 10 mL of YPD liquid media and a cell spreader was used to re-suspend grown colonies. The resulting cell suspensions were transferred to a 50 mL conical tube where glycerol was added to a 15% final concentration. Finally, the pool was adjusted to a final concentration of 50 OD600/mL by dilution or centrifugation and stored at −80 °C until required. To assay the mutant pool for drug-hypersensitivity, cells were thawed, counted using a haemocytometer, and diluted to seven different final inoculum densities (3727–58 cells / strain) in YP + 2% galactose in a 96-well flat-bottom plate. Cultures were then spiked with either 34.4 µM benomyl, 25 nM micafungin, or a 1% DMSO control. After growing for 18, 24, or 48 h at 30 °C, cells from each well were harvested by centrifugation. Genomic DNA was purified from the harvested cells by re-suspending in 125 μL of zymolyase buffer (1 mg/mL) and using the QIAextractor (Qiagen) as per manufacturer's instructions, with a 100 μL elution volume.
Barcodes were amplified from each of the wells using multiplex primers as described elsewhere36 for 20, 25, or 30 cycles. Samples were gel purified from 2% agarose and assessed for quality using the Kapa Illumina qPCR kit. Samples were sequenced at a loading concentration of 10 pM on an Illumina HiSeq2000 as a single uninterrupted read (“read through”). The 30 cycle samples were also sequenced using a “separated read” strategy, where the barcodes were read in a first sequencing step, while the multiplex tags were read after a second priming step. Output from “read-through” and “separated read” runs were then compared. The signal to noise was calculated by taking the mean CG score of the top 10 array genes divided by the standard deviation of all array genes CG scores. This was done for each drug, PCR cycle, cell density, culture combination.
Multiplex tag design and 768-plex primer selection
We designed one thousand 10 bp multiplex tags such that (1) the Levenshtein distance between any two tags was greater than 3, and (2) the tags were balanced in terms of nucleotide distribution. Condition (1) ensures that multiplex tags are maximally distinguishable even with a small number of sequencing errors while condition (2) ensures that the GC content and predicted melting point of all tags were within a small range. Because the space of multiplex tags is too large to exhaustively enumerate, we generated random multiplex tags and selected tags iteratively if both conditions were true. Primers containing the Illumina sequencing adapter, common priming site for the UPTAG barcode, and 1000 selected 10 bp multiplex tags were synthesized (Sigma), arrayed in 96-well plates. To assess amplification performance of the multiplex tags, we performed 1000 identical pooled growth experiments on the diagnostic strain pool under control conditions (DMSO). Samples were processed as described above and sequenced on an Illumina MiSeq lane (1000-plex). We used the count distribution to identify 8 plates (768 multiplex tags) with the most uniform distribution of read counts (Supplementary Fig. 13), and discarded plates containing multiplex tags with highly divergent reads counts. These 8 plates of multiplex tags with equivalent performance were used in all subsequent experiments (Supplementary Dataset 3).
To test the effects of multiplexing on the chemical genetic interaction signal, we selected a set of 768 compound conditions, including DMSO controls, known agents, and novel bioactive compounds from the RIKEN collection. For each assay we used the optimal pooled growth conditions defined above. We included a subset of compounds also screened in the Parsons et al. 2006 dataset as controls at every plexing level (96, 192, 384, and 768)7. We dosed the pooled cells at a level that inhibited growth by 20–50% compared to the DMSO control. Genomic DNA extraction, PCR, sample prep, and sequencing were performed as described above.
Screening the NPDepo/NCI/NIH/GSK collections
We performed our pooled growth assay with the diagnostic mutant collection under optimized conditions as described above. Excluding controls compounds, we performed two screens totaling 13524 conditions, which represented 13431 uniquely-named compounds. In the initial batch of compounds examined, we screened the first 9840 members of the growing RIKEN NPDepo, and in the second batch, we screened six publicly available plated libraries: the NCI Natural Product (117 compounds), Approved Oncology (101), Structural Diversity (1599), and Mechanistic Diversity (821) collections, the NIH Clinical Collection (720), and the GlaxoSmithKline kinase inhibitor collection (326). The NPDepo is maintained as 1 mg/mL stocks, and we screened it at a final concentration of 10 µg/mL, with the exception of a number of compounds that received additional lower dosing in a pilot experiment (Supplementary Dataset 4). All remaining collections were screened at 100 µM, with the exception of the NCI Mechanistic Diversity set (10 µM) (Supplementary Dataset 4). Selected compounds were re-screened at lower concentrations if the initial concentration resulted in severe growth inhibition. The diagnostic mutant pool was grown in 200 µL cultures in 96-well plates. Each plate had 88 test compounds, 4 control compounds, and 4 internal DMSO conditions, (Supplementary Fig. 14). Each lane consisted of 7 compound plates and one DMSO control plate, and every plate had 3 independent PCR replicates. For pairs of replicates of our control compounds, we measured Pearson correlation coefficients of 0.94, 0.95, 0.93, and 0.92 for our control compounds, respectively (Benomyl, Micafungin, MMS, Bortezomib). Thus, 3 replicates were sufficient to ensure high-quality, quantitative chemical genomic profiles. The primer set used to amplify each plate was shuffled for each replicate in such a way that each compound replicate would not use any single multiplex tag more than once. The primer set used to amplify the DMSO plate was different for each lane. The control compounds give very distinct CG profiles and were used to ensure proper plate orientation at all steps of the process. Culture OD was measured at 0, 24, and 48 h, and growth at 24 h relative to the DMSO control was used as a measure for bioactivity.
Following growth, genomic DNA was extracted as described above. The genomic extractions for each plate were amplified in triplicate using three unique multiplex primer plates (3 technical replicates). We used 768-plexing per lane, which means each sequencing lane contained PCR amplified barcodes from eight 96-well plates. We ensured each of the multiplex primer plates were used to amplify the DMSO plates allowing us to detect and remedy any potential multiplex primer biases following sequencing. Following PCR, samples were pooled first by plate, then by lane. The “per lane” samples were purified by 2% agarose gel and the product quantified by qPCR as described above. All samples were run at a loading concentration of 10 pM as single-end, 50 bp reads on an Illumina Hiseq2000.
Description of compound collections
RIKEN NPDepo
The RIKEN Natural Products Depository (NPDepo) is a public depository of small molecules. Currently, the NPDepo chemical library contains 39,200 pure compounds, half of which are natural products and their derivatives51.
Each of the remaining collections are publicly available and can be requested at the sites listed below.
NIH-Clinical collection
NCI-Structural diversity collection
https://www.dtp.nci.nih.gov/branches/dscb/div2_explanation.html
NCI-Mechanistic diversity collection
https://www.dtp.nci.nih.gov/branches/dscb/mechanisticII_explanation.html
NCI-Oncology collection
https://www.dtp.nci.nih.gov/branches/dscb/oncology_drugset_explanation.html
NCI-Natural products collection
https://www.dtp.nci.nih.gov/branches/dscb/natprod_explanation2.html
GSK-Kinase inhibitor collection
Computing molecular descriptors for all screened compounds
SMILES and InChI string representations of all molecules were generated using the OpenBabel cheminformatics toolkit52 (http://openbabel.org) and its python wrapper, pybel53. All molecular descriptors (column J through the last column) were calculated using PaDEL-Descriptor54, a wrapper for the Chemistry Development Kit cheminformatics toolkit55.
Predicting compounds’ modes of action based on chemical-genetic and genetic interaction profiles
Full descriptions of the methods for scoring chemical-genetic interactions from next-generation sequencing data and predicting targeted biological processes via integration of genetic and chemical genetic interaction profiles are presented in Supplementary Note.
To assess the performance of predictions, we identified known compounds with described modes-of-action present in our high-confidence prediction set (“gold standard compounds”). If the predicted process was functionally related to the known mode-of-action, we considered this a successful prediction (Supplementary Table 3).
Characterizing the contribution of strains with high and low chemical-genetic interaction degree to process-level target prediction
We also characterized the contribution of the highest and lowest-degree strains to process-level predictions, this time by removing the 15% highest or lowest-degree strains before predicting process-level targets. The degree of a strain was defined as the number of interactions with an absolute CG score ≥ 2.5 it possessed across the RIKEN subset of compound-derived chemical-genetic interaction profiles (no DMSO or resampled profiles). After removing 40 of the highest or 41 of the lowest-degree strains (out of the 275 strains that overlapped with the S. cerevisiae genetic interaction network array strains, (Supplementary Dataset 9), process-level targets were predicted as described in “Predicting compounds’ modes of action based on chemical-genetic and genetic interaction profiles” and “Assessing the false discovery rate of process target predictions” in Supplementary Note. Comparisons regarding the number and identity of discovered compounds, and the identity of their predictions, were performed to determine the roles that high and low chemical-genetic interaction degree strains played in predicting process-level targets.
While the removal of low-degree strains had little effect on the identity of discovered compounds and their predictions, the removal of high-degree strains had noticeable effects. The “no-low-degree” profiles led to discovery of 927 bioprocess-level target predictions, 794 of which matched the original RIKEN “all-strain” predictions (94% of the 848 original RIKEN high confidence set, or HCS) (Supplementary Dataset 9). In contrast, the “no-high-degree” profiles led to the discovery of only 667 high confidence bioprocess-level target predictions, most of which overlapped with the RIKEN HCS (537 compounds, or 63% of the RIKEN HCS). In addition, the predictions derived from “no-low-degree” profiles tended to match the predictions of in the RIKEN HCS (602/794, or 76%, of “all-/no-low-degree” compounds shared predictions with Jaccard ≥ 0.25), while the predictions derived from “no-high-degree” profiles were less consistent (168/667, or 31%, of “all/no-high-degree” compounds shared predictions with Jaccard ≥ 0.25).
The importance of high-degree strains to bioprocess-level predictions was further confirmed by examining the identities of the predicted processes. While removing high-degree strains does not destroy the performance of bioprocess-level predictions, it does substantially change the distribution of the most frequently-predicted bioprocesses and reduce prediction accuracy for some well-characterized compounds. After removing high-degree strains, the top predicted bioprocess by far was “spindle assembly,” followed by other microtubule and cell cycle-related processes, and finally, bioprocesses related to localization, pH and ATP, glycosylation, and DNA damage/repair (Supplementary Dataset 9). For three well-characterized compounds, the removal of high-degree strains substantially reduced prediction specificity for tunicamycin, altered predictions of rank 3 and below for benomyl, and left the predictions for MMS essentially unchanged (Supplementary Dataset 9). In contrast, removing low-degree strains had little effect on either the distribution of process-level predictions in the high-confidence set or the highest-confidence predictions for benomyl, MMS, and tunicamycin.
Characterizing the respective contribution of negative and positive interactions to process-level target prediction
Using the RIKEN NPDepo high-confidence set of compounds, we characterized the contribution of positive and negative chemical-genetic interactions to our process-level predictions. First, chemical-genetic interaction profiles containing either only positive or only negative interaction scores were generated. Process-level targets were then predicted using these “positive-only” or “negative-only” profiles as described in “Predicting compounds’ modes of action based on chemical-genetic and genetic interaction profiles” and “Assessing the false discovery rate of process target predictions” in Supplementary Note. We then compared the number and identity of the compounds discovered, and the identity of their predictions, between “positive-only,” “negative-only,” and “all-interaction” prediction sets to determine which side(s) of the chemical-genetics interaction profiles were important for predicting perturbed processes.
Two schemes were employed to generate the “positive-only” and “negative-only” chemical-genetic interaction profiles and their subsequent process-level predictions. Scheme 1 profiles showed how all negative and all positive interaction scores contribute to process-level predictions, and scheme 2 profiles accounted for biases that could have occurred due to differences in the number of positive vs. negative interactions in the scheme 1 profiles. To generate “negative-only” profiles under scheme 1, the positive scores in all compounds, DMSO control profiles, and resampled profiles were set to zero; conversely, “positive-only” profiles under this scheme were generated by setting all negative scores to zero. To generate the “positive-only” and “negative-only” profiles under scheme 2, an equal number of scores with absolute value ≥ 1 were selected from the extreme positive or negative ends, respectively, for each compound, DMSO, and resampled profile.
“Negative-only” and “positive-only” chemical-genetic interaction profiles led to the identification of a substantially different sets of “high-confidence” compounds (at least one prediction with FDR ≤ 25%), with the “negative-only” profiles reproducing the “all-interactions” high confidence set much better than did the “positive-only” profiles. Both high confidence sets derived from “negative-only” profiles from scheme 1 (all scores) and scheme 2 (equal number of positive vs. negative scores) possessed roughly the same number and identity of compounds when compared to the “all-interaction” high confidence set (Supplementary Dataset 10). Specifically, 85% (723/848) and 81% (689/848) of the high confidence compounds identified using all interactions were discovered using scheme 1 “negative-only” profiles (“negative-all/all” comparison) and scheme 2 “negative-only” profiles (“negative-equal/all” comparison), respectively. While the high confidence set derived from scheme 1 “positive-only” profiles was similar in size to the “all-interactions” high confidence set, the compounds in both scheme 1 and scheme 2 “positive-only” high confidence sets had much lower overlap with the “all-interactions” high confidence set (345/848, or 41% – “positive-all/all” comparison, and 183/848, or 22% – “positive-equal/all” comparison, respectively).
In addition to driving the discovery of the same compounds that were in the “all-interactions” high confidence set, negative chemical-genetic interactions also drove the discovery of the same predictions for these compounds. For example, 68% (494/723) of the “negative-all/all” co-identified compounds and 47% (326/689) the “negative-equal/all” co-identified compounds had a Jaccard coefficient of ≥ 0.25 for their predictions. In contrast, only 17% (58/345) of the “all/positive-all” and 3% (6/183) of the “all/positive-equal” co-identified compounds met this criterion for the similarity of their predictions, suggesting that even for compounds where predictions were made, the predicted modes of action were largely different. From this evidence, negative chemical-genetic interactions are clearly the primary driver of genetic interaction-based target predictions.
In addition, two lines of evidence suggest that the predictions made using only positive chemical-genetic interactions are of lower quality than those derived from all or only negative interactions. First, we observed that the predictions from positive chemical-genetic interactions were overwhelmingly biased toward GO terms related to RNA splicing/processing and cell cycle/mitosis, while those from all or only negative interactions were more diverse (GO terms related to cellular localization, chromatin organization and transcription, cell wall, vesicle-mediated transport, pH regulation, protein degradation, microtubules and cytoskeleton, etc., in addition to cell cycle/mitosis) (Supplementary Dataset 10). Second, we observed that in the set of predictions derived from only positive interactions, three well-characterized compounds (benomyl, MMS, tunicamycin), whose known mechanisms of action are well-captured by process-level predictions based on either all or only negative interactions, both 1) failed to make the high confidence compound list and 2) did not show predictions consistent with known mechanisms (Supplementary Dataset 10).
Visualizing the relationship between compound bioactivity and inclusion into the high confidence set
We assessed the fraction of compounds in the high confidence set as a function of bioactivity, which can also be thought of as the probability that a compound will be in the high-confidence set given its bioactivity. The bioactivity (percent growth compared to DMSO) and high confidence set status (true/false, respectively set to 1/0 for analysis) for each compound were extracted from Supplementary Dataset 4. A loess curve was then fit through the 1/0 high-confidence status values with respect to the bioactivity values, using a span of 0.1 and least-squares fitting with a polynomial degree of 2. The curve on the plot was drawn at points 2.5 units apart, starting at the smallest observed bioactivity value (Supplementary Fig. 6).
Determining functional distributions of compound collections
Generating the background set of chemical genomic profiles
To account for biases in the distribution of process predictions introduced by our discovery pipeline, we generated a set of “background” chemical genomic profiles. Each background profile was a high-signal GI profile with noise added based on the variance of each strain across all GI profiles (Gaussian, µ=0, σ = 2 × σstrain). Each of these 4515 profiles (3 for each of 1505 GI profiles) simulated a compound that targets one gene. This enabled the estimation of any functional biases introduced by our GI-based discovery pipeline.
Computing distributions of process predictions for each compound class
We calculated the proportion of each compound class that was predicted to each process term. (Supplementary Dataset 20). Those proportions were then compared to the proportion of the background profiles predicted to each process using a proportion test in R (Supplementary Dataset 21). To sort from the most significant enrichment to the most significant depletion compared to the background, p-values from the proportion test were modified such that p-values from proportions greater than the background ranged from 0 to 1, and p-values from proportions smaller than the background ranged from 2 to 1. Using a ranksum analysis with the modified p-values as the input, we determined, for each class, if processes that mapped to each functional neighborhood were predicted more or less frequently than in the background set. Rank-sum p-values were Bonferroni-corrected and visualized as a heatmap (Fig. 4b).
Compound diversity sets for functional neighborhoods
We assigned all the compounds associated with a specific functional neighborhood to a single cluster and split up the cluster recursively to form clusters of more similar compounds. At any recursive step, we determined the cluster with the lowest average within-cluster chemical genomic similarity and divided the cluster into two new clusters using K-means clustering. We stopped generating new clusters right before our algorithm would generate at least two individual clusters exceeding our predefined limit for the maximum average between-cluster chemical genomic similarity (cosine similarity of 0.3). We repeated the algorithm 1000 times for each neighborhood and selected, from each cluster, the compound with the strongest prediction as a candidate for our diversity set. We finally sorted all our candidates across all the repetitions from the most frequent to the least frequently occurring. To define the compound diversity set, we selected from this ranked list as many top candidates as were needed to cover all the clusters in at least 50% of the repetitions.
Comparison with other chemical genomic datasets
An independent set of whole-genome chemical genomic screens have been performed previously by Lee et al. and Hoepfner et al.14,15. These studies interrogated 3,239 and 2,923 compounds, respectively, and they were performed using both a heterozygous and homozygous diploid deletion mutant profiling platform. The homozygous diploid deletion mutant profiling platform is comparable to the chemical-genetic analysis we carried out with haploid deletion mutants. Our study shares 145 compounds in common with the Lee et al. study and 31 compounds in common with the Hoepfner et al. study. In particular, all three studies possessed an overlap of 9 compounds.
Comparisons were made between our chemical-genetic interaction scores with the Hoepfner et al. median absolute deviation logarithmic scores, and with the Lee et al. fitness defect scores (multiplied by -1), such that the chemical-genetic interaction profiles were restricted to the 277 genes common between the three studies. For the nine shared compounds (Supplementary Table 1), our study shows an average Pearson correlation coefficient (PCC) of 0.29 with Lee et al., and 0.38 with Hoepfner et al. whereas Lee et al. and Hoepfner et al. show a PCC of 0.22. Thus, our study shows significant agreement with both the Lee et al., study (p-value: 5×10−7) and the Hoepfner et al. study (p-value: < 1×10−8).
We also compared the members of the compound diversity sets derived from our RIKEN and Clinical screens to the major chemical-genetic signatures defined in Lee et al. and found favorable overlap of the functional space occupied by compounds from both studies. After computing PCC between each diversity set compound and each compound from Lee et al. that was annotated to a major signature, we observed that all 45 major Lee et al. signatures contained at least one compound that was significantly similar to a compound in both diversity sets (PCC > 0.2, one-sided test, p-values obtained by shuffling the profile gene labels 10,000 times followed by Benjamini-Hochberg correction, FDR < 0.05) and that most of the compounds in the RIKEN and Clinical diversity sets contributed to this overlap (123/130 unique RIKEN and 187/214 Clinical compounds) (Supplementary Fig. 15a). When applying a more stringent PCC threshold (PCC > 0.4), only 18 and 12 (out of 45) Lee et al. major signatures are covered by 32 and 39 compounds from the RIKEN and Clinical diversity sets, respectively.
In addition, we mapped the Lee et al. major signatures to our bioprocesses and found that many of these mappings agree functionally (Supplementary Fig. 15b). After computing PCC between the profiles of each high confidence compound and each compound from Lee et al. that was annotated to a major signature, we annotated each correlation > 0.3 to a major signature/bioprocess pair (the bioprocess annotation for each high confidence compound was based on its best process prediction). For each major signature/bioprocess pair, we then counted the number of unique Lee et al. and high confidence compounds, respectively, that contributed to these correlations. We normalized these counts by the size of their respective major signature or bioprocess and multiplied the resulting fractions together to derive a confidence score that deemphasizes major signature/bioprocess pairs for which a very small number of compounds annotated to the major signature (or bioprocess) is responsible for most of the correlations to the compounds in the bioprocess (or major signature). A table that maps each Lee et al. major signature to its most confident bioprocess is provided (Supplementary Dataset 22), as is a table that maps each Lee et al. signature to any bioprocess with which it shared at least one profile correlation > 0.3 (Supplementary Dataset 22). Both tables are sorted by confidence in descending order. Agreement between the Lee et al. major signatures and our bioprocess annotations was encouraging; specifically, Golgi (Lee et al.) mapped to Vesicle traffic (this study), ubiquinone biosynthesis & proteasome (Lee et al.) to Protein Degradation (this study), ergosterol depletion effects on membrane (Lee et al.) to Metabolism & Fatty Acid Biosynthesis (this study), and DNA damage response (Lee et al.) to DNA Replication & Repair (this study). Overall 43/45 major chemical-genetic signatures possessed at least one compound with PCC > 0.3 to a compound in our study and therefore could be mapped to a bioprocess; however, mappings derived from a very small number of compounds in either member of the pair should be interpreted with more caution.
Identifying structural motifs contributing to functional enrichments
To identify structural motifs that drove specific functional neighborhood enrichments, we performed discriminative molecular substructure mining on the RIKEN HCS set of compounds using the MoSS tool56. Using the proportion of each compound class that was predicted to each process term (see “Computing distributions of process predictions for each compound class”), we selected only process terms that had a significantly higher proportion of predictions in at least one compound class versus the GI background (proportion test in R, Bonferroni-corrected). Then, for each process term, we identified substructures that occurred at least twice as frequently in compounds with high confidence predictions to that process term (the “active” set) versus compounds that did not have high confidence predictions to that term (the “inactive” set). This discriminative mining was performed twice per process term: once by drawing the inactive set of compounds from all screened compounds in the RIKEN NPDepo, and once by drawing the inactive set from all NPDepo compounds in the HCS. By selecting the minimum of these two enrichments, we sought to control for bias in the distribution of substructures in the inactive compounds. The information about the substructures and their enrichments was compiled across all experiments. The final output is a table of substructures that show enrichment for a particular functional category (Supplementary Dataset 15).
Localization enrichments
We sought to determine if the compounds in particular collections exhibited bias in the localization of their targets. Using the proportion of each compound class that was predicted to each process term (see “Computing distributions of process predictions for each compound class”), we selected process terms that had significantly higher (enriched) and lower (depleted) proportions of predictions versus the GI background (proportion test in R, Bonferroni-corrected). For each compound collection, two gene lists were assembled, each representing the union of the genes annotated to either enriched or depleted (pbonf ≤ 0.05) process terms.
A hypergeometric test was performed to determine which of these gene lists were enriched for genes annotated to specific cellular components. P-values were Bonferroni-corrected. Gene annotations to cellular compartments were obtained from Huh et al. 200357, Koh et al. 201558, and the yeast GO slim cellular compartment annotations (http://www.yeastgenome.org/). The background set of genes for all hypergeometric tests was the set of 1499 query genes with GO process annotations from the high-degree genetic interaction dataset.
Flow cytometry based global validations of targeted processes
67 compounds with process target predictions mapping to G1-phase arrest, S-phase arrest, or G2-phase arrest flow cytometry phenotypes (based on Yu et al. 200635) were selected from the high confidence set. Compounds that ultimately mapped to multiple cell cycle phenotypes via their process target predictions were removed from consideration. For each cell cycle phenotype, the 50 compounds with the highest overlap of 1) gene targets driving the process prediction that mapped to the phenotype and 2) the genes directly annotated to the phenotype (Yu et al., 2006) were selected. Compounds were then manually selected from these lists based on their bioactivity, as compounds with higher bioactivity were assumed more likely to induce a cell cycle phenotype.
Cultures of the control strain (y13206) were grown to early log phase (0.4 OD) in YPGal (1% yeast extract, 2% peptone, 2% galactose). 250 µL per well of the starting culture was aliquoted into a 96-well block. The cultures were treated with 10 µg/mL of each compound and incubated at 30 °C for 2–3 h. We included the compounds hydroxyurea, MMS, nocodazole, and tunicamycin as controls known to arrest cell cycle in G1, S, G2 and post-G2 respectively. From each culture, 200 µl was transferred into a new 96 well plate, pelleted at 2000 rpm for 5 min. Pellets were resuspended in 20 µL of 50 mM Tris-Cl (pH 8.0), 50 mM EDTA buffer. 160 µl of cold 99% EtOH was added to the wells. Cells were pelleted at 4000 rpm for 2 min at RT, resuspended in RNAse A solution (50 mM Tris-Cl pH 8.0, 0.4 mg/mL RNAseA), and incubated for 2 h at 37 °C. Cells were pelleted at 4000 rpm for 2 min at RT, and 50 µL of proteinase K solution was added (50 mM Tris-Cl pH 7.2, 200 mM NaCl, 78 mM MgCl2, filter sterilized). The cells were then incubated for 50–60 min at 50 °C. Cells were pelleted at 4000 rpm for 2 min at RT, and resuspended in 55 µL of FACS buffer (200 mM Tris-Cl pH 7.5, 200 mM NaCl, 78 mM MgCl2, filter sterilized). In a new 96 well plate, 180 µL of SYBR Green solution (2X SYBR Green, 50 mM TrisCl pH 7.2) was added to each well. 20 µL of fixed cells from the previous step was added. The plate was then processed via high-throughput flow cytometry as described in Yu et al., 2006. The voltage of the green channel was adjusted so that on the linear scale the 1C peak and the 2C peak were well spaced, the 1C peak was away from the vertical axis. The FSC-A vs FL1-A was used to gate out aggregates and dead cells. The final histograms have FL1-A on the x-axis (area of the green channel).
Cell cycle phenotypes were called by drawing thresholds based on 46 control DMSO profiles, on either the percent of cells in S phase (%S) or the ratio between the percentages of cells in G1 (1C peak) vs. G2 (2C peak) phase (G1/G2 ratio). Specifically, the mean and standard deviation were computed for both the %S and the G1/G2 ratio in the DMSO control samples. These values were used to convert the corresponding values from the treatment compounds into z-scores. A phenotype was called if the z-scores of both replicates passed the appropriate z-score threshold of either 1.5 or –1.5. The specific thresholds for phenotypes calls were as follows: a 1C phenotype was called if G1/G2 ratio > 1.196; a 2C phenotype was called if G1/G2 ratio < 0.809; and an S phenotype was called if %S > 19.5%.
Enrichments and p-values were computed empirically by shuffling the phenotypes associated with the compounds and counting the number of cell cycle phenotypes associated with each prediction in the shuffled data (100,000 randomizations). Compound identities were preserved during the randomization, such that both replicates of a compound were associated with the same cell cycle phenotype prediction after each randomization. Enrichments were computed by dividing the number of calls observed from the real data by the average expected number of calls for each combination of predicted and observed phenotype (averaged over all compound-predicted phenotype randomizations). In a similar fashion, empirical p-values were computed for each combination of predicted and observed phenotype by counting the fraction of randomizations that produced the same or larger number of calls.
Multi-parameter validation of cell wall targeting compounds
For the adenylate kinase (AK) cell leakage assay, an overnight culture of the drug hypersensitive yeast strain (y13206) in log phase was harvested and washed twice with fresh YPGal medium. The final pellet was resuspended in 1 mL fresh YPGal. Fifty microliters of cell suspension (~1x106 cells), 1% DMSO, 10 µg/mL of each test compound was added in individual wells of 96-well culture plate containing YPGal medium to a final volume of 100 mL, mixed by pipetting and incubated at 25 °C for 4 h (n=3). The plate was equilibrated to room temperature for 30 min and the contents were transferred into a luminescence compatible 96-well white-walled plate. Next, 100 μL of ToxiLight AK reagent (Lonza) was added to each well and incubated at room temperature for 30 min, and luminescence was measured with a Wallace ARVO SX 1420 Multilabel Counter (Perkin Elmer Life Sciences). Hit compounds resulted in more than 20000 units. Cells were stained with the glucan stain aniline blue and the chitin stain calcofluor white as described previously36, and hits assessed by irregular glucan or chitin staining detected by eye. Treated cells were analyzed by high-dimensional morphometric analysis (CalMorph) as described previously (n=5)59. A neck width and morphological noise (heterogeneity) was determined as described previously36.
Zymolyase sensitivity assay
Yeast cells (y13206) were grown in YPGal until log phase (~4x107 cells/mL), and 50 mL of aliquot was transferred into fresh 150 mL YPGal containing test compounds in 96-well microtiter plate (10 or 40 mg/mL for test compounds, as for controls: 2.5 mg/ml for echinocandin B, 30 mM for hydroxyurea, 1% for DMSO). The cell-containing plate was incubated at 25 °C for 4 h with shaking. After incubation, cells were washed twice with 10 mM Tris-HCl (pH7.5), and resuspended to zymolyase solution (0.94 mg/mL of Zymolyase 100T (Seikagaku) in 10 mM Tris-HCl (pH 7.5)). Cell suspensions were incubated at 30 °C, and OD600 values were measured for 1 h after the addition of zymolyase with plate reader (SPECTRAmax plus384, Molecular devices). In each sample, OD600 values were standardized at time 0 to equal 1 (or 100%).
Cell cycle analysis of NPD5925
Y13206 cells were grown to mid-log phase in YPD, and a sample of this asynchronous population was saved for later analysis. The cells were treated with alpha factor and incubated for 2.5 hours at 30 °C, and a sample of the alpha factor-arrested population was saved for later analysis. Pronase and test compounds were added to the remaining arrested population. We tested DMSO (2%), hydroxyurea (0.2 M), MMS (0.03%), and NPD5925 (20 µg/mL). The treated cells were incubated for 1 h and then prepared and analyzed via flow cytometry as described above.
Tubulin inhibition assay and assessing predictive power
We carried out in vitro tubulin polymerization assays using the cytoskeleton fluorescent based porcine tubulin polymerization assay (Cytoskeleton Inc) following manufacturer specifications. We used 10 µg/mL of test compound for each assay. We tested the control compounds nocodazole, paclitaxel, and the predicted tubulin targeted compound NPD2784 versus a DMSO solvent control.
Identifying compounds with multiple, unique mechanisms of action
We devised an algorithm to prioritize compounds from the RIKEN HCS whose chemical genetic (CG) interaction profiles appeared to be a combination of multiple, diverse genetic interaction (GI) profiles, indicating that they exert their effects via multiple, unique mechanisms of action. For a compound, we first constructed profiles reflecting the mean contribution of each strain in its CG profile to each of its process target (PT) predictions. Then, the initial cluster of “mean contribution profiles” was seeded with the profile from the highest confidence PT prediction. To complete the clustering, the mean contribution profiles from progressively lower-confidence PT predictions were either added to an existing cluster (if they possessed a Pearson correlation coefficient of ≥ 0.5 with a profile in that cluster) or used to seed a new cluster. Compounds were prioritized if they possessed two clusters of mean contribution profiles with very low average similarity between them, suggesting that two distinct signals in the GI network contributed to the signal observed in their CG profiles. A set of contribution profiles was generated for a compound and one of its PT predictions by taking the element-wise product of the compound’s CG profile and the L2-normalized GI profile of each gene that drove the PT prediction (genes with genetic target score ≥ 2 and were annotated to the PT, which are shown in columns “driver_common” and “driver_score” in Supplementary Dataset 8. The “mean contribution profile” for one compound and PT prediction was calculated as the strain-wise mean across all of the contribution profiles associated with that compound and one PT prediction. GI profiles were from the set of high-signal genetic interaction profiles.
Staining of cells with NPD5925
Log phase yeast cells (y13206) were fixed with 3.7% formaldehyde solution. The fixed-cell suspension was centrifuged to make cells a pellet, and the pellet was mixed with the same volume of NPD5925 (1 mg/mL) and incubated at 25 °C for 30 min. Cells were washed twice with phosphate-buffered saline (PBS), and a small cell aliquot was mixed with mounting solution (90% glycerol, 9.975% PBS, 0.025% 0.1 N NaOH) containing p-phenylenediamine (1 mg/mL) and 4',6-diamidino-2-phenylindole (DAPI, 0.7 mg/mL). A prepared specimen was observed by fluorescent microscope (Axioimager M1, Carl Zeiss) with regular rhodamine or DAPI filter sets (Carl Zeiss). An intensity profile was extracted from cell images by ImageJ (http://imagej.nih.gov/ij/).
Adenylate kinase (AK) assay of NPD5925
An overnight culture of yeast strain (y13206) in log phase was harvested and washed twice with fresh YPGal medium and the final pellet was resuspended in 1 mL fresh YPGal. Fifty microliters of cell suspension (~1x106 cells), 1% DMSO, 30 mM hydroxyurea, 20 mg/ml Echinocandin B, and 40 mg/mL test compounds were added in individual wells of 96-well culture plate containing YPGal medium to a final volume of 100 ml, mixed by pipetting and incubated at 25 °C for 4 h. The plate was equilibrated to room temperature for 30 min and the contents were transferred into a luminescence compatible 96-well white-walled plate. Next, 100 μL of ToxiLight AK reagent (Lonza) was added to each well and incubated at room temperature for 30 min, and luminescence was measured with a Wallace ARVO SX 1420 Multilabel Counter (Perkin Elmer Life Sciences).
Assessing potential targets of NPD5925 for pleiotropy between DNA and cell wall processes
The chemical genetic interaction profile was compared against high confidence genetic interaction profiles using a genetic interaction normalized cosine score (genetic target score, Eqn. 1 in Supplementary Note). The top ten high confidence genes were displayed alongside the chemical genetic interaction profile.
For each of the high confidence GO process predictions for NPD5925, the genetic interaction profile of the drivers of that GO process prediction, high confidence genes with a genetic target score above 2 annotated to the enriched GO process, were combined using the following to form a GO process specific importance profile,
where k is the number of genes driving the GO process prediction. A GO process driven chemical genetic interaction profile is then derived with:
The GO process driven chemical genetic interaction profile is then compared against genetic interaction profiles in high confidence using the genetic target score, and the top ten high confidence genes were displayed alongside the GO process driven chemical genetic interaction profile.
Code availability
All code used to generate these data are freely available via http://github.com/csbio/.
Data availability
We have established a database that hosts all the barcode sequence data at http://mosaic.cs.umn.edu/. All high confidence chemical genomic data and associated compound information has been deposited to PubChem (Data Source ID 15567). All data generated or analyzed during this study are included in this published article (and its supplementary information files) or are available from the corresponding author on reasonable request.
Supplementary Material
1
Suppl Dataset 16
Suppl Dataset 17
Suppl Dataset 18
Suppl Dataset 19
Suppl Dataset 2
Suppl Dataset 20
Suppl Dataset 21
Suppl Dataset 22
Suppl Dataset 3
Suppl Dataset 4
2
Suppl Dataset 5
Suppl Dataset 6
Suppl Dataset 7
Suppl Dataset 8
Suppl Dataset 9
Suppl Dataset 1
Suppl Dataset 10
Suppl Dataset 11
Suppl Dataset 12
Suppl Dataset 13
Suppl Dataset 14
Suppl Dataset 15
Acknowledgments
This work was supported by RIKEN Strategic Programs for R&D. J.S.P. and S.C.L were funded by a RIKEN Foreign Postdoctoral Fellowship. S.W.S. is supported by an NSF Graduate Research Fellowship (00039202), an NIH Biotechnology training grant (T32GM008347), and a one-year BICB fellowship from the University of Minnesota. H.O. is a research fellow of the Japan Society for the Promotion of Science. R.D., J.N., E.W., and C.L.M. are supported by National Institutes of Health Grants 1R01HG005084-01A1, 1R01GM104975-01, and R01HG005853 and National Science Foundation Grant DBI 0953881. C.B. and Y.O. are supported by JSPS KAKENHI Grant Numbers 15H04483. C.B. and B.A. were supported by the Canadian Institutes of Health Research, FDN-143264 and FDN-143265, respectively. C.L.M, C.B., M.C., J.L., and B.A. are supported by the Canadian Institute for Advanced Research Genetic Networks Program. Y.O. is supported by Ministry of Education, Culture, Sports, Science and Technology, Japan Grant for Scientific Research 24370002 and JSPS KAKENHI Grant Numbers 15H04402. M.Y. is supported by JSPS KAKENHI Grant Number 26221204. A.B. is supported by a Lewis Sigler fellowship at Princeton University. G.W.B. and N.P.T are supported by Canadian Cancer Society Research Institute impact grant 702310. K.S. is supported by CREST from JST. We thank Astellas Pharma Inc. (Tokyo, Japan) for their kind gift of Micafungin. We thank Tamio Saito for help with NPDepo compound access. Sequencing was provided by RIKEN Center for Life Science Technologies, Division of Genomic Technologies, Genome Network Analysis Support Facility (GeNAS) RIKEN CLST and the University of Chicago.
Footnotes
URL: http://mosaic.cs.umn.edu/
Pubchem Data Source ID: 15567
Author contributions
C.B., M.Y., J.S.P. and S.C.L. conceived the project. J.S.P., S.C.L., R.D., and S.W.S. designed the chemical genomic screens. J.S.P., S.C.L., J.M.B., R.O. M.Y., Y.Y., E. D-A. performed the chemical genomic experiments. C.L.M., R.D., S.W.S., J.N., H.S., and E.W. designed the analysis software and performed analysis. G.W.B., N.P.T., and S.C.L. performed cell cycle experiments. J.P., K.A., M.A.L., and A.B. designed the sensitized yeast. Y.O., H.O., A.A.G., K.K. performed validation of cell wall targeting compounds. R.D. and M.C. designed the diagnostic mutant collection. H.O. and H.H. provided and curated NPDepo compounds. K.S. and B.A. provided analysis. J.V. edited the figures of the manuscript. The manuscript was written by J.S.P., S.C.L., C.L.M., and C.B. with input and editing from all authors.
Competing financial interests
The authors declare no competing financial interests.
References
Full text links
Read article at publisher's site: https://doi.org/10.1038/nchembio.2436
Read article for free, from open access legal sources, via Unpaywall: https://europepmc.org/articles/pmc6056180?pdf=render
Citations & impact
Impact metrics
Citations of article over time
Alternative metrics
Article citations
An automated positive selection screen in yeast provides support for boron-containing compounds as inhibitors of SARS-CoV-2 main protease.
Microbiol Spectr, 12(10):e0124924, 20 Aug 2024
Cited by: 0 articles | PMID: 39162260 | PMCID: PMC11448104
Mitochondrial genome structure and composition in 70 fishes: a key resource for fisheries management in the South Atlantic.
BMC Genomics, 25(1):215, 27 Feb 2024
Cited by: 1 article | PMID: 38413941 | PMCID: PMC10898094
From beer to breadboards: yeast as a force for biological innovation.
Genome Biol, 25(1):10, 04 Jan 2024
Cited by: 0 articles | PMID: 38178179 | PMCID: PMC10768129
Review Free full text in Europe PMC
A scalable platform for efficient CRISPR-Cas9 chemical-genetic screens of DNA damage-inducing compounds.
Sci Rep, 14(1):2508, 30 Jan 2024
Cited by: 0 articles | PMID: 38291084 | PMCID: PMC10828508
Soft-metal(loid)s induce protein aggregation in Escherichia coli.
Front Microbiol, 14:1281058, 22 Nov 2023
Cited by: 2 articles | PMID: 38075883 | PMCID: PMC10699150
Go to all (51) article citations
Data
Data behind the article
This data has been text mined from the article, or deposited into data resources.
BioStudies: supplemental material and supporting data
Similar Articles
To arrive at the top five similar articles we use a word-weighted algorithm to compare words from the Title and Abstract of each citation.
Predicting bioprocess targets of chemical compounds through integration of chemical-genetic and genetic interactions.
PLoS Comput Biol, 14(10):e1006532, 30 Oct 2018
Cited by: 8 articles | PMID: 30376562 | PMCID: PMC6226211
Systematic Mapping of Chemical-Genetic Interactions in Saccharomyces cerevisiae.
Cold Spring Harb Protoc, 2016(9), 01 Sep 2016
Cited by: 2 articles | PMID: 27587783
Using BEAN-counter to quantify genetic interactions from multiplexed barcode sequencing experiments.
Nat Protoc, 14(2):415-440, 01 Feb 2019
Cited by: 12 articles | PMID: 30635653 | PMCID: PMC6818255
Cheminformatics approaches to analyze diversity in compound screening libraries.
Curr Opin Chem Biol, 14(3):325-330, 22 Apr 2010
Cited by: 36 articles | PMID: 20457001
Review
Funding
Funders who supported this work.
NHGRI NIH HHS (2)
Grant ID: R01 HG005084
Grant ID: R01 HG005853
NIGMS NIH HHS (2)
Grant ID: R01 GM104975
Grant ID: T32 GM008347