STRING v11: protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets.

Szklarczyk D; Gable AL; Lyon D; Junge A; Wyder S; Huerta-Cepas J; Simonovic M; Doncheva NT; Morris JH; Bork P; Jensen LJ; Mering CV

doi:10.1093/nar/gky1131




Nucleic Acids Research, 01 Jan 2019, 47(D1):D607-D613
https://doi.org/10.1093/nar/gky1131 PMID: 30476243 PMCID: PMC6323986

This article is in the Europe PMC Open access subset. Refer to the copyright information in the article for licensing details.


Abstract

Proteins and their functional interactions form the backbone of the cellular machinery. Their connectivity network needs to be considered for the full understanding of biological phenomena, but the available information on protein-protein associations is incomplete and exhibits varying levels of annotation granularity and reliability. The STRING database aims to collect, score and integrate all publicly available sources of protein-protein interaction information, and to complement these with computational predictions. Its goal is to achieve a comprehensive and objective global network, including direct (physical) as well as indirect (functional) interactions. The latest version of STRING (11.0) more than doubles the number of organisms it covers, to 5090. The most important new feature is an option to upload entire, genome-wide datasets as input, allowing users to visualize subsets as interaction networks and to perform gene-set enrichment analysis on the entire input. For the enrichment analysis, STRING implements well-known classification systems such as Gene Ontology and KEGG, but also offers additional, new classification systems based on high-throughput text-mining as well as on a hierarchical clustering of the association network itself. The STRING resource is available online at https://string-db.org/.

Published online 2018 Nov 22.

Damian Szklarczyk,¹ Annika L Gable,¹ David Lyon,¹ Alexander Junge,² Stefan Wyder,¹ Jaime Huerta-Cepas,³ Milan Simonovic,¹ Nadezhda T Doncheva,²,⁴ John H Morris,⁵ Peer Bork,⁶,⁷,⁸,⁹ Lars J Jensen,² and Christian von Mering¹

¹Institute of Molecular Life Sciences and Swiss Institute of Bioinformatics, University of Zurich, 8057 Zurich, Switzerland

²Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, 2200 Copenhagen N, Denmark

³Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM)-Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), 28223 Madrid, Spain

⁴Center for non-coding RNA in Technology and Health, University of Copenhagen, 2200 Copenhagen N, Denmark

⁵Resource on Biocomputing, Visualization, and Informatics, University of California, San Francisco, CA 94158-2517, USA

⁶Structural and Computational Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany

⁷Molecular Medicine Partnership Unit, University of Heidelberg and European Molecular Biology Laboratory, 69117 Heidelberg, Germany

⁸Max Delbrück Centre for Molecular Medicine, 13125 Berlin, Germany

⁹Department of Bioinformatics, Biocenter, University of Würzburg, 97074 Würzburg, Germany


INTRODUCTION

While an impressive amount of structural and functional information on individual proteins has been amassed (1–3), our knowledge about their interactions remains more fragmented. Some interactions are well documented and understood, for example in the context of three-dimensional reconstructions of large cellular machineries (4–6), while others are so far only hinted at through indirect evidence such as genetic observations or statistical predictions. Furthermore, the space of potential protein–protein interactions is much larger, and also more context-dependent, than the space of intrinsic molecular functions of individual proteins. Interactions may be limited to certain cell types or physiological conditions, and their specificity and strength can vary as well, from obligate, highly specific and stable binding to more fleeting and relatively unspecific encounters. From a purely functional perspective, proteins can even interact specifically without touching at all, for example when a transcription factor helps to regulate the expression and production of another protein, or when two enzymes exchange a specific substrate via diffusion.

Arguably, the common denominator of the various forms of protein–protein associations is information flow: biologically meaningful interfaces have evolved to allow the flow of information through the cell, and they are ultimately essential for implementing a functional system. Hence, it is desirable to collect and integrate all types of protein–protein interactions under one framework; this provides support for data analysis pipelines in diverse areas, ranging from disease module identification (7,8) to biomarker discovery (9–11), and allows manual browsing, ad hoc discovery and annotation.

Protein–protein interactions can be collected from a number of online databases (reviewed in (12,13)), as well as from individual high-throughput efforts, e.g. (14). Primary interaction databases (3,15–18) jointly annotate experimental interaction evidence directly from the source publications and coordinate their efforts through the IMEx consortium (19). They provide highly valuable added services such as curating metadata, maintaining common namespaces and devising ontologies and standards. A second source of protein–protein interaction information is computational prediction, some of which is hosted by dedicated databases, e.g. (20,21). Lastly, a third class of databases is dedicated to protein interactions at the widest scope, integrating both primary and predicted interactions, often together with annotated pathway knowledge, text-mining results, inter-organism transfers or other accessory information. The STRING database (‘Search Tool for Retrieval of Interacting Genes/Proteins’) belongs to this latter class, along with GeneMANIA (22), FunCoup (23), I2D (24), ConsensusPathDB (25), IMP (26) and HumanNet (27), most of which have recently been reviewed and benchmarked in (7).

STRING is one of the earliest efforts (28) and strives to differentiate itself mainly through (i) high coverage, (ii) ease of use and (iii) a consistent scoring system. It currently features the largest number of organisms (5090) and proteins (24.6 million), draws on broad, diverse and benchmarked data sources, and provides intuitive and fast viewers for online use. It also offers additional data access points, such as programmatic access through an API, access through a Cytoscape app (http://apps.cytoscape.org/apps/stringapp) and download pages covering individual species networks and associated data. The website allows users to log in and store their searches and gene sets, and contains evidence viewers to inspect the underlying evidence of any given interaction. It also provides users with high-level information regarding their input/search data, including network enrichment statistics and functional enrichment detection, using two different conceptual frameworks for the latter (see below). Many of the features of STRING have been made available and described earlier (28–31). The website is currently accessed by around 3500 distinct users daily; its hosting facilities have recently been replicated and placed under a commercial load balancer to provide added stability and capacity. Users can submit multiple proteins simultaneously and visualize large networks; the Cytoscape stringApp can even handle networks of several thousand proteins. STRING shares its genome, protein and name spaces with a number of sister projects dedicated to orthology (eggNOG (32)), small molecules (STITCH (33)), protein abundances (PaxDB (34)), tissue expression (TISSUES (35)) and viruses (Viruses.STRING (36)), respectively.
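The programmatic API is mentioned above without further detail. As a hedged illustration, the sketch below only builds the request URL for a ‘network’ query and does not send it. The base URL, the tsv/network path and the parameter names identifiers, species and required_score are assumptions taken from STRING's public API documentation rather than from this article, and the taxonomy id 4932 for Saccharomyces cerevisiae is likewise assumed; check the current documentation before relying on any of them.

```python
from urllib.parse import urlencode

# Assumed base URL of STRING's web API (from its public documentation,
# not from this paper); verify against the current docs before use.
STRING_API = "https://string-db.org/api"


def network_url(identifiers, species, required_score=400, output_format="tsv"):
    """Build (but do not send) a URL for a STRING 'network' query.

    identifiers    -- list of gene/protein names, e.g. ["URE2"]
    species        -- NCBI taxonomy id (assumed: 4932 for S. cerevisiae)
    required_score -- confidence cutoff on a 0-1000 scale (900 ~ 'highest')
    """
    if not identifiers:
        raise ValueError("at least one identifier is required")
    if not 0 <= required_score <= 1000:
        raise ValueError("required_score must lie between 0 and 1000")
    params = {
        # Multiple identifiers are separated by carriage returns (sent as %0D).
        "identifiers": "\r".join(identifiers),
        "species": species,
        "required_score": required_score,
    }
    return f"{STRING_API}/{output_format}/network?{urlencode(params)}"


# Mirrors the Figure 1 query: URE2 in yeast at the 'highest' cutoff (0.900).
print(network_url(["URE2"], 4932, required_score=900))
# prints https://string-db.org/api/tsv/network?identifiers=URE2&species=4932&required_score=900
```

Keeping URL construction separate from the HTTP call makes the query easy to inspect and log before any request is made.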

Together with other online resources (including the IMEx consortium, which is one of STRING’s largest primary data sources), the STRING database has recently been awarded the status of a European Core Data Resource by ELIXIR, a pan-European bioinformatics initiative dedicated to sustainable bioinformatics infrastructure (37). As a prerequisite and consequence of this status, all interaction data and accessory information in STRING are now freely available without restrictions, under the Creative Commons Attribution (CC BY) 4.0 license.


DATABASE CONTENT

The basic interaction unit in STRING is the ‘functional association’, i.e. a link between two proteins that contribute jointly to a specific biological function (38–40). Two proteins associated in this way do not need to interact physically. Instead, it is sufficient that at least part of their functional roles in the cell overlap, and this overlapping function should be specific enough to broadly qualify as a pathway or functional map (in contrast, merely sharing ‘metabolism’ as an overlapping function would be too unspecific). By this definition, even proteins that antagonize each other can be functionally associated, such as an inhibitor and an activator within the same pathway. The desired specificity cutoff for functional associations in STRING roughly corresponds to the annotation granularity of KEGG pathway maps (41), whereby maps that largely group proteins by homology (such as ‘ABC transporters’) are excluded from consideration.
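To make the gold-standard idea concrete, here is a minimal sketch of how a protein pair could be labelled against KEGG-style pathway annotations, assuming a simple mapping from each protein to the names of the maps it belongs to. The protein names, map memberships and the exclusion set are hypothetical toy data; only the rule of excluding homology-based maps such as ‘ABC transporters’ comes from the text.

```python
# Maps that mostly group proteins by homology are excluded (illustrative set).
HOMOLOGY_MAPS = frozenset({"ABC transporters"})


def gold_standard_label(a, b, kegg_maps, excluded=HOMOLOGY_MAPS):
    """Label a protein pair against KEGG-style pathway annotations.

    Returns True if both proteins share at least one non-excluded map,
    False if both are annotated but share none, and None if either
    partner lacks usable annotation (such pairs cannot be benchmarked).
    """
    maps_a = set(kegg_maps.get(a, ())) - excluded
    maps_b = set(kegg_maps.get(b, ())) - excluded
    if not maps_a or not maps_b:
        return None
    return bool(maps_a & maps_b)


# Hypothetical toy annotations, not real KEGG data.
toy_maps = {
    "protA": {"Glycolysis / Gluconeogenesis"},
    "protB": {"Glycolysis / Gluconeogenesis", "Pentose phosphate pathway"},
    "protC": {"Cell cycle"},
    "protD": {"ABC transporters"},
    "protE": {"ABC transporters"},
}
print(gold_standard_label("protA", "protB", toy_maps))  # True
print(gold_standard_label("protA", "protC", toy_maps))  # False
print(gold_standard_label("protD", "protE", toy_maps))  # None
```

Returning None rather than False for unannotated pairs matters: they are simply unusable for benchmarking, not evidence against an association.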

All of the association evidence in the STRING database is categorized into one of seven independent ‘channels’: three prediction channels based on genomic context information (see below), and one channel each for (i) co-expression, (ii) text-mining, (iii) biochemical/genetic data (‘experiments’) and (iv) previously curated pathway and protein-complex knowledge (‘databases’). Users can disable channels individually or in any combination. For each channel, separate interaction scores are available, as well as viewers for inspecting the underlying evidence (Figure 1). In general, the interaction scores in STRING do not represent the strength or specificity of a given interaction; instead, they express an approximate confidence, on a scale of zero to one, that the association is true, given all the available evidence. The scores are benchmarked using the subset of associations for which both protein partners are already functionally annotated; for this, the KEGG pathway maps (41) serve as a gold standard, and they thus implicitly also determine the granularity of the functional associations.
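This section does not restate how the per-channel confidences are merged into a single combined score. One description given in STRING's own documentation and earlier publications treats the channels as independent evidence after removing a prior probability of random association; the sketch below follows that description under stated assumptions, and the prior value of 0.041 is taken from those earlier descriptions and may not match version 11 exactly. Disabling a channel, as users can do on the website, is modelled by simply leaving it out.

```python
from math import prod

# Assumed prior probability that a random protein pair is associated.
# Taken from earlier STRING documentation, not from this paper.
PRIOR = 0.041


def combined_score(channel_scores, disabled=(), prior=PRIOR):
    """Combine per-channel confidences (each on a 0-1 scale) into one score.

    1. Remove the prior from each active channel score (floored at zero).
    2. Combine channels as independent evidence: 1 - prod(1 - s).
    3. Add the prior back once.
    Channels listed in 'disabled' are ignored, mimicking switching them off.
    """
    active = [s for ch, s in channel_scores.items() if ch not in disabled]
    if not active:
        return 0.0  # no evidence left
    without_prior = [max(0.0, (s - prior) / (1 - prior)) for s in active]
    combined = 1 - prod(1 - s for s in without_prior)
    return combined + prior * (1 - combined)


pair = {"databases": 0.9, "textmining": 0.5}
print(round(combined_score(pair), 3))  # 0.948
print(round(combined_score(pair, disabled={"textmining"}), 3))  # 0.9
```

With a single channel the formula returns that channel's score unchanged, so the prior correction only matters when several channels contribute evidence.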


Figure 1.

A typical association network in STRING. The yeast prion-like protein URE2 has been selected as input. The network has been expanded by an additional 10 proteins (via the ‘More’ button in the STRING interface), and the confidence cutoff for showing interaction links has been set to ‘highest’ (0.900). The insets at the right show how many items of the various evidence types in STRING contributed to this particular network (counts denote how many records covered at least two of the proteins in the network; not all of these records contributed high-scoring links after score calibration).

Within each channel, the evidence is further subdivided into two sub-scores: one represents evidence stemming from the organism itself, and the other represents evidence transferred from other organisms. For this transfer, the ‘interolog’ concept is applied (42,43): STRING uses hierarchically arranged orthologous group relations, as defined in eggNOG (32), to transfer associations between organisms where applicable (described in (29)).
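The interolog transfer can be illustrated with a deliberately simplified sketch. It assumes that each source protein belongs to exactly one orthologous group, ignores the eggNOG hierarchy, and lets a transferred link inherit the best source score without any penalty; STRING's actual transfer procedure (described in (29)) is more involved. All protein and group identifiers below are hypothetical.

```python
from collections import defaultdict


def transfer_associations(source_links, og_of, target_by_og):
    """Transfer scored links from a source organism to a target organism.

    source_links -- {(protein_a, protein_b): score} in the source organism
    og_of        -- source protein -> orthologous group id (one group each)
    target_by_og -- group id -> list of target-organism proteins
    Returns {(target_a, target_b): best transferred score}.
    """
    transferred = defaultdict(float)
    for (a, b), score in source_links.items():
        og_a, og_b = og_of.get(a), og_of.get(b)
        if og_a is None or og_b is None:
            continue  # no orthology assignment, nothing to transfer
        for ta in target_by_og.get(og_a, []):
            for tb in target_by_og.get(og_b, []):
                if ta == tb:
                    continue  # skip self-links
                pair = tuple(sorted((ta, tb)))
                transferred[pair] = max(transferred[pair], score)
    return dict(transferred)


links = {("a1", "a2"): 0.8}
groups = {"a1": "OG1", "a2": "OG2"}
targets = {"OG1": ["b1"], "OG2": ["b2", "b3"]}
print(transfer_associations(links, groups, targets))
# {('b1', 'b2'): 0.8, ('b1', 'b3'): 0.8}
```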

The individual protein associations in the various channels are derived, briefly, as follows:

The three genomic context prediction channels (neighborhood, fusion, gene co-occurrence) are the result of systematic all-against-all genome comparisons, aiming to assess the consequences of past genome rearrangements, gene gains and losses, as well as gene fusion events. These evolutionary events are known to be retained non-randomly with respect to the functional roles of genes, and thus allow the inference of functional associations between genes even for otherwise rarely studied organisms (genomic context techniques are reviewed in (44,45)).
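Two of the genomic context signals can be sketched in a few lines under strong simplifying assumptions: gene neighborhood as the number of genomes in which two gene families sit next to each other on the same strand within a hypothetical maximum intergenic gap, and gene co-occurrence as the Jaccard similarity of presence/absence profiles across genomes. STRING's actual scoring of these channels is more sophisticated and is calibrated against KEGG; the genome layouts below are toy data.

```python
def neighborhood_support(genomes, fam_a, fam_b, max_gap=300):
    """Count genomes in which fam_a and fam_b are adjacent on the same strand.

    genomes -- {genome_id: [(family, start, end, strand), ...]}, each list
               sorted by start coordinate along one chromosome
    max_gap -- hypothetical maximum intergenic distance in base pairs
    """
    support = 0
    for genes in genomes.values():
        for (f1, _, end1, strand1), (f2, start2, _, strand2) in zip(genes, genes[1:]):
            if {f1, f2} == {fam_a, fam_b} and strand1 == strand2 and start2 - end1 <= max_gap:
                support += 1
                break  # count each genome at most once
    return support


def cooccurrence_similarity(profile_a, profile_b):
    """Jaccard similarity of two presence/absence profiles (sets of genome ids)."""
    union = profile_a | profile_b
    return len(profile_a & profile_b) / len(union) if union else 0.0


toy_genomes = {
    "g1": [("A", 0, 900, "+"), ("B", 950, 1800, "+")],   # adjacent, small gap
    "g2": [("A", 0, 900, "+"), ("B", 2000, 2900, "+")],  # gap too large
    "g3": [("B", 0, 900, "-"), ("A", 1000, 1900, "-")],  # adjacent, other order
}
print(neighborhood_support(toy_genomes, "A", "B"))  # 2
print(cooccurrence_similarity({"g1", "g2", "g3"}, {"g2", "g3", "g4"}))  # 0.5
```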

The co-expression channel is based on gene-by-gene correlation tests across a large number of gene expression datasets (using both transcriptome and proteome measurements). For transcript data, STRING re-processes and maps the large number of experiments stored in the NCBI Gene Expression Omnibus (46), followed by normalization, redundancy reduction and Pearson correlation (described in (29)). For version 11, we have further improved the RNAseq co-expression inference pipeline by processing a larger number of RNAseq samples and by using the robust biweight midcorrelation (47). In addition to NCBI GEO, gene count data for a subset of species were downloaded from the ARCHS4 and ARCHS4 zoo collections (48).
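The biweight midcorrelation is a median-based, outlier-resistant alternative to Pearson correlation. The sketch below follows the standard textbook definition: values are centred on the median, scaled by nine times the median absolute deviation, and down-weighted with Tukey's biweight so that extreme points receive zero weight. How STRING normalizes and filters the expression data before this step is not described here and is not modelled.

```python
from math import sqrt
from statistics import median


def _biweight_vector(x):
    """Median-centred, biweight-weighted and unit-normalized version of x."""
    med = median(x)
    mad = median(abs(v - med) for v in x)
    if mad == 0:
        raise ValueError("median absolute deviation is zero")
    weighted = []
    for v in x:
        u = (v - med) / (9 * mad)
        w = (1 - u * u) ** 2 if abs(u) < 1 else 0.0  # outliers get weight 0
        weighted.append((v - med) * w)
    norm = sqrt(sum(t * t for t in weighted))
    if norm == 0:
        raise ValueError("all values were down-weighted to zero")
    return [t / norm for t in weighted]


def bicor(x, y):
    """Biweight midcorrelation of two equally long numeric sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return sum(a * b for a, b in zip(_biweight_vector(x), _biweight_vector(y)))


print(round(bicor([1, 2, 3, 4, 5], [3, 5, 7, 9, 11]), 6))  # 1.0
print(round(bicor([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]), 6))  # -1.0
```

Like Pearson correlation, the result lies between -1 and 1, but a single extreme sample cannot dominate it.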

Protein-based co-expression analysis is new in version 11 of STRING, and as of now it is restricted to a single dataset imported as is, namely the ProteomeHD dataset of the Juri Rappsilber lab (unpublished, https://www.proteomehd.net/), covering 294 biological conditions measured using SILAC in human cells. ProteomeHD does not use Pearson correlation but instead the treeClust algorithm (49); for STRING, the results of this algorithm are recalibrated and scored using the KEGG benchmark. Each ProteomeHD-provided interaction features a cross-link through which the underlying evidence can be inspected at the ProteomeHD website.

For the experiments channel, all interaction records from the IMEx databases (plus BioGRID) are re-mapped and re-processed: first, duplicate records and datasets are removed, and then entire groups of records are benchmarked against KEGG and scored accordingly.
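A toy version of this re-processing might look as follows: records are collapsed into unique unordered pairs per dataset, datasets whose pair sets duplicate an earlier dataset are dropped, and each remaining dataset receives one score, namely the fraction of its fully annotated pairs that share a KEGG map. Using that raw fraction as the score is an assumption made for illustration; STRING calibrates benchmark results onto its confidence scale. The dataset ids and annotations are hypothetical.

```python
from collections import defaultdict


def benchmark_datasets(records, kegg_maps):
    """Deduplicate interaction records and score each dataset as a whole.

    records   -- iterable of (dataset_id, protein_a, protein_b)
    kegg_maps -- {protein: set of map names} (hypothetical annotation)
    Returns {dataset_id: (unique_pairs, score)}; score is None when no pair
    in the dataset has both partners annotated.
    """
    pairs_by_dataset = defaultdict(set)
    for dataset, a, b in records:
        if a != b:
            # frozenset makes (a, b) and (b, a) the same record
            pairs_by_dataset[dataset].add(frozenset((a, b)))

    results, seen = {}, set()
    for dataset, pairs in pairs_by_dataset.items():
        key = frozenset(pairs)
        if key in seen:
            continue  # identical dataset already scored
        seen.add(key)
        labels = []
        for pair in pairs:
            a, b = tuple(pair)
            maps_a, maps_b = kegg_maps.get(a), kegg_maps.get(b)
            if maps_a and maps_b:
                labels.append(bool(maps_a & maps_b))
        score = sum(labels) / len(labels) if labels else None
        results[dataset] = (pairs, score)
    return results


toy_records = [("d1", "a", "b"), ("d1", "b", "a"), ("d1", "a", "c"),
               ("d2", "a", "b"), ("d2", "c", "a"), ("d3", "x", "y")]
toy_kegg = {"a": {"m1"}, "b": {"m1"}, "c": {"m2"}}
res = benchmark_datasets(toy_records, toy_kegg)
print({ds: score for ds, (_, score) in res.items()})  # {'d1': 0.5, 'd3': None}
```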

The database channel is based on manually curated interaction records assembled by expert curators, at KEGG (41), Reactome (50), BioCyc (51) and Gene Ontology (52), as well as legacy datasets from PID and BioCarta. STRING only retains associations between direct pathway members or within protein complexes. The database channel is the only channel for which score calibration does not apply; instead, all associations in this channel receive a high, uniform score (0.900).

Finally, for the text-mining channel, STRING conducts statistical co-citation analysis across a large number of scientific texts, including all PubMed abstracts as well as OMIM (53). Since version 10.5 of STRING, the text corpus has also contained a subset of full-text articles. For version 11.0, the Medline abstracts (last updated on 9 June 2018) were complemented with open access as well as author-manuscript full-text articles available from PMC in BioC XML format (https://arxiv.org/abs/1804.05957) (last updated on 17 April 2018). Full-text articles that were not classified as English-language articles were removed (using fastText and a pretrained language identification model for 176 languages (https://arxiv.org/abs/1607.01759)), as were those that could not be mapped to PubMed. We also removed highly unspecific articles that mention more than 200 relevant biomedical entities, such as proteins, chemicals, diseases or tissues. The final corpus consists of 28 579 637 scientific publications, of which 2 106 542 are available as full-text articles and the remainder as abstracts. While the text-mining pipeline itself has remained unchanged (last described in (29)), its dictionary of gene and protein names has been updated to the new set of genomes, and the stop-word list has been improved to increase precision, especially for human proteins.
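The corpus filter and the raw material for co-citation analysis can be sketched as follows. Each article is assumed to be already reduced to a set of recognized (entity type, identifier) tuples. All entity types count towards the 200-entity cutoff stated above, but only proteins are paired. The sketch stops at raw co-mention counts; STRING's statistical co-citation scoring (described in (29)) is not reproduced, and the documents are toy data.

```python
from collections import Counter
from itertools import combinations

# Threshold from the text: articles mentioning more than 200 entities are dropped.
MAX_ENTITIES = 200


def co_mention_counts(documents, max_entities=MAX_ENTITIES):
    """Count, for each protein pair, the documents that mention both.

    documents -- iterable of sets of (entity_type, entity_id) tuples, one set
                 per article; all entity types count towards the filter,
                 but only proteins are paired.
    """
    counts = Counter()
    for entities in documents:
        if len(entities) > max_entities:
            continue  # too unspecific to be informative
        proteins = sorted({eid for etype, eid in entities if etype == "protein"})
        for pair in combinations(proteins, 2):
            counts[pair] += 1
    return counts


docs = [
    {("protein", "A"), ("protein", "B"), ("disease", "D1")},
    {("protein", "A"), ("protein", "B")},
    {("protein", "A"), ("protein", "C")},
    # An overly broad article with 201 entities: skipped entirely.
    {("protein", f"P{i}") for i in range(199)} | {("protein", "A"), ("protein", "B")},
]
counts = co_mention_counts(docs)
print(counts[("A", "B")], counts[("A", "C")])  # 2 1
```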


NEW ENRICHMENT DETECTION MODE

For users who query the STRING database with a set of proteins (as opposed to a single query protein), the website computes a functional enrichment analysis in the background; the results can be inspected and browsed by the user and include interactive projections onto the user's protein network. This functionality has been available since version 9.1 and is based on straightforward over-representation analysis using hypergeometric tests.
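For reference, the over-representation test can be written with exact binomial coefficients: the p-value is the upper tail of the hypergeometric distribution, i.e. the probability of observing at least the given overlap between the query set and a functional term by chance. Which background STRING uses is not specified in this section, so the function simply takes the background size as a parameter.

```python
from math import comb


def hypergeometric_p(background_size, term_size, query_size, overlap):
    """One-sided p-value P(X >= overlap) for X ~ Hypergeometric.

    background_size -- number of proteins in the background (e.g. a proteome)
    term_size       -- proteins annotated to the functional term
    query_size      -- proteins submitted by the user
    overlap         -- query proteins annotated to the term
    """
    if not 0 <= overlap <= min(term_size, query_size):
        raise ValueError("overlap is inconsistent with the set sizes")
    total = comb(background_size, query_size)
    tail = sum(
        comb(term_size, k) * comb(background_size - term_size, query_size - k)
        for k in range(overlap, min(term_size, query_size) + 1)
    )
    return tail / total


# 3 of 10 background proteins carry the term, and all 3 query proteins do.
print(hypergeometric_p(10, 3, 3, 3))  # 1/120, about 0.00833
```

Python's arbitrary-precision integers keep the binomial coefficients exact even for proteome-sized backgrounds.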

However, this analysis uses only a small part of the information that the user might have about his or her protein list. First, the original list of proteins might have been much longer, and the user would have had to truncate it (thus far, STRING has enforced an upper limit on the number of query items). Second, the list might have had a biologically meaningful ranking, which would have been lost during submission to STRING. Third, each protein might have been associated with numerical information from the underlying experiment or study (such as a log fold change, a measured abundance or a phenotypic outcome). For genome-wide measurements of this type, simple overlap-based over-representation analysis is not the best choice (54–56).

Thus, beginning with version 11.0, STRING offers such users a second option for conducting enrichment analysis. It specifically asks for genome-scale input, with each protein or gene having an associated numerical value (a measurement or statistical metric). Of the available methods for searching functional enrichments in such a set, we chose a permutation-based, non-parametric test that performs well in a number of settings, termed ‘Aggregate Fold Change’ (AFC) (56). Briefly, this test computes, for each gene set to be tested, the average of all values provided by the user for the constituent genes. This average is then compared against averages of randomized gene sets of the same size. Multiple testing correction is applied separately within each functional classification framework (GO, KEGG, InterPro, etc.), according to Benjamini and Hochberg (57), but not across these frameworks, as there is significant overlap between them. For large gene sets, the AFC randomization method becomes prohibitively slow; these gene sets are instead tested after converting the user-provided gene values to ranks, using a two-sided Kolmogorov–Smirnov test.

In addition to the commonly used functional classification frameworks, STRING uses two additional systems, giving users more options and potentially more novelty for discovery. The first is based on a hierarchical clustering of the STRING network itself. This assumes that tightly connected modules within the network broadly correspond to functional units; it has the advantage of covering a broader scope, including potentially novel modules that may not yet be annotated as pathways. The clustering is based on a confidence diffusion state distance (DSD) matrix (58,59) computed on the full, organism-wide STRING network, which is clustered hierarchically using HPC-CLUST with average linkage (60). To compute the DSD matrix, the final, combined STRING score between proteins is used, and the DSD algorithm is run with default parameters and the ‘-c’ (confidence) flag. Following the clustering procedure, all clusters with sizes between 5 and 200 are included in the functional enrichment testing and reported under their own, separate classification category.

The second additional set for enrichment testing consists of all published papers mapping to the genes in the user's input. This takes advantage of STRING's text-mining channel, for which all PubMed abstracts and some additional scientific texts are already mapped onto STRING's protein space (based on identifier matches in the text). Detecting publications that are enriched in the user-input ranking provides yet another complementary way of interpreting the input, often with a more fine-grained view.
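The AFC idea and the per-framework Benjamini–Hochberg correction can be sketched as follows. This is a simplified illustration under stated assumptions, not STRING's implementation: the two-sided comparison against the mean of all input values, the number of permutations and the add-one p-value are choices made here, and the rank-based Kolmogorov–Smirnov fallback for large gene sets is omitted. The input values and set names are hypothetical.

```python
import random
from statistics import mean


def afc_p_value(values, gene_set, n_perm=1000, seed=0):
    """Permutation p-value for the average input value of one gene set.

    values   -- {gene: user-provided number} for the genome-wide input
    gene_set -- genes annotated to one functional term
    The observed set mean is compared with means of random gene sets of the
    same size, two-sided around the mean of all input values (an assumption).
    """
    members = [values[g] for g in gene_set if g in values]
    if not members:
        raise ValueError("gene set does not overlap the input")
    background = list(values.values())
    centre = mean(background)
    observed = abs(mean(members) - centre)
    rng = random.Random(seed)
    extreme = sum(
        abs(mean(rng.sample(background, len(members))) - centre) >= observed
        for _ in range(n_perm)
    )
    return (extreme + 1) / (n_perm + 1)  # add-one keeps p strictly positive


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def adjust_per_framework(p_by_framework):
    """Correct within each framework (GO, KEGG, ...), never across them."""
    adjusted = {}
    for framework, term_p in p_by_framework.items():
        terms = list(term_p)
        adj = benjamini_hochberg([term_p[t] for t in terms])
        adjusted[framework] = dict(zip(terms, adj))
    return adjusted


# Hypothetical input: 100 genes, ten of which carry a strong positive value.
values = {f"g{i}": (5.0 if i < 10 else 0.0) for i in range(100)}
shifted = afc_p_value(values, {f"g{i}" for i in range(10)})
neutral = afc_p_value(values, {f"g{i}" for i in range(50, 60)})
print(shifted, neutral)
print(adjust_per_framework({"KEGG": {"set1": shifted, "set2": neutral}}))
```

Applying the correction inside adjust_per_framework, one framework at a time, mirrors the text's choice not to correct across overlapping classification systems.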

Once the new enrichment analysis has been computed, users are presented with a three-panel view of the results (Figure 2). There, each enriched functional subset can be highlighted and traced back to the user's input as well as to a pre-rendered, organism-wide STRING network. The layout of the latter is based on a t-SNE visualization of the network (61) and can be zoomed and panned interactively.


Figure 2.

Functional enrichment analysis of a genome-sized input set. An expression dataset comparing metastatic melanoma cells with normal skin tissue (62) has been submitted to STRING, with average log fold change values associated with each gene (negative values signify depletion in the melanoma cells). The screenshot shows how STRING presents and groups statistical enrichment observations for a number of pathways and functional subsystems. When the user hovers with the mouse, the website highlights the corresponding proteins both in the input data on the left side and in the organism-wide network on the right side. The latter can be interactively zoomed until individual proteins and their neighbors become discernible. Here, the highlighted observation shows that the desmosome is downregulated in melanoma cells; this stands out through several PubMed publications whose discussed proteins (desmosome proteins) are strongly enriched at one end of the user input.


OUTLOOK

Over the coming years, the STRING team aims to continue tracking all available protein association evidence types and prediction algorithms. One particular focus will be to expand the protein-based co-expression channel, where advances in proteomics throughput and scope lead us to expect growing data support for association searches. With regard to the STRING website, we expect to provide tighter integration of functional enrichment and network search results, and are exploring options to provide more context on the various networks (such as cell type, tissue or organelle). We will also strive to provide better interoperability options, to extend our list of partnered, cross-linked resources and to add direct data import options that facilitate our regular data updates.


ACKNOWLEDGEMENTS

We are indebted to Juri Rappsilber and his team for sharing ProteomeHD data prior to publication, and to Yan P. Yuan for excellent IT support at EMBL. Thomas Rattei and his SIMAP project at University of Vienna provided essential protein similarity data for our very large sequence space. We thank Tudor Oprea and the Illuminating the Druggable Genome project for help in improving the text mining, and Daniel Mende and Sofia Forslund for their help in selecting a non-redundant set of high-quality genomes.


FUNDING

The Swiss Institute of Bioinformatics (Lausanne) provides long-term core funding for STRING, as do the Novo Nordisk Foundation (Copenhagen, NNF14CC0001) and the European Molecular Biology Laboratory (EMBL Heidelberg). N.D.T. received funding from the Danish Council for Independent Research (DFF-4005-00443), and A.J. from the National Institutes of Health (NIH) Illuminating the Druggable Genome Knowledge Management Center (U54 CA189205 and U24 224370). J.H.M. was funded by the NIH (NIGMS P41 GM103504), by grant number 2018-183120 from the Chan Zuckerberg Initiative DAF, and by the advised fund of the Silicon Valley Community Foundation. Incorporation into the German bioinformatics infrastructure has been enabled by the BMBF (de.nbi grant #031A537B). Funding for Open Access charges: University of Zurich.

Conflict of interest statement. None declared.


REFERENCES

1. Xie L., Bourne P.E. Functional coverage of the human genome by existing structures, structural genomics targets, and homology models. PLoS Comput. Biol. 2005; 1:e31. [Europe PMC free article] [Abstract] [Google Scholar]

2. Uhlen M., Oksvold P., Fagerberg L., Lundberg E., Jonasson K., Forsberg M., Zwahlen M., Kampf C., Wester K., Hober S. et al. Towards a Knowledge-Based human protein atlas. Nat. Biotechnol. 2010; 28:1248–1250. [Abstract] [Google Scholar]

3. UniProt Consortium, T UniProt: the universal protein knowledgebase. Nucleic Acids Res. 2018; 45:D158–D169. [Europe PMC free article] [Abstract] [Google Scholar]

4. Ban N., Nissen P., Hansen J., Moore P.B., Steitz T.A. The complete atomic structure of the large ribosomal subunit at 2.4 A resolution. Science. 2000; 289:905–920. [Abstract] [Google Scholar]

5. Schuller J.M., Falk S., Fromm L., Hurt E., Conti E. Structure of the nuclear exosome captured on a maturing preribosome. Science. 2018; 360:219–222. [Abstract] [Google Scholar]

6. Marsh J.A., Teichmann S.A. Structure, dynamics, assembly, and evolution of protein complexes. Annu. Rev. Biochem. 2015; 84:551–575. [Abstract] [Google Scholar]

7. Huang J.K., Carlin D.E., Yu M.K., Zhang W., Kreisberg J.F., Tamayo P., Ideker T. Systematic evaluation of molecular networks for discovery of disease genes. Cell Syst. 2018; 6:484–495. [Europe PMC free article] [Abstract] [Google Scholar]

8. Khurana V., Peng J., Chung C.Y., Auluck P.K., Fanning S., Tardiff D.F., Bartels T., Koeva M., Eichhorn S.W., Benyamini H. et al. Genome-scale networks link neurodegenerative disease genes to alpha-Synuclein through specific molecular pathways. Cell Syst. 2017; 4:157–170. [Europe PMC free article] [Abstract] [Google Scholar]

9. Hayashida M., Akutsu T. Complex network-based approaches to biomarker discovery. Biomark. Med. 2016; 10:621–632. [Abstract] [Google Scholar]

10. Chuang H.Y., Lee E., Liu Y.T., Lee D., Ideker T. Network-based classification of breast cancer metastasis. Mol. Syst. Biol. 2007; 3:140.

11. Liu X., Chang X., Liu R., Yu X., Chen L., Aihara K. Quantifying critical states of complex diseases using single-sample dynamic network biomarkers. PLoS Comput. Biol. 2017; 13:e1005633.

12. Gemovic B., Sumonja N., Davidovic R., Perovic V., Veljkovic N. Mapping of Protein-Protein interactions: Web-Based resources for revealing interactomes. Curr. Med. Chem. 2018; doi:10.2174/0929867325666180214113704.

13. Sowmya G., Ranganathan S. Protein-protein interactions and prediction: a comprehensive overview. Protein Pept. Lett. 2014; 21:779–789.

14. Drew K., Lee C., Huizar R.L., Tu F., Borgeson B., McWhite C.D., Ma Y., Wallingford J.B., Marcotte E.M. Integration of over 9,000 mass spectrometry experiments builds a global map of human protein complexes. Mol. Syst. Biol. 2017; 13:932.

15. Salwinski L., Miller C.S., Smith A.J., Pettit F.K., Bowie J.U., Eisenberg D. The database of interacting proteins: 2004 update. Nucleic Acids Res. 2004; 32:D449–D451.

16. Orchard S., Ammari M., Aranda B., Breuza L., Briganti L., Broackes-Carter F., Campbell N.H., Chavali G., Chen C., del-Toro N. et al. The MIntAct project–IntAct as a common curation platform for 11 molecular interaction databases. Nucleic Acids Res. 2014; 42:D358–D363.

17. Chatr-Aryamontri A., Oughtred R., Boucher L., Rust J., Chang C., Kolas N.K., O’Donnell L., Oster S., Theesfeld C., Sellam A. et al. The BioGRID interaction database: 2017 update. Nucleic Acids Res. 2017; 45:D369–D379.

18. Ammari M.G., Gresham C.R., McCarthy F.M., Nanduri B. HPIDB 2.0: a curated database for host-pathogen interactions. Database (Oxford). 2016; 2016:baw103.

19. Orchard S., Kerrien S., Abbani S., Aranda B., Bhate J., Bidwell S., Bridge A., Briganti L., Brinkman F.S., Cesareni G. et al. Protein interaction data curation: the International Molecular Exchange (IMEx) consortium. Nat. Methods. 2012; 9:345–350.

20. Zhang Q.C., Petrey D., Garzon J.I., Deng L., Honig B. PrePPI: a structure-informed database of protein-protein interactions. Nucleic Acids Res. 2013; 41:D828–D833.

21. McDowall M.D., Scott M.S., Barton G.J. PIPs: human protein-protein interaction prediction database. Nucleic Acids Res. 2009; 37:D651–D656.

22. Franz M., Rodriguez H., Lopes C., Zuberi K., Montojo J., Bader G.D., Morris Q. GeneMANIA update 2018. Nucleic Acids Res. 2018; 46:W60–W64.

23. Ogris C., Guala D., Sonnhammer E.L.L. FunCoup 4: new species, data, and visualization. Nucleic Acids Res. 2018; 46:D601–D607.

24. Kotlyar M., Pastrello C., Sheahan N., Jurisica I. Integrated interactions database: tissue-specific view of the human and model organism interactomes. Nucleic Acids Res. 2016; 44:D536–D541.

25. Herwig R., Hardt C., Lienhard M., Kamburov A. Analyzing and interpreting genome data at the network level with ConsensusPathDB. Nat. Protoc. 2016; 11:1889–1907.

26. Wong A.K., Krishnan A., Yao V., Tadych A., Troyanskaya O.G. IMP 2.0: a multi-species functional genomics portal for integration, visualization and prediction of protein functions and networks. Nucleic Acids Res. 2015; 43:W128–W133.

27. Lee I., Blom U.M., Wang P.I., Shim J.E., Marcotte E.M. Prioritizing candidate disease genes by network-based boosting of genome-wide association data. Genome Res. 2011; 21:1109–1121.

28. Snel B., Lehmann G., Bork P., Huynen M.A. STRING: a web-server to retrieve and display the repeatedly occurring neighbourhood of a gene. Nucleic Acids Res. 2000; 28:3442–3444.

29. Franceschini A., Szklarczyk D., Frankild S., Kuhn M., Simonovic M., Roth A., Lin J., Minguez P., Bork P., von Mering C. et al. STRING v9.1: protein-protein interaction networks, with increased coverage and integration. Nucleic Acids Res. 2013; 41:D808–D815.

30. Szklarczyk D., Franceschini A., Wyder S., Forslund K., Heller D., Huerta-Cepas J., Simonovic M., Roth A., Santos A., Tsafou K.P. et al. STRING v10: protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res. 2015; 43:D447–D452.

31. Szklarczyk D., Morris J.H., Cook H., Kuhn M., Wyder S., Simonovic M., Santos A., Doncheva N.T., Roth A., Bork P. et al. The STRING database in 2017: quality-controlled protein-protein association networks, made broadly accessible. Nucleic Acids Res. 2017; 45:D362–D368.

32. Huerta-Cepas J., Szklarczyk D., Forslund K., Cook H., Heller D., Walter M.C., Rattei T., Mende D.R., Sunagawa S., Kuhn M. et al. eggNOG 4.5: a hierarchical orthology framework with improved functional annotations for eukaryotic, prokaryotic and viral sequences. Nucleic Acids Res. 2016; 44:D286–D293.

33. Szklarczyk D., Santos A., von Mering C., Jensen L.J., Bork P., Kuhn M. STITCH 5: augmenting protein-chemical interaction networks with tissue and affinity data. Nucleic Acids Res. 2016; 44:D380–D384.

34. Wang M., Herrmann C.J., Simonovic M., Szklarczyk D., von Mering C. Version 4.0 of PaxDb: Protein abundance data, integrated across model organisms, tissues, and cell-lines. Proteomics. 2015; 15:3163–3168.

35. Palasca O., Santos A., Stolte C., Gorodkin J., Jensen L.J. TISSUES 2.0: an integrative web resource on mammalian tissue expression. Database (Oxford). 2018; 2018:bay003.

36. Cook H.V., Doncheva N.T., Szklarczyk D., von Mering C., Jensen L.J. Viruses.STRING: A virus–host protein–protein interaction database. Viruses. 2018; 10:519.

37. Durinx C., McEntyre J., Appel R., Apweiler R., Barlow M., Blomberg N., Cook C., Gasteiger E., Kim J.H., Lopez R. et al. Identifying ELIXIR core data resources [version 2; referees: 2 approved]. F1000Res. 2016; 5:2422.

38. Enright A.J., Ouzounis C.A. Functional associations of proteins in entire genomes by means of exhaustive detection of gene fusions. Genome Biol. 2001; 2:RESEARCH0034.

39. Snel B., Bork P., Huynen M.A. The identification of functional modules from the genomic association of genes. PNAS. 2002; 99:5890–5895.

40. Studham M.E., Tjarnberg A., Nordling T.E., Nelander S., Sonnhammer E.L. Functional association networks as priors for gene regulatory network inference. Bioinformatics. 2014; 30:i130–i138.

41. Kanehisa M., Furumichi M., Tanabe M., Sato Y., Morishima K. KEGG: new perspectives on genomes, pathways, diseases and drugs. Nucleic Acids Res. 2017; 45:D353–D361.

42. Walhout A.J., Sordella R., Lu X., Hartley J.L., Temple G.F., Brasch M.A., Thierry-Mieg N., Vidal M. Protein interaction mapping in C. elegans using proteins involved in vulval development. Science. 2000; 287:116–122.

43. Yu H., Luscombe N.M., Lu H.X., Zhu X., Xia Y., Han J.D., Bertin N., Chung S., Vidal M., Gerstein M. Annotation transfer between genomes: protein-protein interologs and protein-DNA regulogs. Genome Res. 2004; 14:1107–1118.

44. Huynen M., Snel B., Lathe W. 3rd, Bork P. Predicting protein function by genomic context: quantitative evaluation and qualitative inferences. Genome Res. 2000; 10:1204–1210.

45. Skrabanek L., Saini H.K., Bader G.D., Enright A.J. Computational prediction of protein-protein interactions. Mol. Biotechnol. 2008; 38:1–17.

46. Barrett T., Wilhite S.E., Ledoux P., Evangelista C., Kim I.F., Tomashevsky M., Marshall K.A., Phillippy K.H., Sherman P.M., Holko M. et al. NCBI GEO: archive for functional genomics data sets–update. Nucleic Acids Res. 2013; 41:D991–D995.

47. Langfelder P., Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics. 2008; 9:559.

48. Lachmann A., Torre D., Keenan A.B., Jagodnik K.M., Lee H.J., Wang L., Silverstein M.C., Ma’ayan A. Massive mining of publicly available RNA-seq data from human and mouse. Nat. Commun. 2018; 9:1366.

49. Buttrey S.L., Whitaker L.R. treeClust: an R package for Tree-Based clustering dissimilarities. The R Journal. 2015; 7:227–236.

50. Fabregat A., Sidiropoulos K., Garapati P., Gillespie M., Hausmann K., Haw R., Jassal B., Jupe S., Korninger F., McKay S. et al. The reactome pathway Knowledgebase. Nucleic Acids Res. 2016; 44:D481–D487.

51. Caspi R., Billington R., Ferrer L., Foerster H., Fulcher C.A., Keseler I.M., Kothari A., Krummenacker M., Latendresse M., Mueller L.A. et al. The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of pathway/genome databases. Nucleic Acids Res. 2016; 44:D471–D480.

52. The Gene Ontology Consortium. Expansion of the Gene Ontology knowledgebase and resources. Nucleic Acids Res. 2017; 45:D331–D338.

53. Amberger J.S., Bocchini C.A., Schiettecatte F., Scott A.F., Hamosh A. OMIM.org: Online Mendelian Inheritance in Man (OMIM®), an online catalog of human genes and genetic disorders. Nucleic Acids Res. 2015; 43:D789–D798.

54. Garcia-Campos M.A., Espinal-Enriquez J., Hernandez-Lemus E. Pathway analysis: state of the art. Front. Physiol. 2015; 6:383.

55. Tarca A.L., Bhatti G., Romero R. A comparison of gene set analysis methods in terms of sensitivity, prioritization and specificity. PLoS One. 2013; 8:e79217.

56. Yu C., Woo H.J., Yu X., Oyama T., Wallqvist A., Reifman J. A strategy for evaluating pathway analysis methods. BMC Bioinformatics. 2017; 18:453.

57. Benjamini Y., Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. B. 1995; 57:289–300.

58. Cao M., Zhang H., Park J., Daniels N.M., Crovella M.E., Cowen L.J., Hescott B. Going the distance for protein function prediction: a new distance metric for protein interaction networks. PLoS One. 2013; 8:e76339.

59. Cao M., Pietras C.M., Feng X., Doroschak K.J., Schaffner T., Park J., Zhang H., Cowen L.J., Hescott B.J. New directions for diffusion-based network prediction of protein function: incorporating pathways with confidence. Bioinformatics. 2014; 30:i219–i227.

60. Matias Rodrigues J.F., von Mering C. HPC-CLUST: distributed hierarchical clustering for large sets of nucleotide sequences. Bioinformatics. 2014; 30:287–288.

61. van der Maaten L.J.P., Hinton G.E. Visualizing data using t-SNE. J. Mach. Learn. Res. 2008; 9:2579–2605.

62. Riker A.I., Enkemann S.A., Fodstad O., Liu S., Ren S., Morris C., Xi Y., Howell P., Metge B., Samant R.S. et al. The gene expression profiles of primary and metastatic melanoma yields a transition point of tumor progression and metastasis. BMC Med. Genomics. 2008; 1:13.

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press

Funding

Funders who supported this work:

Bundesministerium für Bildung und Forschung (Grant ID: #031A537B)
Danish Council for Independent Research (Grant ID: DFF-4005-00443)
NIGMS NIH HHS (Grant ID: P41 GM103504)
National Institute of General Medical Sciences (Grant ID: P41 GM103504)
National Institutes of Health (Grant IDs: U54 CA189205; U24 224370)
Novo Nordisk Foundation Center for Protein Research (Grant ID: PI Lars Juhl Jensen)
Swiss Institute of Bioinformatics (Grant ID: NNF14CC0001)
