Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2
To the Editor — Rapid advances in DNA-sequencing and bioinformatics technologies in the past two decades have substantially improved understanding of the microbial world. This growing understanding relates to the vast diversity of microorganisms; how microbiota and microbiomes affect disease1 and medical treatment2; how microorganisms affect the health of the planet3; and the nascent exploration of the medical4, forensic5, environmental6 and agricultural7 applications of microbiome biotechnology. Much of this work has been driven by marker-gene surveys (for example, bacterial/archaeal 16S rRNA genes, fungal internal-transcribed-spacer regions and eukaryotic 18S rRNA genes), which profile microbiota with varying degrees of taxonomic specificity and phylogenetic information. The field is now transitioning to integrate other data types, such as metabolite8, metaproteome9 or metatranscriptome9,10 profiles.
The QIIME 1 microbiome bioinformatics platform has supported many microbiome studies and gained a broad user and developer community. Interactions with QIIME 1 users in our online support forum, our workshops and direct collaborations have shown the platform’s potential to serve an increasingly diverse array of microbiome researchers in academia, government and industry. Here, we present QIIME 2, a completely reengineered and rewritten system that is expected to facilitate reproducible and modular analysis of microbiome data to enable the next generation of microbiome science.
QIIME 2 was developed on the basis of a plugin architecture (Supplementary Fig. 1) that allows third parties to contribute functionality (https://library.qiime2.org). QIIME 2 plugins exist for latest-generation tools for sequence quality control from different sequencing platforms (DADA2 (ref.11) and Deblur12), taxonomy assignment13 and phylogenetic insertion14, which quantitatively improve the results over QIIME 1 and other tools (as detailed in the corresponding tool-specific publications). The plugins also support qualitatively new functionality, including microbiome paired-sample and time-series analysis15 (which are critical for studying the effects of treatments on the microbiome), and machine learning16. Trained machine learning models can be saved for application to new data and interrogated to identify important microbiome features. Several recently released plugins, including q2-cscs17, q2-metabolomics18, q2-shogun19, q2-metaphlan2 (ref.20) and q2-picrust2 (ref.21), provide initial support for analysis of metabolomics and shotgun metagenomics data. We are currently working with teams developing bioinformatics tools for metatranscriptomics and metaproteomics, and we expect to add new plugins supporting these data types to the ecosystem shortly. Additionally, many of the existing ‘downstream’ analysis tools, such as q2-sample-classifier16, can already work with these data types individually or in combination if they are provided in a feature table. Thus, QIIME 2 has the potential to serve not only as a marker-gene analysis tool but also a multidimensional and powerful data science platform that can be rapidly adapted to analyze diverse microbiome features.
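The plugin idea described above can be illustrated with a small, self-contained sketch. It is not QIIME 2's plugin machinery: the `PluginRegistry` class, the `toy-feature-table` plugin name and the dictionary-based feature table are all hypothetical and chosen only for illustration. The point is that third-party functions are registered under a plugin and action name, and that any action written against a generic feature table works regardless of which kind of feature (sequence variant, metabolite, gene family) the table holds.

```python
from typing import Callable, Dict, List

# A feature table is modeled here as {sample_id: {feature_id: count}}.
# This is an illustrative stand-in, not a QIIME 2 data structure.
FeatureTable = Dict[str, Dict[str, float]]


class PluginRegistry:
    """Hypothetical registry mapping plugin/action names to functions."""

    def __init__(self) -> None:
        self._actions: Dict[str, Dict[str, Callable]] = {}

    def register(self, plugin: str, action: str) -> Callable:
        """Return a decorator that files a function under plugin/action."""
        def decorator(func: Callable) -> Callable:
            self._actions.setdefault(plugin, {})[action] = func
            return func
        return decorator

    def run(self, plugin: str, action: str, *args, **kwargs):
        """Look up a registered action and call it."""
        try:
            func = self._actions[plugin][action]
        except KeyError:
            raise LookupError(f"no action {plugin}.{action} is registered")
        return func(*args, **kwargs)

    def available(self) -> List[str]:
        """List every registered action as 'plugin.action'."""
        return sorted(
            f"{plugin}.{action}"
            for plugin, actions in self._actions.items()
            for action in actions
        )


registry = PluginRegistry()


@registry.register("toy-feature-table", "relative_frequency")
def relative_frequency(table: FeatureTable) -> FeatureTable:
    """Convert per-sample counts to per-sample proportions."""
    result: FeatureTable = {}
    for sample, counts in table.items():
        total = sum(counts.values())
        result[sample] = {
            feature: (count / total if total else 0.0)
            for feature, count in counts.items()
        }
    return result


table = {"S1": {"featA": 3, "featB": 1}, "S2": {"featA": 0, "featB": 4}}
normalized = registry.run("toy-feature-table", "relative_frequency", table)
```

Because the action only assumes the table layout, the same call would apply unchanged to a table of metabolite intensities or gene-family counts.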
QIIME 2 provides many new interactive visualization tools facilitating exploratory analyses and result reporting. Static versions of interactive visualizations resulting from four worked examples are provided in Fig. 1. QIIME 2 View (https://view.qiime2.org) is a unique new service (Supplementary Methods) that allows users to securely share and interact with results without installing QIIME 2. The QIIME 2 visualizations presented in Fig. 1 are provided in Supplementary File 1 to allow readers to interact with QIIME 2 View. Corresponding worked QIIME 2 example code is provided in the Supplementary Methods.
Reproducibility, transparency and clarity of microbiome data science are guiding principles in QIIME 2 design. To this end, QIIME 2 includes a decentralized data-provenance tracking system: details of all analysis steps with references to intermediate data are automatically stored in the results. Users can thus retrospectively determine exactly how any result was generated (Fig. 2 illustrates a simplified provenance graph derived from the data provenance of Fig. 1b). QIIME 2 also detects corrupted results indicating that the provenance is no longer reliable and the results no longer contain information enabling reproducibility. The provenance of the visualizations presented in Fig. 1 can be interactively reviewed by loading the contents of Supplementary File 1 with QIIME 2 View, providing far more detailed information than can typically be provided in Methods text. QIIME 2 results are also semantically typed (Fig. 2), and actions indicate acceptable input types, clarifying the data that actions should be applied to and making complex workflows less error prone. Complex workflows can be created and shared by using Jupyter Notebooks22 or Common Workflow Language (CWL)23, and support for other workflow engines is currently in development.
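The three ideas in this paragraph (provenance stored inside each result, detection of corrupted results, and type-checked inputs) can be sketched in a few lines. This is a conceptual model under stated assumptions, not QIIME 2's implementation or file format: the `Result` class, `run_action`, `import_data` and `lineage` are hypothetical names, the checksum is a plain SHA-256 over JSON-serialized data, and the semantic types are treated as simple string labels.

```python
import hashlib
import json
import uuid
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


def _digest(data: Any) -> str:
    """Checksum of JSON-serializable data (illustrative choice)."""
    encoded = json.dumps(data, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


@dataclass
class Result:
    """Hypothetical result carrying its data, type label and provenance."""
    data: Any
    semantic_type: str
    provenance: Dict[str, Any]
    result_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    checksum: str = ""

    def __post_init__(self) -> None:
        if not self.checksum:
            self.checksum = _digest(self.data)

    def is_intact(self) -> bool:
        """True if the data still match the checksum taken at creation."""
        return self.checksum == _digest(self.data)


def import_data(data: Any, semantic_type: str) -> Result:
    """Wrap raw data as a Result whose provenance starts at 'import'."""
    return Result(data, semantic_type,
                  {"action": "import", "parameters": {}, "inputs": []})


def run_action(name: str, func: Callable[..., Any], inputs: List[Result],
               accepts: str, produces: str, **params: Any) -> Result:
    """Check inputs, run func, and embed the full history in the output."""
    for item in inputs:
        if item.semantic_type != accepts:
            raise TypeError(
                f"{name} accepts {accepts}, got {item.semantic_type}")
        if not item.is_intact():
            raise ValueError(f"input {item.result_id} failed its checksum")
    output = func(*[item.data for item in inputs], **params)
    provenance = {
        "action": name,
        "parameters": params,
        "inputs": [{"result_id": item.result_id,
                    "provenance": item.provenance} for item in inputs],
    }
    return Result(output, produces, provenance)


def lineage(provenance: Dict[str, Any]) -> List[str]:
    """Action names from the earliest step to the latest."""
    steps: List[str] = []
    for parent in provenance["inputs"]:
        steps.extend(lineage(parent["provenance"]))
    steps.append(provenance["action"])
    return steps


def keep_abundant(table: Dict[str, int], min_count: int) -> Dict[str, int]:
    """Keep samples whose total count reaches min_count."""
    return {s: n for s, n in table.items() if n >= min_count}


table = import_data({"S1": 10, "S2": 4}, "FeatureTable[Frequency]")
filtered = run_action("filter_min_count", keep_abundant, [table],
                      accepts="FeatureTable[Frequency]",
                      produces="FeatureTable[Frequency]", min_count=5)
history = lineage(filtered.provenance)
kept = dict(filtered.data)

filtered.data["S1"] = 999  # simulate corruption after the result was made
corruption_detected = not filtered.is_intact()
```

Because each output embeds the provenance of its inputs, no central database is needed to reconstruct how a result was produced, which is the sense in which the tracking is decentralized.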
Finally, QIIME 2 provides a software-development kit (https://dev.qiime2.org) that can be used to integrate it as a component of other systems (such as Qiita24 or Illumina BaseSpace) and to develop interfaces targeted toward users with different levels of computational sophistication (Supplementary Fig. 2). QIIME 2 provides the QIIME 2 Studio graphical user interface and QIIME 2 View, interfaces designed for end-user biologists, clinicians and policy-makers; the QIIME 2 application programming interface, designed for data scientists who want to automate workflows or work interactively in Jupyter Notebooks22; and q2cli and q2cwl, providing a command-line interface and CWL23 wrappers for QIIME 2, designed for experts in high-performance computing. At present, computationally expensive steps support parallel computing at the individual-action level (for example, many actions including de-noising and taxonomy assignment support multiple threads). We are currently developing deeper integration with parallelism strategies available in third-party workflow engines, and workflow-level parallelism is currently possible through CWL.
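Action-level parallelism, as described above, means that a single analysis step exposes a thread-count parameter and divides its own work internally. The sketch below shows only that interface pattern, using Python's standard `concurrent.futures` module. The `filter_samples` function, its `n_threads` parameter and the mean-quality rule are hypothetical stand-ins and do not reproduce any QIIME 2 action. Pure-Python threads may not speed up CPU-bound work, so this illustrates the calling convention rather than serving as a performance recipe.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

# Each read is represented only by its list of per-base quality scores.
Read = List[int]


def mean_quality(qualities: Read) -> float:
    """Average quality score of one read (0.0 for an empty read)."""
    return sum(qualities) / len(qualities) if qualities else 0.0


def filter_samples(reads_by_sample: Dict[str, List[Read]],
                   min_mean_quality: float,
                   n_threads: int = 1) -> Dict[str, int]:
    """Count reads per sample passing the threshold.

    Samples are independent, so they are handed to a pool of
    n_threads workers; the result does not depend on n_threads.
    """
    def count_passing(sample_id: str) -> int:
        reads = reads_by_sample[sample_id]
        return sum(1 for q in reads if mean_quality(q) >= min_mean_quality)

    sample_ids = list(reads_by_sample)
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        counts = list(pool.map(count_passing, sample_ids))
    return dict(zip(sample_ids, counts))


reads = {
    "S1": [[30, 32, 31], [10, 12, 8]],
    "S2": [[25, 25, 25]],
}
serial = filter_samples(reads, min_mean_quality=20, n_threads=1)
threaded = filter_samples(reads, min_mean_quality=20, n_threads=4)
```

Keeping the parallelism inside the action leaves the surrounding workflow unchanged when the thread count is raised.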
There are many other powerful open-source software tools for microbiome data science, including mothur25, phyloseq26 and related tools available through Bioconductor27, and the biobakery suite20,21,28. The microbiome bioinformatics platform mothur is often compared to QIIME 1 and QIIME 2. A major difference between mothur and QIIME lies in the interactive visualizations: QIIME 2 provides many interactive visualization tools (several examples are provided in Fig. 1), whereas mothur focuses on generating data that can be easily loaded and visualized with other tools. The phyloseq tool focuses on microbiome statistical analysis and generating publication-ready visualizations but, unlike QIIME 2, begins with a feature or operational-taxonomic-unit table, leaving ‘upstream’ processing steps, such as sequence demultiplexing and quality control, to other processing pipelines, many of which (like phyloseq) are available through Bioconductor. The biobakery suite provides analytic functionality that complements that of QIIME 2, and we are actively working with biobakery developers to support interoperability by making their tools accessible as QIIME 2 plugins (for example, the q2-metaphlan2 plugin allows users to run MetaPhlAn2 through QIIME 2). QIIME 2 provides the only Python-based microbiome data-science platform that supports retrospective data-provenance tracking to ensure reproducibility, multi-omics analysis support, interfaces geared toward different user types to enhance usability and an extensibility-focused design through the plugin architecture and software-development kit. We share feedback from users of QIIME 2 on these and other features in Supplementary Methods.
The tools described in the preceding paragraph are all interoperable through plugins, the exchange of files in standard formats or the use of multi-language environments, such as Jupyter Notebooks22. For example, the BIOM format29 is supported by all of them. A diverse ecosystem of interoperable software is beneficial for the field, because it allows experienced users to obtain multiple perspectives on their data and novice bioinformaticians to work in the programming environments that they are most comfortable with (for example, phyloseq allows users to work in R, whereas QIIME 2 allows users to work in Python). We plan to continue working with the developers of these tools, and with organizations such as the Genomic Standards Consortium, on plugins and standards to ensure interoperability, as well as developing tools to automatically import data from microbiome data-sharing platforms such as Qiita, the European Bioinformatics Institute (EBI) European Nucleotide Archive and the National Center for Biotechnology Information (NCBI) Sequence Read Archive.
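To make the role of a shared file format concrete, the sketch below writes a small feature table as a sparse JSON document and reads it back. The layout is modeled on the JSON-based BIOM 1.0 structure (observations as rows, samples as columns, sparse entries as row, column, value triples) as an assumption; field names should be checked against the BIOM specification before use, and later BIOM versions use HDF5 instead. The helper names `to_biom_like` and `to_dense` are hypothetical, and only the Python standard library is used.

```python
import json
from typing import Dict, List


def to_biom_like(table: Dict[str, Dict[str, int]]) -> dict:
    """Build a sparse, BIOM 1.0-style document.

    table maps sample_id -> {feature_id: count}. Rows are features
    (observations) and columns are samples; zero counts are omitted.
    """
    sample_ids = sorted(table)
    feature_ids = sorted({f for counts in table.values() for f in counts})
    data: List[List[int]] = []
    for row, feature in enumerate(feature_ids):
        for col, sample in enumerate(sample_ids):
            value = table[sample].get(feature, 0)
            if value:
                data.append([row, col, value])
    return {
        "id": None,
        "format": "Biological Observation Matrix 1.0.0",
        "type": "OTU table",
        "generated_by": "illustrative sketch",
        "rows": [{"id": f, "metadata": None} for f in feature_ids],
        "columns": [{"id": s, "metadata": None} for s in sample_ids],
        "matrix_type": "sparse",
        "matrix_element_type": "int",
        "shape": [len(feature_ids), len(sample_ids)],
        "data": data,
    }


def to_dense(document: dict) -> List[List[int]]:
    """Expand the sparse triples into a rows-by-columns matrix."""
    n_rows, n_cols = document["shape"]
    dense = [[0] * n_cols for _ in range(n_rows)]
    for row, col, value in document["data"]:
        dense[row][col] = value
    return dense


table = {"S1": {"featA": 3, "featB": 1}, "S2": {"featB": 4}}
serialized = json.dumps(to_biom_like(table))   # what one tool would write
document = json.loads(serialized)              # what another tool would read
dense = to_dense(document)
```

Any tool that can parse this one document recovers the same matrix and identifiers, which is what allows results to move between Python-based and R-based environments.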
Advances in microbiome research promise to improve many aspects of health and the world, and QIIME 2 will help drive those advances by enabling accessible, community-driven microbiome data science.
Data availability
Data for the analyses presented in Fig. 1 are available as follows: Earth Microbiome Project data in Fig. 1a were obtained from ftp://ftp.microbio.me/emp/release1, and the American Gut Project (AGP) data were obtained from Qiita (http://qiita.microbio.me) study ID 10317. Sequence data in Fig. 1c are available in Qiita under study ID 10249 and the EBI under accession number ERP016173. Sequence data in Fig. 1b are available in Qiita under study ID 925 and the EBI under accession number ERP022167. Data in Fig. 1d are available in the q2-ili GitHub repository (https://github.com/biocore/q2-ili). Interactive versions of the Fig. 1 visualizations can be accessed at https://github.com/qiime2/paper1.
Code availability
QIIME 2 is open source and free for all use, including commercial. It is licensed under a BSD three-clause license. Source code is available at https://github.com/qiime2. Help for QIIME 2 is provided at https://forum.qiime2.org.
Supplementary Material
Supplementary File 1
Supplementary Information
Acknowledgements
QIIME 2 development was primarily funded by NSF Awards 1565100 to J.G.C. and 1565057 to R.K. Partial support was also provided by the following: grants NIH U54CA143925 (J.G.C. and T.P.) and U54MD012388 (J.G.C. and T.P.); grants from the Alfred P. Sloan Foundation (J.G.C. and R.K.); ERCSTG project MetaPG (N.S.); the Strategic Priority Research Program of the Chinese Academy of Sciences QYZDB-SSW-SMC021 (Y.B.); the Australian National Health and Medical Research Council APP1085372 (G.A.H., J.G.C., Von Bing Yap and R.K.); the Natural Sciences and Engineering Research Council (NSERC) to D.L.G.; and the State of Arizona Technology and Research Initiative Fund (TRIF), administered by the Arizona Board of Regents, through Northern Arizona University. All NCI coauthors were supported by the Intramural Research Program of the National Cancer Institute. S.M.G. and C. Diener were supported by the Washington Research Foundation Distinguished Investigator Award. Thanks to the Yellowstone Center for Resources for research permit no. 5664 to J.R.S. for Yellowstone access and sample collection. We thank P. J. McMurdie for helpful discussion on the relationships between QIIME 2 and phyloseq. We would like to thank the users of QIIME 1 and 2, whose invaluable feedback has shaped QIIME 2. In particular, we would like to thank A. Abdelfattah (Stockholm University, Sweden), R. C. T. Boutin (University of British Columbia, Canada), D. J. Bradshaw II (Florida Atlantic University Harbor Branch Oceanographic Institute, USA), L. Bullington (MPG Ranch, USA), J. W. Debelius (Karolinska Institutet, Sweden), C. Duvallet (Massachusetts Institute of Technology, USA), E. Korzune Ganda (Cornell University, USA), A. Mahnert (Medical University of Graz, Austria), M. C. Melendrez (St. Cloud State University, USA), D. O’Rourke (University of New Hampshire, USA), A. R. Rivers (USDA ARS, USA), B. Sen (Tianjin University, China), S. Tangedal (Haukeland University Hospital and University of Bergen, Norway), P. J. Torres (San Diego State University, USA) and J. Warren (National Laboratory Service, UK) for writing end-user reviews included in the Supplementary Methods.
Footnotes
Supplementary information is available for this paper at https://doi.org/10.1038/s41587-019-0209-9.
Funding
Funders who supported this work.
Intramural NIH HHS (1)
Grant ID: Z99 CA999999
NCI NIH HHS (1)
Grant ID: U54 CA143925
NIEHS NIH HHS (1)
Grant ID: T32 ES015459
NIGMS NIH HHS (1)
Grant ID: R35 GM133420
NIMHD NIH HHS (1)
Grant ID: U54 MD012388
NNF Center for Basic Metabolic Research (1)
Grant ID: Arumugam Group