Ultra-high-throughput microbial community analysis on the Illumina HiSeq and MiSeq platforms.

Caporaso JG; Lauber CL; Walters WA; Berg-Lyons D; Huntley J; Fierer N; Owens SM; Betley J; Fraser L; Bauer M; Gormley N; Gilbert JA; Smith G; Knight R

doi:10.1038/ismej.2012.8

The ISME Journal, 08 Mar 2012, 6(8):1621-1624
https://doi.org/10.1038/ismej.2012.8 PMID: 22402401 PMCID: PMC3400413

This article is in the Europe PMC Open access subset. Refer to the copyright information in the article for licensing details.

Abstract

DNA sequencing continues to decrease in cost with the Illumina HiSeq2000 generating up to 600 Gb of paired-end 100 base reads in a ten-day run. Here we present a protocol for community amplicon sequencing on the HiSeq2000 and MiSeq Illumina platforms, and apply that protocol to sequence 24 microbial communities from host-associated and free-living environments. A critical question as more sequencing platforms become available is whether biological conclusions derived on one platform are consistent with what would be derived on a different platform. We show that the protocol developed for these instruments successfully recaptures known biological results, and additionally that biological conclusions are consistent across sequencing platforms (the HiSeq2000 versus the MiSeq) and across the sequenced regions of amplicons.

ISME J. 2012 Aug; 6(8): 1621–1624.

Published online 2012 Mar 8. https://doi.org/10.1038/ismej.2012.8

J Gregory Caporaso,¹ Christian L Lauber,² William A Walters,³ Donna Berg-Lyons,² James Huntley,⁴ Noah Fierer,²,⁵ Sarah M Owens,⁶ Jason Betley,⁷ Louise Fraser,⁷ Markus Bauer,⁷ Niall Gormley,⁷ Jack A Gilbert,⁶,⁸ Geoff Smith,⁷ and Rob Knight⁹,¹⁰,*

¹Department of Computer Science, Northern Arizona University, Flagstaff, AZ, USA

²Cooperative Institute for Research in Environmental Sciences, UCB 216, University of Colorado, Boulder, CO, USA

³Department of Molecular, Cellular and Developmental Biology, UCB 347, University of Colorado, Boulder, CO, USA

⁴Colorado Initiative in Molecular Biotechnology, UCB 347, University of Colorado, Boulder, CO, USA

⁵Department of Ecology and Evolutionary Biology, UCB 334, University of Colorado, Boulder, Colorado, USA

⁶Argonne National Laboratory, Argonne, IL, USA

⁷Illumina Cambridge Ltd., Chesterford Research Park, Saffron Walden, Essex, UK

⁸Department of Ecology and Evolution, University of Chicago, Chicago, IL, USA

⁹Department of Chemistry and Biochemistry, UCB 215, University of Colorado, Boulder, CO, USA

¹⁰Howard Hughes Medical Institute, University of Colorado at Boulder, UCB 215, Boulder, CO, USA

Associated Data

Supplementary Materials
Supplementary Figure 1: ismej20128x1.pdf (426K)
Supplementary File 1: ismej20128x2.txt (181K)
Supplementary File 2: ismej20128x3.xls (38K)
Supplementary Information: ismej20128x4.doc (27K)
Supplementary Methods: ismej20128x5.doc (384K)

Keywords: Illumina, barcoded sequencing, QIIME

DNA sequencing costs continue to decline: the sharp decrease in price per sequence on the Illumina HiSeq2000 and MiSeq platforms further supports the democratization of sequencing (Tringe and Hugenholtz, 2008). Interest in amplicon sequencing on Illumina is growing (Bartram et al., 2011; Caporaso et al., 2011; Zhou et al., 2011), largely because its cost per sequence is lower than that of other platforms, enabling high-throughput microbial ecology at the greatest coverage yet possible. Although some technical issues exist with community sequencing, such as PCR primer biases and differential DNA extraction efficiency from different organisms in complex communities, these techniques continue to vastly expand our understanding of the microbial world.

Here we present an amplicon sequencing protocol for the HiSeq2000 and MiSeq platforms, and apply this protocol to sequence host-associated and free-living microbial communities to verify that biological conclusions drawn from the data are consistent across platforms and sequence reads. The HiSeq and MiSeq platforms differ markedly in scale. The HiSeq2000 produces >50 Gb per day and, over the course of a 10.8-day run, 1.6 billion 100-base paired-end reads. By contrast, the MiSeq is designed for single-day experiments and generates 1.5 Gb per day from 5 million 150-base paired-end reads. Our results capture known differences between microbial communities on each platform; biological conclusions drawn are consistent across platforms and sequence reads. This protocol is therefore ready for widespread use in microbial community analysis, such as by the Earth Microbiome Project (Gilbert et al., 2010), which has adopted it for amplicon sequencing. Details on the sequencing protocol are provided as Supplementary Methods.

Twenty-four samples were sequenced on three paired-end Illumina HiSeq2000 lanes and in one paired-end MiSeq run. The samples represented soil (source: USA; n=8) and several host-associated environment types: human feces (source: USA; n=2), mouth (source: USA; n=2) and skin (source: USA; n=6); and canine feces (source: USA; n=1), mouth (source: USA; n=1) and skin (source: USA; n=4). These four paired-end lanes (three on HiSeq and one on MiSeq) resulted in eight sets of reads, corresponding to 5′ and 3′ reads from each lane. These sets of reads were treated as independent replicates to assess the reproducibility of the results.

We were primarily interested in whether known differences between microbial communities could be recaptured on these Illumina platforms to determine their suitability for large-scale surveys of microbial communities. We observed several expected results in principal coordinates plots of weighted UniFrac distances (Figure 1). First, we observed primary separation of samples based on whether they were derived from a free-living environment (soil; cyan) or host-associated environment (all other colors) (Ley et al., 2008). Next we observed separation of fecal samples (yellow; red) from all other host-associated sample types (Costello et al., 2009).

Figure 1

Procrustes plots comparing: (a) 5′ reads from HiSeq lane 6 to 5′ reads from HiSeq lane 8; (b) 5′ reads from HiSeq lane 6 to 3′ reads from HiSeq lane 8; (c) 5′ reads from HiSeq lane 6 to 5′ MiSeq reads; (d) 5′ MiSeq reads to 3′ MiSeq reads. Lines connect paired samples.

We were additionally interested in reproducibility across lanes and reads within and between each platform. To test this, we ran the 24 samples on three HiSeq paired-end lanes and one MiSeq paired-end lane, and analyzed each resulting set of reads independently. As our biological conclusions are frequently driven by the results of principal coordinates analyses based on weighted UniFrac distances, we compared these plots using Procrustes analysis (Gower, 1975; Figure 1; Table 1) as implemented in QIIME and found that the observations were highly reproducible across lanes, read directions and platforms. All 28 possible lane/read pair combinations produced highly significant P-values based on 10000 Monte Carlo iterations (P<0.0001; Bonferroni-adjusted threshold for α=0.01: 0.01/28≈0.0004).
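
The comparison described above can be illustrated with a minimal sketch, under several assumptions: it handles only two ordination axes (where the optimal rotation or reflection has a closed form using complex numbers), the coordinates are made-up toy values rather than data from this study, and the function names are hypothetical. It is not QIIME's implementation. M² is computed after centering and scaling both configurations, and the Monte Carlo P-value is taken as the fraction of sample-label shuffles that fit at least as well as the observed pairing.

```python
import math
import random


def standardize(points):
    """Center 2D points at the origin and scale them to unit sum of squares."""
    zs = [complex(x, y) for x, y in points]
    mean = sum(zs) / len(zs)
    centered = [z - mean for z in zs]
    norm = math.sqrt(sum(abs(z) ** 2 for z in centered))
    return [z / norm for z in centered]


def procrustes_m2(a, b):
    """M^2 after the best rotation or reflection of b onto a (2D only)."""
    za, zb = standardize(a), standardize(b)
    # Best fit using a pure rotation, and best fit using a reflection.
    rotation = abs(sum(p.conjugate() * q for p, q in zip(za, zb)))
    reflection = abs(sum(p * q for p, q in zip(za, zb)))
    s = max(rotation, reflection)
    return 1.0 - s * s


def monte_carlo_p(a, b, iterations=1000, seed=0):
    """Fraction of label shuffles whose M^2 is at most the observed M^2."""
    rng = random.Random(seed)
    observed = procrustes_m2(a, b)
    shuffled = list(b)
    hits = 0
    for _ in range(iterations):
        rng.shuffle(shuffled)  # break the pairing between samples
        if procrustes_m2(a, shuffled) <= observed:
            hits += 1
    return observed, hits / iterations


# Toy coordinates (hypothetical): configuration b is a rotated, scaled and
# shifted copy of a, so the two should match almost perfectly.
pcoa_a = [(0.0, 0.0), (1.0, 0.2), (2.1, 1.5), (0.4, 2.2),
          (-1.3, 1.1), (-0.7, -1.9), (1.8, -1.2), (3.0, 0.5)]
angle = math.radians(40)
pcoa_b = [(2.5 * (x * math.cos(angle) - y * math.sin(angle)) + 5.0,
           2.5 * (x * math.sin(angle) + y * math.cos(angle)) - 3.0)
          for x, y in pcoa_a]

m2, p_value = monte_carlo_p(pcoa_a, pcoa_b)
print(f"M^2 = {m2:.3f}, Monte Carlo P = {p_value:.4f}")
```

A low M² means the two ordinations nearly coincide after superimposition, and a P-value reported as 0.0000 under this scheme means that no shuffle fitted as well as the true pairing.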

Table 1

M² and Monte Carlo P-values for all Procrustes comparisons

	HiSeq lane 6, 5′	HiSeq lane 6, 3′	HiSeq lane 7, 5′	HiSeq lane 7, 3′	HiSeq lane 8, 5′	HiSeq lane 8, 3′	MiSeq, 5′
Procrustes M²
HiSeq lane 6, 5′
HiSeq lane 6, 3′	0.006
HiSeq lane 7, 5′	0.000	0.006
HiSeq lane 7, 3′	0.005	0.000	0.006
HiSeq lane 8, 5′	0.000	0.006	0.000	0.005
HiSeq lane 8, 3′	0.005	0.000	0.006	0.006	0.006
MiSeq, 5′	0.006	0.009	0.006	0.008	0.007	0.008
MiSeq, 3′	0.007	0.007	0.007	0.007	0.007	0.008	0.002

P-value (based on 10000 Monte Carlo iterations)
HiSeq lane 6, 5′
HiSeq lane 6, 3′	0.0000
HiSeq lane 7, 5′	0.0000	0.0000
HiSeq lane 7, 3′	0.0000	0.0000	0.0000
HiSeq lane 8, 5′	0.0000	0.0000	0.0000	0.0000
HiSeq lane 8, 3′	0.0000	0.0000	0.0000	0.0000	0.0000
MiSeq, 5′	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000
MiSeq, 3′	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000
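
The counts behind Table 1 can be checked with a short sketch, assuming the eight read sets are labeled as in the table headers (the variable names are illustrative). It enumerates the pairwise comparisons, derives the Bonferroni-adjusted threshold for α=0.01, and shows why a P-value reported as 0.0000 after 10000 iterations corresponds to P<0.0001.

```python
from itertools import combinations

# Four paired-end lanes, each contributing a 5' and a 3' read set.
lanes = ["HiSeq lane 6", "HiSeq lane 7", "HiSeq lane 8", "MiSeq"]
read_sets = [f"{lane}, {end}" for lane in lanes for end in ("5'", "3'")]

# Every unordered pair of read sets is one Procrustes comparison.
pairs = list(combinations(read_sets, 2))

alpha = 0.01
bonferroni_alpha = alpha / len(pairs)

# With 10000 Monte Carlo iterations the smallest nonzero P-value is 1/10000,
# so a reported 0.0000 means P < 0.0001.
iterations = 10000
smallest_nonzero_p = 1 / iterations

print(len(read_sets), "read sets,", len(pairs), "comparisons")
print("Bonferroni-adjusted threshold:", round(bonferroni_alpha, 4))
print("Below threshold:", smallest_nonzero_p < bonferroni_alpha)
```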

Taken together, these results suggest that the protocol previously developed for high-throughput community sequencing on the Illumina GAIIx has been successfully adapted for the HiSeq2000 and MiSeq platforms, again greatly decreasing the cost of amplicon sequencing, to ~15000 single-end reads per US$1 on the HiSeq2000. For example, based on our lowest high-quality sequence count per lane of 22928291 reads (Supplementary File 2, HiSeq 3′ lane 6), using all 2167 barcodes in each of 15 lanes on the HiSeq2000 (leaving one lane for a control) makes it possible to sequence 32505 samples in a week at a depth of 10580 sequences per sample for approximately $22000 in sequencing costs. Longer barcodes could additionally be developed to allow more samples per lane at a lower depth of sequencing. On the basis of the lowest high-quality sequence count on the MiSeq of 1603532 reads (Supplementary File 2, MiSeq 3′), using all 2167 barcodes makes it possible to sequence 2167 samples in a 12 h run at a depth of 740 sequences per sample for approximately $800 in sequencing costs.
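
The arithmetic in this paragraph can be reproduced with a few lines, using only the figures quoted above (the constant names are illustrative). Note that the quoted HiSeq depth is the per-sample value rounded down, while the quoted MiSeq depth is rounded to the nearest integer.

```python
BARCODES = 2167

# HiSeq2000: 15 lanes used, lowest high-quality read count per lane.
HISEQ_LANES_USED = 15
HISEQ_MIN_READS_PER_LANE = 22_928_291
HISEQ_RUN_COST_USD = 22_000

# MiSeq: one run, lowest high-quality read count.
MISEQ_MIN_READS = 1_603_532
MISEQ_RUN_COST_USD = 800

hiseq_samples = BARCODES * HISEQ_LANES_USED
hiseq_depth = HISEQ_MIN_READS_PER_LANE / BARCODES
hiseq_reads_per_usd = (
    HISEQ_MIN_READS_PER_LANE * HISEQ_LANES_USED / HISEQ_RUN_COST_USD
)

miseq_depth = MISEQ_MIN_READS / BARCODES
miseq_reads_per_usd = MISEQ_MIN_READS / MISEQ_RUN_COST_USD

print(f"HiSeq samples per run:   {hiseq_samples}")            # 32505
print(f"HiSeq reads per sample:  {hiseq_depth:.2f}")          # 10580.66
print(f"HiSeq reads per US$1:    {hiseq_reads_per_usd:.0f}")  # 15633
print(f"MiSeq reads per sample:  {miseq_depth:.2f}")          # 739.98
print(f"MiSeq reads per US$1:    {miseq_reads_per_usd:.0f}")  # 2004
```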

A relevant question is whether the decreased cost of sequencing should be applied to obtain deeper coverage of samples, or to increase the number of samples that are sequenced. Figure 1c compares the results of sequencing the same samples on the HiSeq2000 at a median depth of 1207709 sequences per sample and on the MiSeq platform at a depth of 43271 sequences per sample. The highly significant Procrustes result (P<0.0001) implies that we draw the same beta diversity conclusions from either sequencing run, despite a roughly 28-fold (more than an order of magnitude) increase in sequencing depth on the HiSeq2000. Similarly, when sampling to only 10 sequences per sample, Procrustes results are still highly significant (P<0.0001; Supplementary Figure 1), although the higher M² value indicates that the correlation is not as strong as when sampling to 100 sequences per sample. These observations, in agreement with studies that have addressed this question directly (Kuczynski et al., 2010), suggest that increasing the sequencing depth is not likely to provide additional insight into questions of beta diversity, and we therefore argue that (for questions of beta diversity in particular) the decreased cost of sequencing should be applied to study microbial systems using many more samples, for example, in dense temporal or spatial analyses, rather than with many more sequences per sample. Of course, if the objective is to identify taxa that are very rare in communities, deeper sequencing will be advantageous. Additionally, we note that while as few as 10 sequences per sample may be useful for differentiating very different environment types (for example, soil and feces), as environments become more similar (for example, two soil samples of different pH) more sequences will be required to differentiate them.
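
Sampling each community down to a fixed number of sequences, as in the 10- and 100-sequence comparisons above, can be sketched as random subsampling without replacement. This is a minimal illustration, not the procedure used in the study: the `rarefy` helper is hypothetical and the taxon counts are invented toy values.

```python
import random


def rarefy(counts, depth, seed=0):
    """Randomly draw `depth` sequences without replacement from one sample."""
    # Expand the count table into one label per observed sequence.
    pool = [taxon for taxon, n in counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("sample has fewer sequences than the requested depth")
    rng = random.Random(seed)
    rarefied = {}
    for taxon in rng.sample(pool, depth):
        rarefied[taxon] = rarefied.get(taxon, 0) + 1
    return rarefied


# Toy count tables (hypothetical values, for illustration only).
soil = {"Acidobacteria": 520, "Proteobacteria": 310,
        "Actinobacteria": 150, "Verrucomicrobia": 20}
feces = {"Bacteroidetes": 600, "Firmicutes": 380, "Proteobacteria": 20}

soil_10 = rarefy(soil, 10)
feces_10 = rarefy(feces, 10)
print("soil at depth 10: ", soil_10)
print("feces at depth 10:", feces_10)
```

Even at a depth of 10, the dominant taxa of very different communities tend to remain distinguishable, whereas rare taxa such as the low-count entries here are frequently lost, which is consistent with the trade-off described in the paragraph above.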

As sequencing costs continue to decrease, our studies of the microbial world can continue to increase in scope. The protocol presented here opens the HiSeq2000 and MiSeq Illumina platforms to community amplicon sequencing. The data generated by the two platforms are similar but differ in scale, and the platforms therefore support different applications. For large projects where time is less of an issue but cost per sequence is a major concern, the HiSeq platform allows massively parallel sequencing at the lowest cost. Here we show that comparable data can be generated on the MiSeq for smaller projects where it is important to process samples quickly, for example, in routine environmental or patient monitoring or in preliminary investigations for larger projects. We expect that this is another step toward the era of ubiquitous DNA sequencing, when sequencers become standard equipment in research and clinical laboratories. Finally, we show that technical replicates run on different sequencing platforms and from sequencing of different regions of amplicons should yield the same biological conclusions: critical information as more sequencing platforms become available.

Acknowledgments

We wish to thank the National Ecological Observatory Network (a project sponsored by the National Science Foundation and managed under cooperative agreement by NEON, Inc.) for donation of the soil samples; and Aurelie Breton and Joshua Quick for running the MiSeq instrument. This work was funded in part by Amazon Web Services, NIH, Crohn's and Colitis Foundation of America, The Bill and Melinda Gates Foundation, and the Howard Hughes Medical Institute.

Notes

Several authors on this manuscript are employees of Illumina, Inc., whose technology is tested in this study.

Footnotes

Supplementary Information accompanies the paper on The ISME Journal website (http://www.nature.com/ismej)

References

Bartram AK, Lynch MD, Stearns JC, Moreno-Hagelsieb G, Neufeld JD. Generation of multimillion-sequence 16S rRNA gene libraries from complex microbial communities by assembling paired-end illumina reads. Appl Environ Microbiol. 2011;77:3846–3852.
Caporaso JG, Lauber CL, Walters WA, Berg-Lyons D, Lozupone CA, Turnbaugh PJ, et al. Global patterns of 16S rRNA diversity at a depth of millions of sequences per sample. Proc Natl Acad Sci USA. 2011;108(Suppl 1):4516–4522.
Costello EK, Lauber CL, Hamady M, Fierer N, Gordon JI, Knight R. Bacterial community variation in human body habitats across space and time. Science. 2009;326:1694–1697.
Gilbert JA, Meyer F, Jansson J, Gordon J, Pace N, Tiedje J, et al. The Earth Microbiome Project: meeting report of the ‘1st EMP meeting on sample selection and acquisition’ at Argonne National Laboratory October 6th 2010. Stand Genomic Sci. 2010;3:249–253.
Gower JC. Generalized procrustes analysis. Psychometrika. 1975;40:33–51.
Kuczynski J, Liu Z, Lozupone C, McDonald D, Fierer N, Knight R. Microbial community resemblance methods differ in their ability to detect biologically relevant patterns. Nat Methods. 2010;7:813–819.
Ley RE, Lozupone CA, Hamady M, Knight R, Gordon JI. Worlds within worlds: evolution of the vertebrate gut microbiota. Nat Rev Microbiol. 2008;6:776–788.
Tringe SG, Hugenholtz P. A renaissance for the pioneering 16S rRNA gene. Curr Opin Microbiol. 2008;11:442–446.
Zhou HW, Li DF, Tam NF, Jiang XT, Zhang H, Sheng HF, et al. BIPES, a cost-effective high-throughput method for assessing microbial diversity. ISME J. 2011;5:741–749.

Articles from The ISME Journal are provided here courtesy of Oxford University Press

Data

Data behind the article

This data has been text mined from the article, or deposited into data resources.

BioStudies: supplemental material and supporting data

http://www.ebi.ac.uk/biostudies/studies/S-EPMC3400413?xr=true

Data that cites the article

This data has been provided by curated databases and other sources that have cited the article.

Nucleotide sequences in ENA (4)

Sequence Read Archive(SRA)(ENA - ERP020892)
INSDC Project(ENA - PRJEB18925)
INSDC Project(ENA - PRJEB18924)
Sequence Read Archive(SRA)(ENA - ERP020891)

Funding

Funders who supported this work.

Crohn's & Colitis Foundation (1)

Grant ID: 2158
81 publications

Howard Hughes Medical Institute

Intramural NIH HHS

NHGRI NIH HHS (3)

Grant ID: U01 HG006537
6 publications
Grant ID: R01 HG004872
63 publications
Grant ID: U01 HG004866
50 publications

NIDDK NIH HHS (1)

Grant ID: P01 DK078669
103 publications

NIGMS NIH HHS (2)

Grant ID: T32 GM008759
338 publications
Grant ID: T32 GM142607
306 publications
