Population-based metagenomics analysis reveals markers for gut microbiome composition and diversity
Abstract
Deep sequencing of the gut microbiomes of 1,135 participants from a Dutch population-based cohort shows relations between the microbiome and 126 exogenous and intrinsic host factors, including 31 intrinsic factors, 12 diseases, 19 drug groups, 4 smoking categories, and 60 dietary factors. These factors collectively explain 18.7% of the variation seen in the inter-individual distance of microbial composition. We could associate 110 factors to 125 species and observed that fecal Chromogranin A (CgA), a protein secreted by enteroendocrine cells, was exclusively associated with 61 microbial species whose abundance collectively accounted for 53% of microbial composition. Low CgA levels were seen in individuals with a more diverse microbiome. These results are an important step towards better understanding of environment-diet-microbe-host interactions.
One Sentence Summary
126 factors collectively explain 18.7% of the variation in human gut microbiome composition, with strongest effect observed for the fecal level of Chromogranin A.
The human gut microbiome plays a major role in the production of vitamins, enzymes, and other compounds that digest and metabolize food and regulate our immune system (1). It can be considered as an extra organ, with remarkable dynamics and a major impact on our physiology. The composition of the gut microbiome can be considered as a complex trait, with the quantitative variation in the microbiome affected by a large number of host and environmental factors, each of which may have only a small additive effect, making it difficult to identify the association for each separate item. In this study, we present a systematic metagenomic association analysis on 207 intrinsic and exogenous factors from the LifeLines-DEEP cohort, a Dutch population-based study (2, 3). Our study reveals covariates in the microbiome and, more importantly, provides a list of factors that correlate with shifts in the microbiome composition and functionality.
This study includes stool samples from 1,179 LifeLines-DEEP participants from the general population of the northern part of the Netherlands (2). The cohort comprised predominantly Dutch participants; 93.7% had both parents born in the Netherlands. The gut microbiome was analyzed using paired-end metagenomic shotgun sequencing (MGS) on a HiSeq2000, generating an average of 3.0 Gb of data (about 32.3 million reads) per sample (4). After excluding 44 samples with low read counts, 1,135 participants (474 males and 661 females) remained for further analysis. We tested 207 factors with respect to the microbiomes of these participants: 41 intrinsic factors of various physiological and biomedical measures, 39 self-reported diseases, 44 categories of drugs, 5 categories of smoking status and 78 dietary factors (fig. S1 and table S1). These factors cover dietary habits, lifestyle, medication use, and health parameters. Most of the factors showed a low or modest inter-correlation (table S2A–C, 2A–D); many are highly variable, including, as expected in the Dutch population, the high consumption of milk products and low use of antibiotics. Antibiotic use in the Netherlands is the lowest in Europe, at a level half that of the UK and one-third that of Belgium. To cover health-domain factors relevant to the host immune system and gut health, we collected cell counts for eight different blood cell types, measured blood cytokine levels, assessed stool frequency and stool type by Bristol Stool Score, and measured fecal levels of several secreted proteins, including calprotectin as a marker for immune system activation, human-β-defensin-2 (HBD-2) as a marker for defense against invading microbes, and chromogranin A (CgA) as a marker for neuroendocrine system activation.
After quality control and removal of sequence reads mapping to the human genome, the microbiome sequence reads were mapped to approximately 1 million microbial-taxonomy-specific marker genes using MetaPhlAn 2.0 (5) to predict the abundance of microorganisms (fig. S3A). For each participant, we predicted the abundance levels for 1,649 microbial taxonomic clades ranging from four different domains to 632 species (Fig. 1A). The majority of the reads (97.6%) came from Bacteria, 2.2% from Archaea, 0.2% from Viruses and <0.01% from Eukaryotes. Comparison to previous taxonomic profiles of the same subjects by 16S rRNA gene sequencing (Fig. 1B) showed that MGS predicted more microbial species but fewer families and genera. At the phylum level, the abundances of the dominant bacterial phyla Firmicutes (63.7%) and Bacteroidetes (8.1%) were similar to estimates based on 16S rRNA gene sequencing, but the abundance of Actinobacteria was higher in MGS (22.3%) than in 16S (12.3%) (fig. S4). The microbiome quality control project has recently suggested that microbial composition estimates may not be comparable between studies if sample preparation and data analysis are not done in the same way (6). For instance, compared to the composition reported in other studies of a similar size that used different methods (7, 8), our study detected a higher abundance of Actinobacteria but a lower abundance of Bacteroidetes. Importantly, all samples in our study were isolated and processed using the same pipeline, ensuring low technical variation and high analysis power to assess the association of multiple factors with the microbiome.
The high inter-individual variation reflects the community composition (fig. S5) and is clearly driven by the abundance of the dominant phyla (fig. 2A). Our further analysis of microbial composition was confined to the 632 unique species (table S3). For the functional profiling, the abundance of 568,874 UniRef gene families were grouped into clusters of orthologous groups (COG) based on the EggNOG database and MetaCyc pathways (fig. S3A). Although the distribution of diversity, genes, and COG richness showed high inter-variability (Fig. 2B–D), functional profiles based on 23 non-redundant, gene ontology molecular function categories remained stable (fig. S6) within our cohort, similar to previous reports (9).
We correlated 207 factors to the inter-individual variation in microbial composition, diversity, richness of genes, and COGs (fig. S3B). At false discovery rate (FDR) <0.1 level, 126 factors were associated with inter-individual distance of microbial composition (Bray-Curtis distance) (Fig 3, table S4), of which 90% could be replicated in 16S rRNA data from the same subjects (table S5, fig. S7), together explaining 18.7% of the variation in composition distance (fig. S8A). A total of 35 factors were associated with Shannon’s diversity index of microbial composition (together explaining 13.7% variation, table S6, fig. S8A), of which 80% were replicated in 16S rRNA data from the same subjects (table S7); 31 factors were associated with gene richness (together explaining 16.7% variation, table S8) and 34 factors with COG richness (explaining 18.8% variation) (table S9, fig. S8A, for replication rates see table S10). We saw a large overlap between different diversity and richness analyses, and most of them were also associated with composition distance (fig. S8B).
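To make the two measures in this paragraph concrete, the following minimal Python sketch computes Shannon's diversity index for a single sample and the Bray-Curtis distance between two samples. The abundance values are invented for illustration, and the function names are hypothetical; this is not the analysis code used in the study.

```python
import math

def shannon_index(abundances):
    """Shannon diversity H = -sum(p_i * ln(p_i)) over taxa with p_i > 0."""
    total = sum(abundances)
    props = [a / total for a in abundances if a > 0]
    return -sum(p * math.log(p) for p in props)

def bray_curtis(u, v):
    """Bray-Curtis dissimilarity: sum(|u_i - v_i|) / sum(u_i + v_i)."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator

# Invented relative abundances for four taxa in three samples.
sample_a = [0.60, 0.25, 0.10, 0.05]  # dominated by one taxon
sample_b = [0.25, 0.25, 0.25, 0.25]  # perfectly even
sample_c = [0.00, 0.00, 0.50, 0.50]  # shares few taxa with sample_a

h_a = shannon_index(sample_a)
h_b = shannon_index(sample_b)
d_ab = bray_curtis(sample_a, sample_b)
d_ac = bray_curtis(sample_a, sample_c)

print(f"Shannon: a={h_a:.3f}, b={h_b:.3f}")
print(f"Bray-Curtis: a-b={d_ab:.2f}, a-c={d_ac:.2f}")
```

In this toy example the even sample has the higher Shannon index, and the pair of samples sharing fewer taxa has the larger Bray-Curtis distance.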
We performed multivariate association analyses between each factor with 170 abundant species (>0.01% of total microbial composition and present in at least 10 individuals) and 215 MetaCyc pathways (fig. S3C). When corrected for age, gender, and sequence depth, we found 485 associations at FDR<0.1 between 110 factors and 125 species (table S11) and 524 associations between 71 factors and 176 MetaCyc pathways (table S12). By correcting the correlation structures among all 207 factors, the number of associations was reduced to 128 independent associations with species (table S13) and 215 associations with pathways (table S14).
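The FDR thresholds used here can be illustrated with a short sketch. The text does not state which FDR procedure was applied, so the example below assumes the widely used Benjamini-Hochberg adjustment; the p-values are invented and the function name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the
    # adjusted values never decrease as the raw p-value increases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Invented p-values for six factor-species tests.
pvals = [0.001, 0.008, 0.039, 0.041, 0.20, 0.74]
qvals = benjamini_hochberg(pvals)

# Indices of tests passing the FDR < 0.1 threshold used in the text.
significant = [i for i, q in enumerate(qvals) if q < 0.1]
print(qvals, significant)
```

With these made-up inputs, the first four tests pass the FDR<0.1 threshold, although the fourth would fail a Bonferroni-corrected threshold of 0.1/6.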
Our data confirmed some previous findings and also yielded novel associations. In our study, age and gender were correlated not only with microbial composition distance and diversity but also with functional richness. Women showed higher COG richness than men (adjusted P=0.03), and COG richness increased with age (adjusted P=0.002) (fig. S9). Multiple intrinsic parameters, such as blood cell counts and lipid levels, were associated to composition and function levels as well. For example, a higher level of hemoglobin was consistently associated with lower diversity and functional richness (Fig 3, table S6–S9). The strongest associations we found were for the fecal levels of several secreted proteins, including human-β-defensin-2 (HBD-2), calprotectin (10, 11), and chromogranin A (CgA), with microbial composition, diversity, and functional richness (Fig. 3, Fig. 4A), as well as with specific species (table S11) and pathways (table S12). Among these associations, CgA showed the strongest association with composition distance (adonis R2=0.03, adjusted P=0.0006), microbial diversity (Spearman r=−0.22, adjusted P=1.49×10−12), gene richness (Spearman r=−0.23, adjusted P=9.4×10−13), and COG richness (Spearman r=−0.285, adjusted P=2.53×10−20) (tables S4–S9). The association of CgA with composition distance was then validated in an independent cohort of 19 individuals for whom 16S rRNA gene sequencing data was available (P=0.0065) (fig. S10). A lower CgA level was associated with higher diversity, with functional richness, with high levels of high-density lipoprotein (HDL), and with intake of fruits and vegetables. In contrast, elevated fecal CgA was associated with high fecal levels of calprotectin, high blood levels of triglycerides, high stool frequency, soft stool type, and self-reported irritable bowel syndrome (IBS) (Fig. 4B). After correcting for the confounding effect of all other factors, our analysis revealed 61 species exclusively associated with CgA (Fig. 4C–D, table S13) whose abundance levels collectively accounted for 53% of the total abundance of the microbiome on average, and with 40 MetaCyc pathways (table S14) that accounted for 34.6% of the pathway profiles. The strongest association to CgA was observed for the Archaea species Methanobrevibacter smithii (fig. S11A), which plays an important role in the digestion of polysaccharides by consuming the end products of bacterial fermentation and methanogenesis (12)(fig. S11B). A negative association with CgA abundance was observed for 24 out of 36 species from phylum Bacteroidetes (Fig. 4C–D).
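The Spearman correlations reported for CgA are rank-based, which the following minimal sketch illustrates. It is a plain-Python illustration under our own assumptions: the CgA and diversity values are invented, the helper names are hypothetical, and the sketch does not handle constant inputs or compute p-values.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Invented values: fecal CgA level and Shannon diversity in five people.
cga = [10, 20, 30, 40, 50]
diversity = [3.1, 2.9, 3.0, 2.5, 2.2]

rho = spearman(cga, diversity)
print(f"Spearman rho = {rho:.2f}")
```

A negative coefficient, as in this toy data, corresponds to the direction reported above: lower CgA with higher diversity.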
CgA is a member of the granine peptides, which are secreted in nervous, endocrine and immune cells under stress (13), and during active periods of gut-related diseases such as IBS and inflammatory bowel disease, although some findings are contradictory (14–16). Many different functions have been proposed for CgA and other granine peptides, including roles in neurological pathways, pain regulation, and antimicrobial activity against bacteria, fungi, and yeasts (17, 18). However, their mechanism of action and physiological importance need further detailed investigation. To test whether genetic variants that influence CHGA gene expression (encoding CgA) can affect fecal CgA level and the gut microbiome, we tested the effect of six SNPs known to regulate gene expression of CHGA on fecal CgA and abundances of species (table S15). No significant association was observed, suggesting that genetic variation in CHGA expression does not explain the variation observed in the fecal CgA levels and microbiome composition (table S16–S17). Our observation that CgA strongly correlates with microbiome composition, especially with a large number of species from Bacteroidetes phylum, and with diversity will hopefully encourage studies to unravel the role of CgA in gut health.
We also observed associations (FDR<0.1) between 63 dietary factors and inter-individual distances in microbiota composition, including energy (kcal), intake of carbohydrates, proteins and fats, and of specific food items such as bread and soft drinks (Fig. 3, table S4). Drinking buttermilk (sour milk with a low fat content) was associated with high diversity, while drinking high-fat (whole) milk (3.5% fat content) was associated with lower diversity (table S6). Two of the species most strongly associated with drinking buttermilk are Leuconostoc mesenteroides (q=9.1×10−46) and Lactococcus lactis (q=2.5×10−8), both used as starter cultures for industrial fermentation (table S11). The abundance of dairy-fermentation-related bacteria increased with increasing dairy consumption, indicating potential for the use of probiotic drinks to augment and alter the gut microbiome composition. Consumption of alcohol-containing products, coffee, tea, and sugar-sweetened drinks was also correlated with microbial composition. Consumption of sugar-sweetened soda had a negative effect on microbial diversity (adjusted P=5×10−4), whereas consumption of coffee, tea and red wine, which all have a high polyphenol content, was associated with increased diversity (19–21). Red wine consumption correlated with the abundance of F. prausnitzii, a species that has anti-inflammatory properties, correlates negatively with inflammatory bowel disease (22), and shows higher abundance in high-richness microbiota (23). Apart from the negative associations between sugar-sweetened soda and bacterial diversity, other features of a Western-style diet, such as higher intake of total energy, snacking, and high-fat (whole) milk, were also associated with lower microbiota diversity (Fig. 3). A higher amount of carbohydrates in the diet was associated with lower microbiome diversity. Total carbohydrate intake was positively associated with Bifidobacteria, but negatively with Lactobacillus, Streptococcus, and Roseburia species. A low carbohydrate diet consistently showed opposite directions for these species. We did not observe an association of carbohydrate intake with Prevotella species, as has been described previously (24).
As expected, the use of antibiotics was significantly associated with microbiome composition, in particular with strong and significant decreases in two species from the genus Bifidobacterium (Actinobacteria phylum) (table S11), in line with previous studies (25). Several other drug categories, such as proton pump inhibitors (PPI) (95 users), metformin (15 users), statins (56 users), and laxatives (21 users) also had a strong effect on the gut microbiome. PPI users were found to have profound changes in 33 bacterial pathways (table S12). The most significant positive correlation of PPIs was observed with the pathway of 2,3-butanediol biosynthesis (q=5.3×10−14). We also observed overlap between the species and pathways associated with PPI use and those associated with calprotectin levels, particularly for bacteria typical of the oral microbiome (table S2A–C, table S11, fig. S12). This is in line with the correlations of PPI with calprotectin levels reported in the literature (26). Even after excluding the 95 PPI users from our analysis, the positive correlation of calprotectin with most oral bacteria remained significant, indicating this association is not due to the confounding effect of PPI (fig. S12). Furthermore, the levels of calprotectin were positively correlated with age and metabolic phenotypes (body mass index (BMI), diabetes, use of statins and metformin, HbA1c, and systolic blood pressure), but negatively correlated with the consumption of vegetables, plant proteins, chocolate, and breads. Multivariate analysis correcting for all factors revealed 14 species (table S13) and 114 bacterial metabolic pathways (table S14) exclusively associated with calprotectin, suggesting calprotectin is robustly associated with the gut microbiome.
Metformin is commonly used to control blood sugar levels for treating type 2 diabetes, but can cause gastrointestinal intolerance (27). In 15 metformin users, we observed an increased abundance of Escherichia coli (E. coli) and a positive correlation with specific pathways, including the degradation and utilization of D-glucarate and D-galactarate and pyruvate fermentation pathways. Previous studies in C. elegans indicated the specific drug-bacteria interaction of metformin and E. coli (28). Our results are in line with recent observations in humans (29) that suggest that metformin can impact the microbiome through short-chain fatty acid (SCFA) production. To confirm this observation, we profiled acetate, propionate and butyrate in 24 type 2 diabetes patients in our cohort: 9 non-metformin users and 15 users (4), and found that SCFA levels were consistently higher in metformin-users, especially for propionate (Wilcoxon test P=0.035) (fig. S13).
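The group comparison above can be sketched as follows. The text says only "Wilcoxon test"; because metformin users and non-users are independent groups, this example assumes the rank-sum (Mann-Whitney U) form. It uses a normal approximation without tie or continuity correction, which is crude for small groups, and the concentrations are invented rather than taken from the study.

```python
import math

def rank_sum_test(group_a, group_b):
    """Two-sided Wilcoxon rank-sum test via the Mann-Whitney U statistic.

    Returns (U, p), where U counts pairs with a > b (ties count 0.5) and
    p comes from a normal approximation with no tie correction.
    """
    u = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    n1, n2 = len(group_a), len(group_b)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u, p

# Invented propionate concentrations (arbitrary units).
users = [5.1, 6.3, 7.0, 6.8, 5.9]
non_users = [4.2, 5.0, 4.8, 5.5]

u_stat, p_value = rank_sum_test(users, non_users)
print(f"U = {u_stat}, approximate two-sided p = {p_value:.3f}")
```

For groups as small as those in the study (9 and 15), an exact test or a statistics library would be preferable to this approximation.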
We assessed the effect of current smoking status, smoking history, parental smoking, and maternal smoking during pregnancy on the gut microbiome. These parameters were associated with Bray-Curtis distance, albeit with very modest effect. We did not detect significant associations for individual species or pathways. In this study we included 39 self-reported diseases, each with at least five reported cases. IBS was reported by 9.9% of participants (n=112, table S1) and was associated with changes in the gut microbiome and a lower microbial diversity (adjusted P=0.05) (table S6). Species from the Eggerthella and Coprobacillus genera were positively associated with medication and food allergies, respectively. Individuals who had suffered a heart attack in the past (n=10) had a significantly lower abundance of Eubacterium eligens, even after correcting for all other factors (q=4.6×10−4).
Linking the deep-sequenced MGS data to various intrinsic and exogenous factors from the same individual not only allowed us to detect associations at species level, but also provided new insights into the interaction between the host, microbiota, and environmental factors, including diet. For instance, we have replicated and expanded our association of BMI and blood lipid levels with the gut microbiota based on 16S rRNA gene sequencing data (30) by showing associations with four specific species of the family Rikenellaceae. We previously associated this family with BMI and triglycerides in 16S rRNA data. In the current study we observed that higher BMI was associated with lower levels of two species from the family Rikenellaceae, Alistipes finegoldii and Alistipes senegalensis, while blood lipids were associated with two other species, Alistipes shahii and Alistipes putredinis (table S11). Strikingly, these species were also associated with certain dietary factors and drugs. For instance, a high level of Alistipes shahii, which was associated with low triglyceride (TG) levels, was linked to higher fruit intake (q=0.00027). Individuals with a higher abundance level of Alistipes shahii had a higher number of different species in the gut (species richness) (Spearman r=0.2, adjusted P=3.96×10−11), suggesting a beneficial effect on the microbial ecosystem (table S18). Correlations with the number of different species were also found for other bacteria, including Roseburia hominis, Coprococcus catus, Barnesiella intestinihominis, and unclassified species from the genus Anaerotruncus, which also correlated both with fruit, vegetable, and nut consumption and with intrinsic phenotypes like HDL, triglycerides and quality of life. Based on these data, it would be interesting to explore the potential to modulate disease-associated species through medication or diet, although we still need to address the causality and underlying mechanism.
Conclusions
Our study revealed significant associations between the gut microbiome and various intrinsic, environmental, dietary and medication parameters, and disease phenotypes, with a high replication rate between MGS and 16S rRNA gene sequencing data from the same subjects. Moreover, our study provides many new intrinsic and exogenous factors that correlate with shifts in the microbiome composition and functionality and that can potentially be manipulated to improve microbiome-related health. We hope our results will inspire further experiments to explore the biological relevance of the associated factors. While most of the factors we assessed exerted a very modest effect, fecal levels of Chromogranin A showed a high potential as a biomarker for gut health.
Supplementary Material
Supplemental figures S1-S13 and supplemental methods
Tables S1-S19
Acknowledgments
We thank the LifeLines-DEEP participants and the Groningen LifeLines staff for their collaboration. We thank Jackie Dekens, Mathieu Platteel, and Astrid Maatman for management and technical support. We thank Jackie Senior and Kate Mc Intyre for editing the manuscript.
This project was funded by grants from the Top Institute Food and Nutrition, Wageningen, to C.W. (TiFN GH001), the Netherlands Organization for Scientific Research to J.F. (NWO-VIDI 864.13.013), L.F. (ZonMW-VIDI 917.14.374), and R.W. (ZonMW-VIDI 016.136.308), and CardioVasculair Onderzoek Nederland to M.H. and A.Z. (CVON 2012-03). A.Z. holds a Rosalind Franklin Fellowship (University of Groningen) and M.C.C. holds a postdoctoral fellowship from the Fundación Alfonso Martín Escudero. This research received funding from the European Research Council under the European Union’s Seventh Framework Program: C.W. is supported by FP7/2007-2013/ERC Advanced Grant Agreement no. 2012-322698. M.G.N. is supported by an ERC Consolidator Grant (#310372). L.F. is supported by FP7/2007-2013, grant agreement 259867, and by an ERC Starting Grant, grant agreement 637640 (ImmRisk). J.R. and G.F. are supported by FP7 METACARDIS HEALTH-F4-2012-305312, VIB, FWO, IWT, the Rega Institute for Medical Research, and KU Leuven. S.V.S. and M.J. are supported by postdoctoral fellowships from FWO.
Footnotes
A.Z., C.W., and J.F. designed the study. A.Z., E.F.T., L.F., and C.W. initiated the cohort and collected cohort data. A.Z., E.F.T., Z.M., S.A.J., M.C.C., and D.K. generated data. A.Z., A.K., M.J.B., E.F.T., M.S., T.V., A.V.V., G.F., S.V.S., J.W., F.I., P.D., M.A.S., C.H., R.J.X., and J.F. analyzed data. G.F., S.V.S., J.W., E.B., M.J., R.K.W., E.J.M.F., M.G.N., D.G., D.J., L.F., Y.S.A., C.H., J.R., R.J.X., and M.H.H. participated in integral discussions. A.Z., A.K., M.J.B., R.J.X., C.W., and J.F. wrote the manuscript.
The authors have no conflicts of interest to report.
Data: The data is currently being uploaded to the European Genome-phenome Archive (https://www.ebi.ac.uk/ega/) (ega-box-423).
Informed consent: The study was approved by the institutional review board of UMCG, ref. M12.113965.
Supplementary Materials:
Materials and Methods
Figures S1–S13
Tables S1–S19
References (31–54)
Full text links
Read article at publisher's site: https://doi.org/10.1126/science.aad3369
Read article for free, from open access legal sources, via Unpaywall: https://europepmc.org/articles/pmc5240844?pdf=render
Funding
Funders who supported this work.
CardioVasculair Onderzoek Nederland (1)
Grant ID: CVON 2012-03
Dutch Research Council (NWO) (4)
Grant ID: 864.13.013
Grant ID: NWO-VIDI 864.13.013
Grant ID: ZonMW-VIDI 917.14.374
Grant ID: ZonMW-VIDI 016.136.308
European Research Council (4)
Celiac disease: from lincRNAs to disease mechanism (CD-LINK)
Prof Tjitske Nienke Wijmenga, University Medical Center Groningen
Grant ID: 322698
The interaction landscape between microbial colonization and functional genome of the host: a systems biology approach in fungal infections (SysBioFun)
Prof Mihai Netea, Radboud University Nijmegen
Grant ID: 310372
Grant ID: 2012-322698
Defining how environmental factors influence downstream effects of immune-mediated disease risk-SNPs (ImmRisk)
Dr Lude Franke, University Medical Center Groningen
Grant ID: 637640
European Union's Seventh Framework Program (1)
Grant ID: FP7/2007-2013
NIDDK NIH HHS (1)
Grant ID: P30 DK043351
Top Institute Food and Nutrition, Wageningen (1)
Grant ID: TiFN GH001