TaxMan: a server to trim rRNA reference databases and inspect taxonomic coverage.

Brandt BW; Bonder MJ; Huse SM; Zaura E

doi:10.1093/nar/gks418

Affiliations

1. Department of Preventive Dentistry, Academic Centre for Dentistry Amsterdam (ACTA), University of Amsterdam and VU University Amsterdam, Amsterdam, The Netherlands.
Nucleic Acids Research, 22 May 2012, 40(Web Server issue):W82-7
https://doi.org/10.1093/nar/gks418 PMID: 22618877 PMCID: PMC3394339

Abstract

Amplicon sequencing of the hypervariable regions of the small subunit ribosomal RNA gene is a widely accepted method for identifying the members of complex bacterial communities. Several rRNA gene sequence reference databases can be used to assign taxonomic names to the sequencing reads using BLAST, USEARCH, GAST or the RDP classifier. Next-generation sequencing methods produce ample reads, but they are short, currently ∼100-450 nt (depending on the technology), as compared to the full rRNA gene of ∼1550 nt. It is important, therefore, to select the right rRNA gene region for sequencing. The primers should amplify the species of interest and the hypervariable regions should differentiate their taxonomy. Here, we introduce TaxMan: a web-based tool that trims reference sequences based on user-selected primer pairs and returns an assessment of the primer specificity by taxa. It allows interactive plotting of taxa, both amplified and missed in silico by the primers used. Additionally, using the trimmed sequences improves the speed of sequence matching algorithms. The smaller database greatly improves run times (up to 98%) and memory usage, not only of similarity searching (BLAST), but also of chimera checking (UCHIME) and of clustering the reads (UCLUST). TaxMan is available at http://www.ibi.vu.nl/programs/taxmanwww/.

Nucleic Acids Res. 2012 Jul; 40(Web Server issue): W82–W87.

Published online 2012 May 22. https://doi.org/10.1093/nar/gks418

Bernd W. Brandt¹,*, Marc J. Bonder¹,², Susan M. Huse³ and Egija Zaura¹

INTRODUCTION

The bacterial small subunit of the ribosomal gene, the 16S rRNA gene, is the most common housekeeping genetic marker used in bacterial phylogeny and taxonomy. The reasons for this are its presence in almost all bacteria, relative stability over time and its size that is large enough for informatics purposes (1). Cloning of the (nearly complete) 16S rRNA gene in Escherichia coli and sequencing, although highly elaborate and costly, became a standard method in determining microbial community composition (2,3). With the advent of high throughput next-generation sequencing (NGS) technology, the cloning bias could be circumvented and the costs per nucleotide substantially reduced. Now, the standard method of assessing the taxonomic composition of microbial communities is to sequence the 16S rRNA gene, using PCR amplification and NGS technology. The bacterial 16S rRNA gene consists of conserved sequences interspersed with variable sequences that include nine hypervariable regions (4). These regions are flanked by conserved parts of the 16S rRNA gene, which are used in primer designs to target as diverse a bacterial community as possible. The sequences of the hypervariable regions themselves are used to discriminate among bacterial taxa.

Different hypervariable regions evolve at different rates and different species of the same genus (or e.g. genera of the same family) may be similar in some hypervariable regions and more divergent in others (5,6). Primer bias occurs when the selected primers do not anneal to the DNA from all members of the community equally, but preferentially amplify certain taxonomic groups. For instance, Verrucomicrobia, a bacterial phylum previously thought to occur in soil at a low abundance, was shown to be highly abundant in different soil samples by simply replacing commonly used primer set 27F/338R (V1–V2), obviously biased against Verrucomicrobia, by the primer set 515F/806R targeting hypervariable region V4 (7). Assessing the nature and extent of primer bias is an important first step whenever primers are selected. In silico testing for the most effective regions for discerning taxa from a particular environment or for finer resolution of particular taxa would have a large impact on experimental costs and outcomes. This has recently been demonstrated within the Human Microbiome Project (8), where both the V1–V3 and the V3–V5 sections of the rRNA gene were sequenced, trimmed and clustered into 3% operational taxonomic units (OTUs) (9). The V1–V3 data showed three dominant Lactobacillus OTUs, which appear to differentiate L. crispatus, L. iners and L. gasseri (10). These OTUs correspond to the three primary vaginal biome types identified by Zhou et al. (11) and Ravel et al. (12). The V3–V5 sequence data, however, was dominated by only one OTU, which included over six different Lactobacillus species. Conversely, the V3–V5 sequence data identified a Bifidobacteriaceae OTU that was not detected as such with the V1–V3 sequences.

The data resulting from PCR amplification and NGS sequencing requires processing through a bioinformatics pipeline. This pipeline should ensure that low quality sequences are discarded and that meaningful groups or clusters of sequences, OTUs, are created. The representative sequence of each OTU is then compared with sequences found in publicly available 16S rRNA gene databases and, when possible, a consensus taxonomic lineage (genus, family or higher taxon) is given to the OTU. For these downstream analyses of the sequences, only the amplified part of the 16S rRNA gene is required. Using the short amplicon sequences instead of the full-length rRNA gene as reference sets in computational pipelines reduces the run times considerably. Some programs, such as GAST (13), which assigns taxonomy based on the best match in a Global Alignment for Sequence Taxonomy, require a trimmed database that matches the length of the amplicons. An additional advantage of using a trimmed database is that it can serve as a quality check for accurate trimming of (the sequenced) amplicons.

Programs already exist that test which sequences match a given oligonucleotide probe. For the different rRNA gene databases, these are SILVA’s TestProbe (14), Greengenes’ Probes (15) or RDP’s Probe Match (16). Probes can be designed using stand-alone software, such as Primrose (17) and PrimerProspector (18). The latter provides a probe/primer design pipeline that supports de novo barcoded primer design and includes command-line scripts to analyze taxonomic coverage. Most programs, however, do not return trimmed reference sequences matching the probes.

We have developed TaxMan, a straightforward web tool, to trim the reference sequences of several rRNA gene databases to the hypervariable regions used, based on pre-selected primers, and to interactively analyze taxonomic coverage. We show that the use of the provided trimmed sequences in computations increases analysis speed. Additionally, by analyzing the taxonomic coverage against several rRNA gene databases before performing the sequencing, researchers can assess how well the amplification products differentiate specific taxa from a particular environment, and can thus better target their experiments to resolve the taxa of greatest interest to their research question. To this end, TaxMan also provides graphical analysis of the taxa that are selected for or against with the selected primer set(s).

MATERIALS AND METHODS

Database construction

Several rRNA gene databases are provided, including two oral microbiome-specific databases: CORE (16S rDNA database of the core human oral microbiome) (19) and HOMD (Human Oral Microbiome Database) (20), the vaginal 16S reference package (21), as well as more inclusive databases such as Greengenes (15) and the SILVA comprehensive ribosomal RNA databases (small subunit, small subunit with human skin and mouse wound microbiome, and large subunit) (14). Other databases can be added upon user’s request.

TaxMan uses the sequences in FASTA format and includes the taxonomic lineage as FASTA description. The different taxonomic categories are separated by a semi-colon. For example, ‘Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Porphyromonadaceae;Porphyromonas’.

The taxonomy is taken from the source databases and is not changed. For all databases, missing categories in the taxonomic lineage are represented by an ‘empty string’. This can occur if, for example, no order or family, but a genus was supplied by the respective taxonomy. The ‘empty string’ is replaced with ‘noname’ in the tree. If the database has classified a sequence as unclassified explicitly, this will remain as such.
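The header convention described above can be illustrated with a small Python sketch. It assumes a header of the form `>identifier lineage`, with the lineage as the FASTA description; the function name `parse_header` and the example identifier `seq1` are hypothetical and are not taken from TaxMan itself.

```python
def parse_header(header):
    """Split a FASTA header into (identifier, list of taxonomic levels).

    Assumes the lineage follows the identifier after the first space and
    that levels are separated by semicolons. Empty levels are shown as
    'noname', mirroring the replacement described for the tree.
    """
    seq_id, _, description = header.lstrip(">").partition(" ")
    levels = [level.strip() or "noname" for level in description.split(";")]
    return seq_id, levels


# Hypothetical record: order and family are missing, the genus is given.
seq_id, levels = parse_header(">seq1 Bacteria;Bacteroidetes;Bacteroidia;;;Porphyromonas")
print(seq_id, levels)
```

Explicit labels such as 'unclassified' pass through unchanged, because only empty levels are replaced.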

All databases are made non-redundant. The databases, apart from SILVA, have been preprocessed to include the lineage in the FASTA records.

CORE

The Excel file was downloaded [http://microbiome.osu.edu/ (19)] and the taxonomic categories were concatenated. The CORE accession id and the lineage form the FASTA header line.

HOMD

The 16S rRNA RefSeq and taxon table [http://www.homd.org/ (20)] data were combined based on the HOT identifier. The constructed FASTA headers start with the HOT id merged with the strain synonym followed by the lineage.

Greengenes

The Greengenes PROKMSA_id and GenBank accession in the Greengenes FASTA file [current_GREENGENES_gg16S_unaligned.fasta; http://greengenes.lbl.gov/ (15)] were merged with an underscore and the lineage (Greengenes/Hugenholtz) format was changed.

SILVA

Files were downloaded from http://www.arb-silva.de/ (14).

Vaginal 16S reference

The sequences were taken from the alignment file and gaps were removed (vaginal_aln.fasta; http://microbiome.fhcrc.org/apps/refpkg/). The lineages, based on the taxtable.txt file, contain the following levels: species, genus, family, order, class, phylum and superkingdom. The word ‘unclassified’ was appended to the lineage at the level from which all sub-classifications are absent.

Web server

Input

The web site takes forward and reverse PCR primer sequence(s) as input. Primers may contain ambiguity codes. The reverse primer needs to be in the reverse complement orientation, as is common for PCR primers. The user can further select a target rRNA reference database. Options include setting a mismatch percentage for the primers, removing forward and/or reverse primer(s) from the amplicons and two options related to treatment of (redundant) lineages (cf. online documentation).
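Because primers may contain ambiguity codes, locating the reverse primer on the reference strand requires a reverse complement that also handles the IUPAC codes. The following is a minimal sketch of that operation, not the server's own code; the primer `AAGRYTC` is a made-up example.

```python
# Complement table for IUPAC nucleotide codes (U is treated like T).
COMPLEMENT = str.maketrans("ACGTURYSWKMBDHVN", "TGCAAYRSWMKVHDBN")


def reverse_complement(primer):
    """Return the reverse complement of a primer, including ambiguity codes."""
    return primer.upper().translate(COMPLEMENT)[::-1]


site = reverse_complement("AAGRYTC")  # hypothetical reverse primer
print(site)
```

For primers without U, applying the function twice returns the original primer, which is a convenient sanity check on the complement table.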

Processing

The FASTA sequences of the rRNA gene databases have been preprocessed to contain the taxonomic lineage in the FASTA header. In silico PCR is performed with an adapted version of primersearch from EMBOSS (v6.4.0) (22) to find the positions of the primers in the sequences and with Perl code to extract the corresponding sub-sequences. The adaptation of primersearch changes the expansion of the IUPAC ambiguity codes. For example, R now expands to GAR instead of GA. In cases where more than one amplicon is produced for a single reference sequence, the longest amplicon is kept. Then, the set of produced amplicon sequences is made non-redundant. Next, a taxonomic tree is built of the amplicons and combined with the tree of the original reference sequences. In cases where different species have identical amplicons, the taxonomic lineages are optionally summarized to the first non-common level, similar to microarray probes, for example, Bacteria;Bacteroidetes;(Sphingobacteria/Flavobacteria). This tree data is used for the HTML Tree viewer, pie-chart plotting (using jqPlot, an open source project by Chris Leonello; http://www.jqplot.com/) and for the FASTA headers in the downloadable file.
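The primer search and the longest-amplicon rule can be sketched in plain Python. This is an illustration under stated assumptions and not the adapted primersearch: the function names, the way the mismatch percentage is converted to a mismatch count, and the toy sequences are all hypothetical. The reverse primer is assumed to have already been converted to the orientation of the reference strand.

```python
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def base_matches(primer_base, target_base):
    """True if every base the target code allows is also allowed by the primer code.

    With this rule a primer R matches G, A and R in the reference.
    """
    target = IUPAC.get(target_base)
    return target is not None and set(target) <= set(IUPAC[primer_base])


def find_sites(primer, target, max_mismatch_pct=0.0):
    """Return start positions where the primer matches within the mismatch budget."""
    allowed = int(len(primer) * max_mismatch_pct / 100)  # assumption: rounded down
    hits = []
    for start in range(len(target) - len(primer) + 1):
        window = target[start:start + len(primer)]
        mismatches = sum(not base_matches(p, t) for p, t in zip(primer, window))
        if mismatches <= allowed:
            hits.append(start)
    return hits


def longest_amplicon(target, fwd, rev_site, max_mismatch_pct=0.0, keep_primers=True):
    """Return the longest amplicon between a forward site and a downstream reverse site."""
    best = ""
    for f in find_sites(fwd, target, max_mismatch_pct):
        for r in find_sites(rev_site, target, max_mismatch_pct):
            if r >= f + len(fwd):  # reverse site must lie downstream of the forward primer
                if keep_primers:
                    amplicon = target[f:r + len(rev_site)]
                else:
                    amplicon = target[f + len(fwd):r]
                if len(amplicon) > len(best):
                    best = amplicon
    return best


# Toy reference with one forward site and two candidate reverse sites.
reference = "TTACGATCCCCCGGATTTGGATAA"
with_primers = longest_amplicon(reference, "ACGR", "GGAT")
without_primers = longest_amplicon(reference, "ACGR", "GGAT", keep_primers=False)
print(with_primers, without_primers)
```

The `keep_primers` switch corresponds to the input option of removing the primers from the amplicons. An empty string is returned when no primer pair matches.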

RESULTS AND DISCUSSION

Overview

For NGS amplicon sequencing of bacterial communities, hypervariable regions of the rRNA genes are amplified with PCR. The TaxMan server provides in silico PCR against several rRNA reference databases and interactive analysis of the resulting taxonomic coverage. If more than one forward and reverse primer is provided, a multiplex PCR is performed: all forward primers are combined with all reverse primers. The ambiguity codes, possibly present in a primer, are expanded to include subsets of ambiguity codes, since the rRNA reference sequences can themselves contain ambiguity codes.
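The two ideas in this paragraph, combining every forward primer with every reverse primer and expanding an ambiguity code to include its subset codes, can be sketched as follows. The primer sequences are placeholders and the function name `expand` is hypothetical; only the example that R covers G, A and R comes from the text.

```python
from itertools import product

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand(code):
    """Return all IUPAC codes whose base set is contained in that of `code`."""
    bases = set(IUPAC[code])
    return sorted(c for c, b in IUPAC.items() if set(b) <= bases)


# Multiplex: every forward primer is paired with every reverse primer.
forward_primers = ["ACGRT", "ACGYT"]  # placeholder sequences
reverse_primers = ["GGATW", "GGATS"]  # placeholder sequences
primer_pairs = list(product(forward_primers, reverse_primers))

print(expand("R"))
print(len(primer_pairs))
```

Expanding to subset codes is what allows a primer position R to match a reference position that is itself recorded as R, rather than only an unambiguous G or A.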

The selection of a (few) hypervariable region(s) of the rRNA gene, resulting in shorter sequences, has two implications:

(i) The reference database can be trimmed to correspond with the rRNA gene region used. This can increase the analysis speed considerably, both because the sequences to search against are shorter and because shorter sequences are more often redundant, which reduces the number of non-redundant sequences to search against.
(ii) The ability to differentiate taxa is reduced, because the targeted hypervariable region(s) can have identical sequences for different species.
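Both implications can be seen in a toy dereplication example. The sketch below uses made-up sequences and a fixed slice as a stand-in for primer-based trimming; `dereplicate` is a hypothetical helper, not part of TaxMan.

```python
def dereplicate(records):
    """Collapse identical sequences, keeping the identifiers that share each one."""
    unique = {}
    for seq_id, seq in records:
        unique.setdefault(seq, []).append(seq_id)
    return unique


# Three made-up full-length references; 'a' and 'c' are identical,
# 'b' differs only outside the region kept by the trimming step.
full = [("a", "AAACCCGGGTTT"), ("b", "AATCCCGGGTTA"), ("c", "AAACCCGGGTTT")]
trimmed = [(seq_id, seq[3:9]) for seq_id, seq in full]  # stand-in for primer trimming

print(len(dereplicate(full)), len(dereplicate(trimmed)))
```

After trimming, all three references collapse into one sequence, so the search set is smaller, but the three sources can no longer be told apart by that region.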

Improvements in speed and memory usage

The difference in run time between using the trimmed versus the original reference data set was assessed for several programs. We measured the run times of BLAST (23) to find the taxonomy of the reads, of UCLUST (24) to cluster the reads (using default and reference optimal) and of UCHIME (25) to chimera check the reads. The test data consisted of reads from pyrosequenced amplicons from oral samples (V5–V7 region, Kraneveld,E.A. et al., submitted for publication). This data was either only denoised (722943 reads) for UCHIME chimera checking or denoised and chimera-checked (644797 reads) for BLAST and UCLUST clustering. For BLAST, the denoised and chimera-checked set was also made non-redundant, leaving 2806 reads.

Table 1 shows the computer run times, memory usage and improvements therein when the different programs were run with the original 16S rRNA gene reference data as compared to the trimmed 16S rRNA gene sequence data (primers removed). As can be seen from Table 1, the use of these trimmed sequences that correspond with the amplicons can result in considerable improvements in both run time and memory usage, ranging from 25% to 98%.

Table 1.

Data on CPU time, run time (hr:mm:ss format), physical memory (mem) and virtual memory (vmem) usage (in kb) as reported by the cluster software (PBS). BLAST was run on eight cores, the other programs on one core. Percentage improvement is calculated as the relative difference (original-trimmed)/original. The fold improvement is the ratio (original/trimmed).

Program             Measure    Original set  Trimmed set  % Improvement  Fold improvement
BLAST               CPU time   9:17:27       4:05:52      56             2.3
                    run time   1:12:48       0:41:26      43             1.8
                    mem        396360        189848       52             2.1
                    vmem       1297204       974756       25             1.3
UCLUST ref (a)      CPU time   0:05:17       0:00:47      85             6.7
                    run time   0:05:25       0:00:56      83             5.8
                    mem        9456156       699388       93             14
                    vmem       12575444      869316       93             14
UCLUST ref opt (b)  CPU time   73:46:16      1:14:17      98             60
                    run time   73:54:41      1:14:35      98             59
                    mem        9374752       1384260      85             6.8
                    vmem       12473384      1780908      86             7.0
UCHIME (c)          CPU time   29:57:17      3:13:00      89             9.3
                    run time   30:00:50      3:13:26      89             9.3
                    mem        1009776       164896       84             6.1
                    vmem       1118688       267052       76             4.2

(a) UCLUST reference mode.

(b) UCLUST reference optimal mode.

(c) The concordance is 93.5%.
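The two improvement measures in Table 1 can be recomputed from the reported values. The short sketch below applies the stated formulas to the BLAST CPU times and the UCLUST reference-mode memory figures; the helper names are hypothetical.

```python
def to_seconds(hms):
    """Convert an 'h:mm:ss' string (hours may exceed 24) to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def improvement(original, trimmed):
    """Return (percentage improvement, fold improvement) as defined for Table 1."""
    pct = 100 * (original - trimmed) / original
    fold = original / trimmed
    return pct, fold


# BLAST CPU time, original versus trimmed reference set.
cpu_pct, cpu_fold = improvement(to_seconds("9:17:27"), to_seconds("4:05:52"))
# UCLUST reference mode, physical memory in kb.
mem_pct, mem_fold = improvement(9456156, 699388)

print(f"BLAST CPU time: {cpu_pct:.0f}% improvement, {cpu_fold:.1f}-fold")
print(f"UCLUST ref mem: {mem_pct:.0f}% improvement, {mem_fold:.0f}-fold")
```

The two measures carry the same information: a 98% improvement corresponds to roughly a 50-fold or greater reduction, which is why the largest fold values appear for UCLUST in reference optimal mode.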

Server output files, taxonomic coverage and visualization

In addition to producing trimmed versions of a reference database, TaxMan can be used to analyze the taxonomic coverage of the trimmed sequences (amplicons), and the original reference database sequences. We illustrate the use of TaxMan with a primer set used in our previous studies on the oral microbiota of children and oral health (26). The primers target the V5–V6 hypervariable region of the 16S rRNA gene. This example is present on the server.

The output provides an overview of the run: the number of non-redundant sequences in the selected database and number of total and non-redundant sequences that the primers formed. In addition, the percentage of sequences (based on the number of total or non-redundant amplicons) in the entire reference database targeted by the primers is stated. Not all database sequences are full-length rRNA gene sequences. Therefore, especially when primers target the ends of the rRNA gene sequence, the coverage may appear to be lower than expected. Last, links are shown to three different sections of the output page: the download, tree and pie chart sections.

Under ‘Download amplicon and lineage data’, three files can be downloaded: the taxonomic lineage coverage and two FASTA files with the amplicon sequences. The lineage file contains counts for all taxa in the amplicon set and in the reference database. The FASTA files contain the same sequence data, but with different headers for redundant sequences: either the taxonomic lineage is summarized (to the identical part or to the first non-common level) or all original FASTA headers are concatenated.
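The two ways of summarizing lineages for redundant sequences can be sketched as follows. This is an illustration only: the function `summarize` is hypothetical, and the order-level names in the example are added for illustration.

```python
def summarize(lineages, first_non_common=True):
    """Summarize lineages of identical amplicons.

    Keeps the levels shared by all lineages. If `first_non_common` is True,
    the first differing level is appended as '(name1/name2/...)'.
    """
    split = [lineage.split(";") for lineage in lineages]
    summary = []
    for levels in zip(*split):  # walk the levels in parallel
        if len(set(levels)) == 1:
            summary.append(levels[0])
            continue
        if first_non_common:
            names = list(dict.fromkeys(levels))  # unique names, original order
            summary.append("(" + "/".join(names) + ")")
        break
    return ";".join(summary)


lineages = [
    "Bacteria;Bacteroidetes;Sphingobacteria;Sphingobacteriales",
    "Bacteria;Bacteroidetes;Flavobacteria;Flavobacteriales",
]
print(summarize(lineages))
print(summarize(lineages, first_non_common=False))
```

The first call gives the form used in the example lineage Bacteria;Bacteroidetes;(Sphingobacteria/Flavobacteria); the second keeps only the identical part.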

The tree and especially the pie chart sections provide interactive analysis and visualization. The tree is expandable and searchable (Figure 1). The pie charts provide a different view on the taxonomic coverage to facilitate the analysis of taxonomic distributions (Figure 2). By clicking on a slice of the Root pie, a pie for the next taxonomic level is plotted. For this plot, a percentage threshold can be applied. This threshold filters the plotted data by the percentage at which the respective taxa occur within their taxonomic parent level. For example, a threshold of 14% for Bacteria will show only those bacterial phyla that occur at a frequency of at least 14% (relative to their counts in the reference database), which would be the phylum Firmicutes in this example.

Figure 1.

Partial tree view of the amplicons based on the CORE database. For each node, it shows the number of sequences targeted by the given primers, followed by number in the original reference as well as the percentage. The data used for the tree (except the percentages) is downloadable as the tab-delimited lineage file.

Figure 2.

An example of pie plots for the amplicons (CORE database). The distribution of sub-categories within three taxonomic levels, shown as the chart titles, is plotted. The percentage threshold is 0 for all plots. The top panel series is obtained by clicking on Bacteria (Root pie) and Actinobacteria (Bacteria pie). Clicking a pie slice or legend label will produce the next chart and hide the legend of the previous one (except the legend of the Root pie). The bottom panel series of charts is similar, but for the phylum Actinobacteria a plot of differences, indicated by the pink header, is shown. Here, the data refers to the number of sequences missed by the amplicons as compared with the reference data. For the class Actinobacteridae, 46 out of 110 sequences are missing (see legend). The ‘100%’ in the Actinobacteridae pie slice illustrates that all missed sequences in the phylum Actinobacteria belong to the Actinobacteridae class. For Coriobacteridae, no sequences are missing (indicated by 0/9 in the legend). When hovering over a ‘legend’ label, the number of targeted sequences is always displayed in the pie (Actinobacteridae; cnt: 64/110). Therefore, this information is the same for both types of pies for Actinobacteria.

For the amplicon sequences, the differences between the taxa targeted by the amplicons compared with the reference can also be plotted. Now, the size of the pie slice relates to the number of sequences missing for this taxonomic level. The percentage threshold here filters on the percentage of missing sequences at the selected taxonomic level. This offers a detailed view of which taxa are absent. For example, with the threshold set to ≥70%, the pie only shows taxa for which at least 70% of the reference sequences are missing. At this threshold, the phylum with relatively the most sequences not targeted by the primers is Fusobacteria (51 out of 67 in this example). Clearly, these numbers depend on the selected database. However, replacing the V5–V6 primers with V5–V7 primers provided better coverage of the Fusobacteria occurring in the oral cavity.
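The threshold used for the difference plots can be expressed as a simple filter on the fraction of missing reference sequences. The sketch below uses the counts quoted in the text and in the Figure 2 caption; the taxa come from different taxonomic levels and are grouped in one dictionary only for illustration, and the function names are hypothetical.

```python
def missing_pct(targeted, reference):
    """Percentage of reference sequences not targeted by the primers."""
    return 100 * (reference - targeted) / reference


def filter_missing(counts, threshold_pct):
    """Keep taxa whose missing percentage is at least the threshold.

    `counts` maps taxon -> (targeted, reference); the result maps
    taxon -> (missing, reference).
    """
    return {
        taxon: (reference - targeted, reference)
        for taxon, (targeted, reference) in counts.items()
        if reference and missing_pct(targeted, reference) >= threshold_pct
    }


counts = {
    "Fusobacteria": (16, 67),       # 51 of 67 missing (main text)
    "Actinobacteridae": (64, 110),  # 46 of 110 missing (Figure 2)
    "Coriobacteridae": (9, 9),      # 0 of 9 missing (Figure 2)
}
print(filter_missing(counts, 70))
```

With a threshold of 70%, only Fusobacteria remains, since about 76% of its reference sequences are missed, whereas Actinobacteridae misses about 42%.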

The pie charts are highly flexible: each pie can be set to plot the differences and each pie can have a different threshold. The selected thresholds are shown in the pie and a pink header indicates the ‘difference plots’.

CONCLUSION

The TaxMan server provides a user-friendly way to carry out (multiplex) in silico PCR to produce trimmed versions of rRNA gene reference databases. Both the trimmed sequences and the distribution of targeted taxa can be downloaded for local use. TaxMan also supports interactive analysis of the taxonomic coverage, including pie charts which, together with the taxonomic trees, quickly illustrate which taxa, according to the selected rRNA database, are targeted by the primer set(s) and which are not. The use of the trimmed sequences instead of the full-length rRNA gene sequences in computational pipelines results in significant improvements in the use of computational resources.

FUNDING

University of Amsterdam under the research priority area ‘Oral Infections and Inflammation’ (to B.W.B.); National Science Foundation [NSF/BDI 0960626 to S.M.H.]; the European Union Seventh Framework Programme (FP7/2007-2013) under ANTIRESDEV grant agreement no 241446 (to E.Z.). Funding for open access charge: ANTIRESDEV.

Conflict of interest statement. None declared.

ACKNOWLEDGEMENTS

We would like to thank the teams who produce the databases used in TaxMan. We are also thankful to the SILVA team for providing the multiple sequence alignment of the SILVA SSU Ref NR data set.

REFERENCES

1. Janda JM, Abbott SL. 16S rRNA gene sequencing for bacterial identification in the diagnostic laboratory: pluses, perils, and pitfalls. J. Clin. Microbiol. 2007;45:2761–2764.

2. Röling WFM, Head IM. Prokaryotic systematics: PCR and sequence analysis of amplified 16S rRNA genes. In: Osborn AM, Smith CJ, editors. Molecular Microbial Ecology. New York: Taylor & Francis Group; 2005. pp. 25–56.

3. Clarridge JE III. Impact of 16S rRNA gene sequence analysis for identification of bacteria on clinical microbiology and infectious diseases. Clin. Microbiol. Rev. 2004;17:840–862.

4. Petrosino JF, Highlander S, Luna RA, Gibbs RA, Versalovic J. Metagenomic pyrosequencing and microbial identification. Clin. Chem. 2009;55:856–866.

5. Schloss PD. The effects of alignment quality, distance calculation method, sequence filtering, and region on the analysis of 16S rRNA gene-based studies. PLoS Comput. Biol. 2010;6:e1000844.

6. Youssef N, Sheik CS, Krumholz LR, Najar FZ, Roe BA, Elshahed MS. Comparison of species richness estimates obtained using nearly complete fragments and simulated pyrosequencing-generated fragments in 16S rRNA gene-based environmental surveys. Appl. Environ. Microbiol. 2009;75:5227–5236.

7. Bergmann GT, Bates ST, Eilers KG, Lauber CL, Caporaso JG, Walters WA, Knight R, Fierer N. The under-recognized dominance of Verrucomicrobia in soil bacterial communities. Soil. Biol. Biochem. 2011;43:1450–1455.

8. NIH HMP Working Group, Peterson J, Garges S, Giovanni M, McInnes P, Wang L, Schloss JA, Bonazzi V, McEwen JE, Wetterstrand KA, Deal C, et al. The NIH Human Microbiome Project. Genome Res. 2009;19:2317–2323.

9. Schloss PD, Gevers D, Westcott SL. Reducing the effects of PCR amplification and sequencing artifacts on 16S rRNA-based studies. PLoS ONE. 2011;6:e27310.

10. Huse SM, Ye Y, Zhou Y, Fodor AA. A Core human microbiome as viewed through 16S rRNA sequence clusters. PLoS ONE. 2012;7:e34242.

11. Zhou X, Brotman RM, Gajer P, Abdo Z, Schüette U, Ma S, Ravel J, Forney LJ. Recent advances in understanding the microbiology of the female reproductive tract and the causes of premature birth. Infect. Dis. Obstet. Gynecol. 2010;2010:737425.

12. Ravel J, Gajer P, Abdo Z, Schneider GM, Koenig SSK, McCulle SL, Karlebach S, Gorle R, Russell J, Tacket CO, et al. Vaginal microbiome of reproductive-age women. Proc. Natl. Acad. Sci. USA. 2011;108(Suppl. 1):4680–4687.

13. Huse SM, Dethlefsen L, Huber JA, Mark Welch D, Relman DA, Sogin ML. Exploring microbial diversity and taxonomy using SSU rRNA hypervariable tag sequencing. PLoS Genet. 2008;4:e1000255.

14. Pruesse E, Quast C, Knittel K, Fuchs BM, Ludwig W, Peplies J, Glöckner FO. SILVA: a comprehensive online resource for quality checked and aligned ribosomal RNA sequence data compatible with ARB. Nucleic Acids Res. 2007;35:7188–7196.

15. DeSantis TZ, Hugenholtz P, Larsen N, Rojas M, Brodie EL, Keller K, Huber T, Dalevi D, Hu P, Andersen GL. Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB. Appl. Environ. Microbiol. 2006;72:5069–5072.

16. Cole JR, Chai B, Farris RJ, Wang Q, Kulam SA, McGarrell DM, Garrity GM, Tiedje JM. The Ribosomal Database Project (RDP-II): sequences and tools for high-throughput rRNA analysis. Nucleic Acids Res. 2005;33:D294–D296.

17. Ashelford KE, Weightman AJ, Fry JC. PRIMROSE: a computer program for generating and estimating the phylogenetic range of 16S rRNA oligonucleotide probes and primers in conjunction with the RDP-II database. Nucleic Acids Res. 2002;30:3481–3489.

18. Walters WA, Caporaso JG, Lauber CL, Berg-Lyons D, Fierer N, Knight R. PrimerProspector: de novo design and taxonomic analysis of barcoded polymerase chain reaction primers. Bioinformatics. 2011;27:1159–1161.

19. Griffen AL, Beall CJ, Firestone ND, Gross EL, DiFranco JM, Hardman JH, Vriesendorp B, Faust RA, Janies DA, Leys EJ. CORE: a phylogenetically-curated 16S rDNA database of the core oral microbiome. PLoS ONE. 2011;6:e19051.

20. Chen T, Yu WH, Izard J, Baranova OV, Lakshmanan A, Dewhirst FE. The Human Oral Microbiome Database: a web accessible resource for investigating oral microbe taxonomic and genomic information. Database. 2010;2010:baq013.

21. Srinivasan S, Hoffman NG, Morgan MT, Matsen FA, Fiedler TL, Ross FJ, McCoy CO, Hall RW, Bumgarner R, Marrazzo JM, et al. Bacterial communities in women with bacterial vaginosis: high resolution phylogenetic analyses reveal relationships of microbiota to clinical criteria. PLoS ONE. 2012;7:e37818.

22. Rice P, Longden I, Bleasby A. EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet. 2000;16:276–277.

23. Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–3402.

24. Edgar RC. Search and clustering orders of magnitude faster than BLAST. Bioinformatics. 2010;26:2460–2461.

25. Edgar RC, Haas BJ, Clemente JC, Quince C, Knight R. UCHIME improves sensitivity and speed of chimera detection. Bioinformatics. 2011;27:2194–2200.

26. Crielaard W, Zaura E, Schuller AA, Huse SM, Montijn RC, Keijser BJF. Exploring the oral microbiota of children at various developmental stages of their dentition in the relation to their oral health. BMC Med. Genomics. 2011;4:22.

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press
