GeneMark.hmm: new solutions for gene finding.
Abstract
The number of completely sequenced bacterial genomes has been growing fast. Computer methods for finding genes are available, yet more accurate algorithms are still needed. The GeneMark.hmm algorithm presented here was designed to improve gene prediction quality in terms of finding exact gene boundaries. The idea was to embed the GeneMark models into a naturally derived hidden Markov model (HMM) framework, with gene boundaries modeled as transitions between hidden states. We also used a specially derived ribosome binding site (RBS) pattern to refine predictions of translation initiation codons. The algorithm was evaluated on several test sets, including 10 complete bacterial genomes. The new algorithm was shown to be significantly more accurate than GeneMark in exact gene prediction. Interestingly, high gene-finding accuracy was observed even when Markov models of order zero, one and two were used. We present an analysis of false positive and false negative predictions, with the caveat that these categories are not precisely defined when the public database annotation is used as a control.
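The central idea above, gene boundaries as transitions between hidden states, can be illustrated with a deliberately small sketch. This is not the GeneMark.hmm model. The state layout, every probability, and the names `viterbi` and `predict_genes` are hypothetical. The sketch assumes a single strand, uppercase ACGT input, no RBS model, no gene-length distribution, and no exclusion of in-frame stop codons inside coding states. Coding emissions use a three-periodic Markov chain of order zero, that is, one nucleotide distribution per codon position.

```python
import math

NEG_INF = float("-inf")

# HYPOTHETICAL toy model, not the published GeneMark.hmm parameters.
# N = non-coding; S1-S3 spell the start codon ATG; C1-C3 are codon positions
# inside a gene; E1, E2A/E2G, E3AG/E3A spell a stop codon (TAA, TAG or TGA).
STATES = ["N", "S1", "S2", "S3", "C1", "C2", "C3",
          "E1", "E2A", "E2G", "E3AG", "E3A"]

# Emission probabilities (invented). C1-C3 together form a three-periodic
# Markov chain of order zero: one nucleotide distribution per codon position.
EMIT = {
    "N":  {"A": 0.30, "C": 0.20, "G": 0.20, "T": 0.30},
    "S1": {"A": 1.0}, "S2": {"T": 1.0}, "S3": {"G": 1.0},
    "C1": {"A": 0.30, "C": 0.20, "G": 0.35, "T": 0.15},
    "C2": {"A": 0.30, "C": 0.25, "G": 0.15, "T": 0.30},
    "C3": {"A": 0.20, "C": 0.30, "G": 0.30, "T": 0.20},
    "E1": {"T": 1.0}, "E2A": {"A": 1.0}, "E2G": {"G": 1.0},
    "E3AG": {"A": 0.5, "G": 0.5},  # completes TAA or TAG
    "E3A": {"A": 1.0},             # completes TGA
}

# Transition probabilities (invented). Gene boundaries are the N -> S1
# (gene start) and E3* -> N (gene end) transitions between hidden states.
TRANS = {
    "N":  {"N": 0.99, "S1": 0.01},
    "S1": {"S2": 1.0}, "S2": {"S3": 1.0}, "S3": {"C1": 1.0},
    "C1": {"C2": 1.0}, "C2": {"C3": 1.0},
    "C3": {"C1": 0.99, "E1": 0.01},
    "E1": {"E2A": 0.7, "E2G": 0.3},
    "E2A": {"E3AG": 1.0}, "E2G": {"E3A": 1.0},
    "E3AG": {"N": 1.0}, "E3A": {"N": 1.0},
}


def log(p):
    """Natural log that maps probability 0 to -inf."""
    return math.log(p) if p > 0 else NEG_INF


def viterbi(seq):
    """Most probable hidden-state path for an uppercase ACGT sequence.

    Paths are forced to start in the non-coding state N.
    """
    if not seq:
        return []
    n = len(seq)
    score = [{s: NEG_INF for s in STATES} for _ in range(n)]
    back = [{s: None for s in STATES} for _ in range(n)]
    score[0]["N"] = log(EMIT["N"].get(seq[0], 0.0))
    for t in range(1, n):
        for prev in STATES:
            if score[t - 1][prev] == NEG_INF:
                continue  # prev is unreachable at position t - 1
            for nxt, p in TRANS[prev].items():
                cand = (score[t - 1][prev] + log(p)
                        + log(EMIT[nxt].get(seq[t], 0.0)))
                if cand > score[t][nxt]:
                    score[t][nxt] = cand
                    back[t][nxt] = prev
    # Trace back from the best-scoring state at the last position.
    state = max(STATES, key=lambda s: score[n - 1][s])
    path = [state]
    for t in range(n - 1, 0, -1):
        state = back[t][state]
        path.append(state)
    return path[::-1]


def predict_genes(seq):
    """Genes as 0-based half-open (start, end) pairs, start codon to stop codon."""
    genes, start = [], None
    for i, state in enumerate(viterbi(seq)):
        if state == "S1":
            start = i
        elif state in ("E3AG", "E3A") and start is not None:
            genes.append((start, i + 1))
            start = None
    return genes
```

With these invented parameters, `predict_genes("T" * 10 + "ATG" + "GCC" * 10 + "TAA" + "T" * 10)` returns `[(10, 46)]`, the ATG-to-TAA reading frame. A higher-order chain (order one or two) would condition each coding emission on the preceding nucleotides. Viterbi decoding still applies in that case because the observed sequence is fixed.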
Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press
Full text links
Read article at publisher's site: https://doi.org/10.1093/nar/26.4.1107
Read article for free, from open access legal sources, via Unpaywall: https://academic.oup.com/nar/article-pdf/26/4/1107/9469816/26-4-1107.pdf