Europe PMC

This website requires cookies, and the limited processing of your personal data in order to function. By using the site you are agreeing to this as outlined in our privacy notice and cookie policy.

Abstract 


The number of completely sequenced bacterial genomes has been growing fast. There are computer methods available for finding genes but yet there is a need for more accurate algorithms. The GeneMark. hmm algorithm presented here was designed to improve the gene prediction quality in terms of finding exact gene boundaries. The idea was to embed the GeneMark models into naturally derived hidden Markov model framework with gene boundaries modeled as transitions between hidden states. We also used the specially derived ribosome binding site pattern to refine predictions of translation initiation codons. The algorithm was evaluated on several test sets including 10 complete bacterial genomes. It was shown that the new algorithm is significantly more accurate than GeneMark in exact gene prediction. Interestingly, the high gene finding accuracy was observed even in the case when Markov models of order zero, one and two were used. We present the analysis of false positive and false negative predictions with the caution that these categories are not precisely defined if the public database annotation is used as a control.

Free full text 


Logo of narLink to Publisher's site
Nucleic Acids Res. 1998 Feb 15; 26(4): 1107–1115.
PMCID: PMC147337
PMID: 9461475

GeneMark.hmm: new solutions for gene finding.

Abstract

The number of completely sequenced bacterial genomes has been growing fast. There are computer methods available for finding genes but yet there is a need for more accurate algorithms. The GeneMark. hmm algorithm presented here was designed to improve the gene prediction quality in terms of finding exact gene boundaries. The idea was to embed the GeneMark models into naturally derived hidden Markov model framework with gene boundaries modeled as transitions between hidden states. We also used the specially derived ribosome binding site pattern to refine predictions of translation initiation codons. The algorithm was evaluated on several test sets including 10 complete bacterial genomes. It was shown that the new algorithm is significantly more accurate than GeneMark in exact gene prediction. Interestingly, the high gene finding accuracy was observed even in the case when Markov models of order zero, one and two were used. We present the analysis of false positive and false negative predictions with the caution that these categories are not precisely defined if the public database annotation is used as a control.

Full Text

The Full Text of this article is available as a PDF (130K).

Selected References

These references are in PubMed. This may not be the complete list of references from this article.
  • Fleischmann RD, Adams MD, White O, Clayton RA, Kirkness EF, Kerlavage AR, Bult CJ, Tomb JF, Dougherty BA, Merrick JM, et al. Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science. 1995 Jul 28;269(5223):496–512. [Abstract] [Google Scholar]
  • Fraser CM, Gocayne JD, White O, Adams MD, Clayton RA, Fleischmann RD, Bult CJ, Kerlavage AR, Sutton G, Kelley JM, et al. The minimal gene complement of Mycoplasma genitalium. Science. 1995 Oct 20;270(5235):397–403. [Abstract] [Google Scholar]
  • Bult CJ, White O, Olsen GJ, Zhou L, Fleischmann RD, Sutton GG, Blake JA, FitzGerald LM, Clayton RA, Gocayne JD, et al. Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii. Science. 1996 Aug 23;273(5278):1058–1073. [Abstract] [Google Scholar]
  • Himmelreich R, Hilbert H, Plagens H, Pirkl E, Li BC, Herrmann R. Complete sequence analysis of the genome of the bacterium Mycoplasma pneumoniae. Nucleic Acids Res. 1996 Nov 15;24(22):4420–4449. [Europe PMC free article] [Abstract] [Google Scholar]
  • Blattner FR, Plunkett G, 3rd, Bloch CA, Perna NT, Burland V, Riley M, Collado-Vides J, Glasner JD, Rode CK, Mayhew GF, et al. The complete genome sequence of Escherichia coli K-12. Science. 1997 Sep 5;277(5331):1453–1462. [Abstract] [Google Scholar]
  • Tomb JF, White O, Kerlavage AR, Clayton RA, Sutton GG, Fleischmann RD, Ketchum KA, Klenk HP, Gill S, Dougherty BA, et al. The complete genome sequence of the gastric pathogen Helicobacter pylori. Nature. 1997 Aug 7;388(6642):539–547. [Abstract] [Google Scholar]
  • Smith DR, Doucette-Stamm LA, Deloughery C, Lee H, Dubois J, Aldredge T, Bashirzadeh R, Blakely D, Cook R, Gilbert K, et al. Complete genome sequence of Methanobacterium thermoautotrophicum deltaH: functional analysis and comparative genomics. J Bacteriol. 1997 Nov;179(22):7135–7155. [Europe PMC free article] [Abstract] [Google Scholar]
  • Kunst F, Ogasawara N, Moszer I, Albertini AM, Alloni G, Azevedo V, Bertero MG, Bessières P, Bolotin A, Borchert S, et al. The complete genome sequence of the gram-positive bacterium Bacillus subtilis. Nature. 1997 Nov 20;390(6657):249–256. [Abstract] [Google Scholar]
  • Klenk HP, Clayton RA, Tomb JF, White O, Nelson KE, Ketchum KA, Dodson RJ, Gwinn M, Hickey EK, Peterson JD, et al. The complete genome sequence of the hyperthermophilic, sulphate-reducing archaeon Archaeoglobus fulgidus. Nature. 1997 Nov 27;390(6658):364–370. [Abstract] [Google Scholar]
  • Gelfand MS. Prediction of function in DNA sequence analysis. J Comput Biol. 1995 Spring;2(1):87–115. [Abstract] [Google Scholar]
  • Churchill GA. Stochastic models for heterogeneous DNA sequences. Bull Math Biol. 1989;51(1):79–94. [Abstract] [Google Scholar]
  • Krogh A, Brown M, Mian IS, Sjölander K, Haussler D. Hidden Markov models in computational biology. Applications to protein modeling. J Mol Biol. 1994 Feb 4;235(5):1501–1531. [Abstract] [Google Scholar]
  • Baldi P, Chauvin Y, Hunkapiller T, McClure MA. Hidden Markov models of biological primary sequence information. Proc Natl Acad Sci U S A. 1994 Feb 1;91(3):1059–1063. [Europe PMC free article] [Abstract] [Google Scholar]
  • Krogh A, Mian IS, Haussler D. A hidden Markov model that finds genes in E. coli DNA. Nucleic Acids Res. 1994 Nov 11;22(22):4768–4778. [Europe PMC free article] [Abstract] [Google Scholar]
  • Burge C, Karlin S. Prediction of complete gene structures in human genomic DNA. J Mol Biol. 1997 Apr 25;268(1):78–94. [Abstract] [Google Scholar]
  • Henderson J, Salzberg S, Fasman KH. Finding genes in DNA with a Hidden Markov Model. J Comput Biol. 1997 Summer;4(2):127–141. [Abstract] [Google Scholar]
  • Link AJ, Robison K, Church GM. Comparing the predicted and observed properties of proteins encoded in the genome of Escherichia coli K-12. Electrophoresis. 1997 Aug;18(8):1259–1313. [Abstract] [Google Scholar]
  • Médigue C, Rouxel T, Vigier P, Hénaut A, Danchin A. Evidence for horizontal gene transfer in Escherichia coli speciation. J Mol Biol. 1991 Dec 20;222(4):851–856. [Abstract] [Google Scholar]
  • Lawrence JG. Selfish operons and speciation by gene transfer. Trends Microbiol. 1997 Sep;5(9):355–359. [Abstract] [Google Scholar]
  • Lukashin AV, Engelbrecht J, Brunak S. Multiple alignment using simulated annealing: branch point definition in human mRNA splicing. Nucleic Acids Res. 1992 May 25;20(10):2511–2516. [Europe PMC free article] [Abstract] [Google Scholar]
  • Hayes WS, Borodovsky M. Deriving ribosomal binding site (RBS) statistical models from unannotated DNA sequences and the use of the RBS model for N-terminal prediction. Pac Symp Biocomput. 1998:279–290. [Abstract] [Google Scholar]
  • Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997 Sep 1;25(17):3389–3402. [Europe PMC free article] [Abstract] [Google Scholar]
  • Borodovsky M, McIninch JD, Koonin EV, Rudd KE, Médigue C, Danchin A. Detection of new genes in a bacterial genome using Markov models for three gene classes. Nucleic Acids Res. 1995 Sep 11;23(17):3554–3562. [Europe PMC free article] [Abstract] [Google Scholar]
  • Sacerdot C, Dessen P, Hershey JW, Plumbridge JA, Grunberg-Manago M. Sequence of the initiation factor IF2 gene: unusual protein features and homologies with elongation factors. Proc Natl Acad Sci U S A. 1984 Dec;81(24):7787–7791. [Europe PMC free article] [Abstract] [Google Scholar]
  • Missiakas D, Georgopoulos C, Raina S. The Escherichia coli heat shock gene htpY: mutational analysis, cloning, sequencing, and transcriptional regulation. J Bacteriol. 1993 May;175(9):2613–2624. [Europe PMC free article] [Abstract] [Google Scholar]

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press

Citations & impact 


Impact metrics

Jump to Citations

Citations of article over time

Alternative metrics

Altmetric item for https://www.altmetric.com/details/3680609
Altmetric
Discover the attention surrounding your research
https://www.altmetric.com/details/3680609

Article citations


Go to all (909) article citations

Other citations

Data 


Data behind the article

This data has been text mined from the article, or deposited into data resources.