A simple and fast approach to prediction of protein secondary structure from multiply aligned sequences with accuracy above 70%.

Mehta PK; Heringa J; Argos P

doi:10.1002/pro.5560041208

Abstract

To improve secondary structure predictions in protein sequences, the information residing in multiple sequence alignments of substituted but structurally related proteins is exploited. A database comprising 70 protein families and a total of 2,500 sequences, some of which were aligned by tertiary structural superpositions, was used to calculate residue exchange weight matrices within alpha-helical, beta-strand, and coil substructures, respectively. Secondary structure predictions were made based on the observed residue substitutions in local regions of the multiple alignments and the largest possible associated exchange weights in each of the three matrix types. Comparison of the observed and predicted secondary structure on a per-residue basis yielded a mean accuracy of 72.2%. Individual alpha-helix, beta-strand, and coil states were respectively predicted at 66.7, and 75.8% correctness, representing a well-balanced three-state prediction. The accuracy level, verified by cross-validation through jack-knife tests on all protein families, dropped, on average, to only 70.9%, indicating the rigor of the prediction procedure. On the basis of robustness, conceptual clarity, accuracy, and execution efficiency, the method has considerable advantages, especially in its sole reliance on amino acid substitutions within structurally related proteins.
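The abstract describes the procedure only in outline, so the Python sketch below is a hypothetical illustration of the general idea, not the authors' program. It shows three steps: building one exchange-weight table per state (H = alpha-helix, E = beta-strand, C = coil) from aligned families with known structure, predicting each alignment column from the substitutions seen in a local window, and scoring with per-residue accuracy (Q3) under a leave-one-family-out jack-knife. The weighting (log ratio of a pair's frequency in one state to its frequency over all states), the pseudocount, the window half-width, the argmax decision rule, and every function name are assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations

STATES = "HEC"                   # alpha-helix, beta-strand, coil
AMINO = "ACDEFGHIKLMNPQRSTVWY"   # the 20 standard residues
# All 210 unordered residue pairs (identities plus exchanges), stored sorted.
ALL_PAIRS = [(a, a) for a in AMINO] + list(combinations(AMINO, 2))


def column_pairs(column):
    """Unordered residue pairs in one alignment column; gaps and unknowns skipped."""
    residues = [r for r in column if r in AMINO]
    return [tuple(sorted(p)) for p in combinations(residues, 2)]


def exchange_weights(training, pseudo=0.5):
    """Hypothetical weighting: log ratio of a pair's frequency in state s to its
    frequency over all states, with a pseudocount added to every pair.

    training: list of (alignment, ss), where alignment is a list of equal-length
    gapped sequences and ss gives the observed H/E/C state of each column."""
    counts = {s: Counter() for s in STATES}
    for aln, ss in training:
        for i, state in enumerate(ss):
            counts[state].update(column_pairs([seq[i] for seq in aln]))
    overall = sum(counts.values(), Counter())

    def freqs(counter):
        total = sum(counter.values()) + pseudo * len(ALL_PAIRS)
        return {p: (counter[p] + pseudo) / total for p in ALL_PAIRS}

    background = freqs(overall)
    weights = {}
    for s in STATES:
        state_freq = freqs(counts[s])
        weights[s] = {p: math.log(state_freq[p] / background[p]) for p in ALL_PAIRS}
    return weights


def predict(aln, weights, half_window=3):
    """Assign each column the state whose window-summed score is largest."""
    length = len(aln[0])
    col_scores = []
    for i in range(length):
        pairs = column_pairs([seq[i] for seq in aln])
        # Mean exchange weight of the column's pairs under each state's table.
        col_scores.append({
            s: sum(weights[s][p] for p in pairs) / len(pairs) if pairs else 0.0
            for s in STATES
        })
    prediction = []
    for i in range(length):
        lo, hi = max(0, i - half_window), min(length, i + half_window + 1)
        totals = {s: sum(col_scores[j][s] for j in range(lo, hi)) for s in STATES}
        prediction.append(max(STATES, key=totals.get))
    return "".join(prediction)


def q3(observed, predicted):
    """Per-residue three-state accuracy, in percent."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted lengths differ")
    return 100.0 * sum(o == p for o, p in zip(observed, predicted)) / len(observed)


def jackknife(families, half_window=3):
    """Leave-one-family-out test: rebuild the weights without the held-out
    family, predict it, and average the per-family Q3 scores."""
    scores = []
    for k, (aln, ss) in enumerate(families):
        weights = exchange_weights(families[:k] + families[k + 1:])
        scores.append(q3(ss, predict(aln, weights, half_window)))
    return sum(scores) / len(scores)
```

Comparing each state's pair frequency against the pooled background is what lets the three tables compete. A conserved Ala/Ala column, for example, scores high only in the table of the state where such columns are enriched. Note also that averaging per-family Q3, as `jackknife` does here, is only one choice; pooling all residues before dividing would weight large families more heavily.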

Protein Sci. 1995 Dec; 4(12): 2517–2525.

PMCID: PMC2143048

PMID: 8580842

