Alignment-free comparison of protein sequences based on reduced amino acid alphabets

J Biomol Struct Dyn. 2009 Jun;26(6):763-9. doi: 10.1080/07391102.2009.10507288.

Abstract

Protein sequences are treated as stochastic processes over a reduced amino acid alphabet of 10 amino acid types. Each realization of the stochastic process is described by an associated transition probability matrix that corresponds uniquely to the process. New distances between transition probability matrices are then defined for sequence similarity analysis. Two separate datasets are prepared and tested to assess the validity of the method. The results demonstrate that the new method is powerful and efficient.
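
The idea of comparing sequences through transition probability matrices can be illustrated with a minimal sketch. The abstract specifies neither the 10-group reduction nor the distances the authors define, so everything here is an assumption made for illustration: the grouping in GROUPS is hypothetical, the process is taken to be first-order, and the Frobenius norm stands in for the paper's distances.

```python
from math import sqrt

# Hypothetical 10-group reduction of the 20 amino acids (illustrative only;
# the abstract does not state which grouping the authors used).
GROUPS = ["AG", "C", "DE", "FWY", "H", "ILMV", "KR", "NQ", "P", "ST"]
AA_TO_STATE = {aa: i for i, group in enumerate(GROUPS) for aa in group}
N_STATES = len(GROUPS)


def transition_matrix(seq):
    """Row-normalized first-order transition counts over the reduced alphabet."""
    # Map residues to reduced states, skipping non-standard symbols such as X.
    states = [AA_TO_STATE[aa] for aa in seq.upper() if aa in AA_TO_STATE]
    counts = [[0] * N_STATES for _ in range(N_STATES)]
    for a, b in zip(states, states[1:]):
        counts[a][b] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # A state that never occurs as a predecessor keeps an all-zero row.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


def frobenius_distance(p, q):
    """Frobenius norm of P - Q; a stand-in, not the distance from the paper."""
    return sqrt(sum((p[i][j] - q[i][j]) ** 2
                    for i in range(N_STATES) for j in range(N_STATES)))


if __name__ == "__main__":
    # Toy sequences, not data from the paper.
    p = transition_matrix("AGAGKRDE")
    q = transition_matrix("ACACKRDE")
    print(frobenius_distance(p, q))
```

Because each sequence is summarized by a fixed-size 10 x 10 matrix, two sequences of different lengths can be compared directly, with no alignment step. A pairwise distance table built this way could then feed a standard tree-building procedure.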

MeSH terms

  • Algorithms*
  • Amino Acid Sequence
  • Amino Acids / analysis*
  • Amino Acids / genetics
  • Animals
  • Humans
  • Membrane Glycoproteins / analysis
  • Membrane Glycoproteins / classification
  • Membrane Glycoproteins / genetics
  • Molecular Sequence Data
  • Phylogeny
  • Proteins / analysis*
  • Proteins / classification
  • Proteins / genetics
  • Reproducibility of Results
  • Sequence Alignment / methods*
  • Spike Glycoprotein, Coronavirus
  • Transferrin / analysis
  • Transferrin / classification
  • Transferrin / genetics
  • Viral Envelope Proteins / analysis
  • Viral Envelope Proteins / classification
  • Viral Envelope Proteins / genetics

Substances

  • Amino Acids
  • Membrane Glycoproteins
  • Proteins
  • Spike Glycoprotein, Coronavirus
  • Transferrin
  • Viral Envelope Proteins