Europe PMC


Abstract 


The FASTA package of sequence comparison programs has been expanded to include FASTX and FASTY, which compare a DNA sequence to a protein sequence database, translating the DNA sequence in three frames and aligning the translated DNA sequence to each sequence in the protein database, allowing gaps and frameshifts. Also new are TFASTX and TFASTY, which compare a protein sequence to a DNA sequence database, translating each sequence in the DNA database in six frames and scoring alignments with gaps and frameshifts. FASTX and TFASTX allow only frameshifts between codons, while FASTY and TFASTY allow substitutions or frameshifts within a codon. We examined the performance of FASTX and FASTY using different gap-opening, gap-extension, frameshift, and nucleotide substitution penalties. In general, FASTX and FASTY perform equivalently when query sequences contain 0-10% errors. We also evaluated the statistical estimates reported by FASTX and FASTY. These estimates are quite accurate, except when an out-of-frame translation produces a low-complexity protein sequence. We used FASTX to scan the Mycoplasma genitalium, Haemophilus influenzae, and Methanococcus jannaschii genomes for unidentified or misidentified protein-coding genes. We found at least 9 new protein-coding genes in the three genomes and at least 35 genes with potentially incorrect boundaries.
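The frame translation step described in the abstract can be illustrated with a minimal sketch. This is not the FASTX/FASTY implementation: it performs no alignment and does not model gaps or frameshifts. It assumes the standard genetic code, and the function names (`three_frame`, `six_frame`, and so on) are hypothetical, chosen only for illustration.

```python
from itertools import product

# Standard genetic code (assumed), laid out in TCAG order:
# the first base varies slowest, the third base fastest.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate_frame(dna, offset):
    """Translate one reading frame starting at `offset` (0, 1, or 2).

    Codons containing unrecognized characters become 'X';
    a trailing partial codon is dropped.
    """
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(offset, len(dna) - 2, 3)
    )


def three_frame(dna):
    """Translate the three forward reading frames."""
    return [translate_frame(dna, k) for k in range(3)]


def six_frame(dna):
    """Translate three forward frames, then three reverse-complement frames."""
    return three_frame(dna) + three_frame(reverse_complement(dna))
```

For example, `six_frame("ATGGCCTAA")` returns `['MA*', 'WP', 'GL', 'LGH', '*A', 'RP']`, where `*` marks a stop codon. A frameshift-aware aligner would additionally let an alignment move between these frames at a penalty, which this sketch does not attempt.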



Funding 


Funders who supported this work.

NLM NIH HHS (2)