Abstract
Motivation
Automatic extraction of motifs that occur frequently in a set of unaligned DNA sequences is useful for predicting the binding sites of unknown transcription factors. Several programs for this purpose have been released. However, in our opinion, they are not practical enough to be applied to a large number of upstream sequences.
Results
We propose a new program called YEBIS (Yet another Environment for the analysis of BIopolymer Sequences), which can extract a set of motifs from a number of functionally related DNA sequences without any a priori knowledge. Because YEBIS uses the hidden Markov model, it represents these motifs in a more general form than conventional methods such as the weight matrix method. When applied to several sets of benchmark data, YEBIS performed comparably to the existing methods but was much faster. Moreover, it could extract all known motifs from long terminal repeat (LTR) sequences in a single run. Finally, it was successfully applied to approximately 400 human promoter sequences, and some of the extracted motifs turned out to be known cis-elements. YEBIS could therefore be a practical tool for exploring the upstream sequences of genomic ORFs, some of which are regulated in a similar fashion.
Availability
YEBIS will be distributed to academic users free of charge. All requests should be sent to the address below.
Contact
E-MAIL: [email protected]
Full text links
Read article at publisher's site: https://doi.org/10.1093/bioinformatics/14.4.317
Read article for free, from open access legal sources, via Unpaywall: https://academic.oup.com/bioinformatics/article-pdf/14/4/317/9731892/140317.pdf