Authors:
Jader Garbelini (1); Danilo Sanches (2); André Kashiwabara (2) and Aurora Pozo (1)
Affiliations:
(1) Federal University of Paraná, Curitiba, Brazil; (2) Federal University of Technology, Cornélio Procópio, Brazil
Keyword(s):
Kmers, Motifs, Sequence Analysis, Optimization.
Abstract:
Motivation: Finding conserved motifs in DNA sequences is a key problem in bioinformatics. The growing availability of large-scale genomic data poses significant challenges for computational biology, particularly in analysis efficiency, kmer identification, and the handling of noise. Detecting conserved motifs and patterns in DNA sequences is essential for understanding gene function and regulation. It is therefore necessary to develop novel approaches and methods that can handle these large volumes of data and return accurate results quickly. Results: We present SMT, a tool designed to store and count kmers efficiently, optimizing both memory usage and computation time. SMT supports exact searches in time proportional to k (constant for a fixed k) and retrieves the most abundant kmers through a frequency table. SMT has also proven effective for motif discovery in ChIP-seq data, where it identifies conserved regions in the sequences. This approach facilitates large-scale data analysis and provides insight into the conserved properties of biological sequences, demonstrating the potential of SMT to support research in bioinformatics and genomics. Availability and implementation: Supplementary data and results are available in support of the conclusions. SMT and its source code are available at https://github.com/jadermcg/smt.
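The kmer operations named in the abstract (counting, exact lookup, and retrieval of the most abundant kmers from a frequency table) can be illustrated with a minimal sketch. It does not reproduce SMT's data structure or its memory optimizations: it swaps in a plain hash-based counter from the Python standard library, and the function names `count_kmers` and `most_abundant`, as well as the toy sequences, are assumptions made for illustration only.

```python
from collections import Counter


def count_kmers(sequences, k):
    """Build a frequency table of all overlapping kmers of length k.

    Illustrative stand-in only: a hash-based Counter, not the SMT structure.
    """
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        # Slide a window of width k over the sequence, one base at a time.
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def most_abundant(counts, n):
    """Return the n most frequent kmers as (kmer, count) pairs."""
    return counts.most_common(n)


# Toy input (hypothetical sequences, not taken from the paper's data).
sequences = ["ACGTACGT", "TTACGTAA"]
counts = count_kmers(sequences, 4)

# Exact lookup: hashing a kmer costs time proportional to k on average;
# a kmer that never occurs has count 0.
print(counts["ACGT"])
print(counts["AAAA"])
print(most_abundant(counts, 1))
```

With a hash table, an exact query depends on the length of the kmer rather than on how many kmers are stored, which is the same kind of guarantee the abstract states for SMT; the memory behaviour of this sketch, however, says nothing about SMT's.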