GMAP: A Genomic Mapping and Alignment Program for mRNA and EST
Sequences, and
GSNAP: Genomic Short-read Nucleotide Alignment Program
Links are provided below in parentheses for users who wish to
download the files with a command-line tool, like wget.
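For scripted downloads, the same link can be assembled and fetched from Python. This is a minimal sketch: the URL pattern is inferred from the single 2024-10-20 link listed below and is an assumption for any other version, and the helper names tarball_url and download are hypothetical.

```python
from urllib.request import urlretrieve

# Base location taken from the download link listed for version 2024-10-20.
BASE_URL = "http://research-pub.gene.com/gmap/src"


def tarball_url(version: str) -> str:
    """Build the source tarball URL for a version date such as '2024-10-20'.

    Assumption: other versions follow the same naming pattern as the
    2024-10-20 release; this is not confirmed by the page itself.
    """
    return f"{BASE_URL}/gmap-gsnap-{version}.tar.gz"


def download(version: str, dest_dir: str = ".") -> str:
    """Fetch the tarball into dest_dir and return the local file path."""
    dest = f"{dest_dir}/gmap-gsnap-{version}.tar.gz"
    urlretrieve(tarball_url(version), dest)
    return dest


if __name__ == "__main__":
    # Only prints the URL; call download("2024-10-20") to fetch the file.
    print(tarball_url("2024-10-20"))
```

Calling download("2024-10-20") is the scripted counterpart of running wget on the link in parentheses.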
Source code for both GMAP and GSNAP
- Version 2024-10-20
(http://research-pub.gene.com/gmap/src/gmap-gsnap-2024-10-20.tar.gz).
Changes since 2024-10-10:
- Eliminated a long-standing memory leak
- Fixed a fatal bug in GSNAP relating to long segments being explored with localdb
- Fixed issues with coordinates in alignments to circular chromosomes
Changes since 2024-09-18:
- Fixes to fatal bugs in GSNAP when dealing with alignments at the beginning of the genome
- Reusing memory in GSNAP, which helps speed up performance
- Further improvements in splicing in GSNAP on the insides of paired-end reads
- Improvements to read localization for transcriptome-guided alignment
Changes since 2024-08-20:
- Fixes to fatal bugs in GSNAP when dealing with small chromosomes
- Improvements in splicing in GSNAP, especially on the insides of paired-end reads
- Improvements to the detection of indels in GSNAP, which now uses the values of --max-insertions and --max-deletions
- Restored the option --pairdev in GSNAP, which is now used together with --pairexpect in paired-end alignment
- Added the option --align-fraction to GMAP and GSNAP, which aligns only a fraction of the given reads, selected randomly
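The idea behind aligning a random fraction of reads can be illustrated outside the aligner. The sketch below is not GSNAP's implementation, which is not described here; it assumes each read is kept independently with probability equal to the fraction, and the function name sample_fraction is hypothetical.

```python
import random
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def sample_fraction(reads: Iterable[T], fraction: float,
                    seed: Optional[int] = None) -> Iterator[T]:
    """Yield each read with probability `fraction` (illustrative only).

    Assumption: selection is an independent coin flip per read. For
    paired-end data, pass each pair as one item so mates stay together.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)  # a fixed seed makes the selection repeatable
    for read in reads:
        if rng.random() < fraction:
            yield read


if __name__ == "__main__":
    reads = [f"read{i}" for i in range(20)]
    print(list(sample_fraction(reads, 0.25, seed=1)))
```

Because each read is decided independently, the number of reads kept is only approximately the requested fraction of the input.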
Changes since 2024-08-14:
- Improvements and fixes made in --two-pass mode for GSNAP
- For transcriptomes, gmap_build now puts link files under the transcriptome db, rather than the genome db
- The flag -C can now be used in gmap_build to specify a location for the transcriptome db
Changes since 2024-06-24:
- For GSNAP, implemented --splices-dump to produce a splice junctions file that matches that of STAR
- For GSNAP, also implemented --splices-read, --splices-noeval, and --splices-include-known
- Improvements in splicing in --two-pass mode in GSNAP
- Improvements in transcriptome-guided genomic alignment in GSNAP
- For SAM output of GSNAP, changed name of XM field to be MC for mate cigar
- Improved the choice between insertions and deletions at a given position to consider the indel length
- For overlapping paired-end alignments, checking that the overlapping nucleotide sequences match
- Fixed gff3_genes to work on NCBI gff3 files as well as Ensembl
Changes since 2024-05-20:
- Improvements to splice identification in GSNAP
- Splice identification looking specifically for AT-AC as well as GT-AG and GC-AG introns
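The three intron types named above are defined by the dinucleotides at the two ends of the intron. A minimal sketch of that classification, assuming the intron sequence is given on the sense strand from donor to acceptor; the function name classify_intron is hypothetical and this is not GSNAP's code.

```python
from typing import Optional

# (donor dinucleotide, acceptor dinucleotide) -> intron type label
INTRON_TYPES = {
    ("GT", "AG"): "GT-AG",
    ("GC", "AG"): "GC-AG",
    ("AT", "AC"): "AT-AC",
}


def classify_intron(intron_seq: str) -> Optional[str]:
    """Return the intron type, or None if the ends match none of the three.

    Assumption: intron_seq is the full intron on the sense strand, so its
    first two bases are the donor site and its last two the acceptor site.
    """
    seq = intron_seq.upper()
    if len(seq) < 4:
        return None  # too short to hold both dinucleotides
    return INTRON_TYPES.get((seq[:2], seq[-2:]))


if __name__ == "__main__":
    for intron in ("GTAAGTCTTTCAG", "GCAAGTTTTCAG", "ATATCCTTTTAC", "CTAAAAAC"):
        print(intron, classify_intron(intron))
```

For an alignment on the minus strand, the sequence would need to be reverse-complemented before applying this check.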
Changes since 2024-05-07:
- Extensive revisions in GSNAP to improve the accuracy of alignment
Changes since 2024-03-15:
- Fixed a fatal bug from trimming single-exon alignments near chromosomal bounds
- Removing duplicate transcripts in transcriptome-guided genomic alignment
Changes since 2024-02-02:
- Improvements in finding long indels
- Improvements in alignments of overlapping paired-end reads
- Improvements in transcriptome-guided genomic alignment
- Restored sub: field in standard GSNAP output
Changes since 2023-12-01:
- Fixed fatal bug on highly repetitive reads
- Rewrite to improve accuracy of read alignment
- Improvements to splice calling and to resolution of inner splicing
Changes since 2023-10-10:
- Major rewrite of GSNAP to improve speed
Changes since 2023-10-10:
- Fixed a bug in GSNAP allowing alignments to extend past the beginning or end of a chromosome
- Fixed a bug in SAM output of GSNAP that caused non-ASCII characters to be generated
- Fixed a bug in SAM output of GMAP that caused the CIGAR length to differ from the query length
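The CIGAR consistency condition behind the last fix can be checked on any SAM record. In the SAM format, the operations M, I, S, = and X consume query bases, and their lengths should sum to the length of the SEQ field. A minimal sketch, with hypothetical function names:

```python
import re

_CIGAR_FULL = re.compile(r"(\d+[MIDNSHP=X])+")
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

# Operations that consume bases of the query (SEQ) per the SAM format.
QUERY_CONSUMING = frozenset("MIS=X")


def cigar_query_length(cigar: str) -> int:
    """Sum the lengths of the CIGAR operations that consume query bases."""
    if not _CIGAR_FULL.fullmatch(cigar):
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return sum(int(n) for n, op in _CIGAR_OP.findall(cigar)
               if op in QUERY_CONSUMING)


def cigar_matches_seq(cigar: str, seq: str) -> bool:
    """True if the CIGAR string accounts for exactly len(seq) query bases."""
    return cigar_query_length(cigar) == len(seq)


if __name__ == "__main__":
    print(cigar_query_length("5S40M1000N50M5S"))
    print(cigar_matches_seq("10H40M2I48M", "A" * 90))
```

Hard-clipped bases (H) are absent from SEQ, so they are not counted here.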
Changes since 2023-10-01:
- Fixed a bug in SAM output of GSNAP involving hard-clipping
Changes since 2023-07-20:
- Restored non-SIMD versions of programs
- Fixed bugs in GMAP coming from alignments beyond the end of a chromosome
- Implemented SIMD code for approximate intersections
- In SAM output, MD strings now report N's in the query sequence as mismatches
- Multiple improvements in speed and accuracy for transcriptome-guided genomic alignment (TGGA)
- For SAM output from TGGA, distinguishing between short and long cryptic splice sites
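The MD-string change listed above can be illustrated on an ungapped alignment. The sketch below follows the general SAM convention for MD (match counts separated by the reference base at each mismatch) and treats an N in the query as a mismatch; it is an illustration with a hypothetical function name, not the code used by GMAP or GSNAP, and it does not handle deletions.

```python
def md_string(ref: str, query: str) -> str:
    """Build an MD string for an ungapped alignment of equal-length strings.

    Assumption: an N in the query counts as a mismatch against any
    reference base, as described in the release note above.
    """
    if len(ref) != len(query):
        raise ValueError("ref and query must have the same length")
    parts = []
    run = 0  # number of consecutive matching positions seen so far
    for r, q in zip(ref.upper(), query.upper()):
        if q == r and q != "N":
            run += 1
        else:
            # Emit the match count (possibly 0), then the reference base.
            parts.append(str(run))
            parts.append(r)
            run = 0
    parts.append(str(run))
    return "".join(parts)


if __name__ == "__main__":
    print(md_string("ACGTAC", "ACGTAC"))
    print(md_string("ACGTAC", "ACNTAC"))
```

With this rule, a query N is reported in the MD string as the reference base at that position rather than being folded into the surrounding match count.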
Changes since 2023-06-01:
- Fixed bug in GSNAP from aligning across different chromosomes
- In GSNAP, evaluating whether poly-A regions should be considered in alignment
- In GSNAP, improved the ability to find fusion alignments
- In GSNAP, improved outputs in transcriptome-guided genomic alignment
Changes since 2023-04-28:
- Significantly improved transcriptome-guided genomic alignment
- Providing velocity assignments in GSNAP SAM output
- Extending splices based on transcripts
- Handling repetitive regions in queries by ignoring them
during alignment
- Allowing splicing within circular chromosomes
Changes since 2023-04-20:
- Improved algorithms in GSNAP for finding fusion alignments
- Extensively tested the stability of the GSNAP program
- Fixed issues with excessive memory usage in GSNAP,
especially on repetitive reads
Changes since 2023-04-12:
- Fixed issues introduced in 2023-04-12 where GSNAP missed
finding some concordant alignments for paired-end reads
- Fixed reporting of fusion endpoints and MD strings in
GSNAP SAM output, and fusion alignments in GSNAP standard output
- Fixed reporting of intronic alignments in
transcriptome-guided genomic alignment
Changes since 2023-03-24:
- Alignments to gene fusions are now reported by GSNAP
- Restored the --min-coverage option to GSNAP, applied with a default of 0.5
- Fixed issue where GSNAP would report insufficient memory on genomes of less than 65536 bp
Changes since 2023-02-17:
- Compilation now works on AVX-512 machines
- Improvements to speed of GSNAP
Changes since 2021-12-17:
- Complete rewrite of GSNAP, making it much faster and more accurate
- Compiles and runs on both Intel and Apple ARM (M1/M2) computers
- GSNAP now has a --two-pass mode that allows it to learn splice sites, indels, sequence quality, and insert lengths
- Minor improvements to GMAP, prompted by user suggestions
- Uses a new genome/transcriptome index format, so it requires re-running gmap_build
- Some features are not yet supported or are not tested thoroughly
- More details to come
- Previously released versions
Release notices and bug reports
The old listserver mailing list is no longer being supported by EBI.
The new mailing list for issues relating to both GMAP and GSNAP is now at
Google Groups. To sign up, go to
your list,
select "All groups and messages", and search for "gsnap-users".
Click on that group, and then "Join group".
You can join there to receive release notices, ask questions, or see previous
messages. If you have a bug to report or a feature to request,
I believe you can also send email to [email protected] (you may have
to subscribe to the list first, though, or the message will be held
for me to approve), or directly to me, Thomas Wu ([email protected]).
Genome databases
You can build your own genome database with the gmap_build
program included with this software. Please see instructions in
the README file. For the human genome, you may want to retrieve
Documentation
README file from the source
code distribution. Contains basic usage information.
Software demonstration given at ISMB 2005. Contains various
examples of GMAP usage. [Slides]
References:
- Thomas D. Wu and Colin K. Watanabe.
GMAP: a genomic mapping and alignment program for mRNA and EST sequences.
Bioinformatics 2005 21:1859-1875.
- Thomas D. Wu and Serban Nacu.
Fast and SNP-tolerant detection of complex variants and splicing in short reads.
Bioinformatics 2010 26:873-881.
Supplementary information for Bioinformatics 2005 publication on GMAP:
Thomas Wu
Last modified: Tue Aug 16 07:49:00 PDT 2011