gsnap - Genomic Short-read Nucleotide Alignment Program


gsnap -dDB [OPTION]... [QUERY]...


Align the sequences QUERY to the reference DB. With no QUERY, read standard input.


Input options

-D, --dir=directory
Genome directory
-d, --db=STRING
Genome database
-q, --part=INT/INT
Process only the i-th out of every n sequences, e.g., 0/100 or 99/100
-c, --circular-input
Circular-end data (paired reads are on same strand)
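
As an illustration of the i/n notation accepted by --part, the following Python sketch keeps every n-th sequence starting at 0-based offset i. It assumes that selection is by input position modulo n, which the description above does not state explicitly; parse_part and select_part are hypothetical helper names, not part of GSNAP.

```python
def parse_part(spec):
    """Parse an 'i/n' string such as '0/100' into a pair of integers."""
    i_str, n_str = spec.split("/")
    i, n = int(i_str), int(n_str)
    if n <= 0 or not 0 <= i < n:
        raise ValueError("expected i/n with 0 <= i < n, got %r" % spec)
    return i, n


def select_part(sequences, spec):
    """Keep the sequences whose 0-based position is congruent to i modulo n.

    Assumption: this modulo rule is only a model of what --part does.
    """
    i, n = parse_part(spec)
    return [seq for idx, seq in enumerate(sequences) if idx % n == i]


reads = ["r0", "r1", "r2", "r3", "r4", "r5", "r6"]
print(select_part(reads, "1/3"))  # ['r1', 'r4']
```

Under this model, running n jobs with parts 0/n through (n-1)/n would cover every input sequence exactly once.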

Computation options

Note: GSNAP has an ultrafast algorithm for calculating mismatches up to and including ((readlength+2)/12 - 2) ("ultrafast mismatches"). The program will run fastest if max-mismatches (plus suboptimal-levels) is within that value. Also, indels, especially end indels, take longer to compute, although the algorithm is still designed to be fast.
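
To make the formula in the note above concrete, here is a small Python sketch that evaluates the ultrafast level for a given read length and checks whether a chosen setting stays within it. It assumes the division is integer (truncating) division, as it would be in C; the function names are hypothetical and not part of GSNAP.

```python
def ultrafast_mismatches(readlength):
    """Evaluate ((readlength + 2) / 12 - 2), assuming integer division."""
    return (readlength + 2) // 12 - 2


def within_ultrafast(readlength, max_mismatches, suboptimal=0):
    """True if max-mismatches plus the suboptimal setting stays within the ultrafast level."""
    return max_mismatches + suboptimal <= ultrafast_mismatches(readlength)


for length in (36, 75, 100):
    print(length, ultrafast_mismatches(length))
# 36 1
# 75 4
# 100 6
```

For example, under this reading a 75-base read with max-mismatches of 3 and one suboptimal level would still be within the ultrafast level of 4.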

-B, --batch=INT
Batch mode (0 = no pre-loading, 1 = pre-load only indices, 2 (default) = pre-load both indices and genome)
-m, --max-mismatches=FLOAT
Maximum number of mismatches allowed (if not specified, then defaults to the ultrafast level of ((readlength+2)/12 - 2)). If specified between 0.0 and 1.0, then treated as a fraction of each read length. Otherwise, treated as an integral number of mismatches (including indel and splicing penalties).
-i, --indel-penalty=INT
Penalty for an indel (default 1000, essentially turning it off). Counts against mismatches allowed. To find indels, make indel-penalty less than or equal to max-mismatches. For 2-base reads, need to set indel-penalty somewhat high.
-I, --indel-endlength=INT
Minimum length at end required for indel alignments (default 3)
-y, --max-middle-insertions=INT
Maximum number of middle insertions allowed (default 9)
-z, --max-middle-deletions=INT
Maximum number of middle deletions allowed (default 30)
-Y, --max-end-insertions=INT
Maximum number of end insertions allowed (default 3)
-Z, --max-end-deletions=INT
Maximum number of end deletions allowed (default 6)
-M, --suboptimal-score=INT
Report suboptimal hits beyond best hit (default 0). All hits with best score plus suboptimal-score are reported.
-R, --masking=INT
Masking of frequent/repetitive oligomers to avoid spending time on non-unique or repetitive reads
 0 = no masking (will try to find non-unique or repetitive matches)
 1 = mask frequent oligomers
 2 = mask frequent and repetitive oligomers (fastest) (default)
 3 = greedy frequent: mask frequent oligomers first, then try no masking if alignments not found
 4 = greedy repetitive: mask frequent and repetitive oligomers first, then try no masking if alignments not found
-T, --trim=INT
Trim mismatches at ends (0 = no (default), 1 = yes)
-2, --dibase
Input is 2-base encoded (e.g., SOLiD), with database built previously using dibaseindex
-C, --cmet
Use database for methylcytosine experiments (built previously using cmetindex)
-V, --usesnps=STRING
Use database containing known SNPs (in <STRING>.iit, built previously using snpindex) for tolerance to SNPs
-g, --geneprob=STRING
Use IIT file containing geneprob (in <STRING>.iit, of cumulative format >(count) (genomicpos)) to resolve ties
-t, --nthreads=INT
Number of worker threads
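
The interaction between --max-mismatches and --indel-penalty described above can be sketched in Python as follows. This is only a model under stated assumptions: values strictly between 0.0 and 1.0 are taken as fractions (the behavior at exactly 0.0 or 1.0 is not specified above), the fractional result is truncated to an integer, and the default uses integer division. The function names are hypothetical.

```python
def mismatch_budget(readlength, max_mismatches=None):
    """Return the number of mismatches allowed for one read."""
    if max_mismatches is None:
        # Default: the ultrafast level, assuming integer division.
        return (readlength + 2) // 12 - 2
    if 0.0 < max_mismatches < 1.0:
        # Assumption: a fraction of the read length, truncated.
        return int(max_mismatches * readlength)
    return int(max_mismatches)


def indels_reachable(readlength, max_mismatches=None, indel_penalty=1000):
    """Indels can be found only if indel-penalty <= max-mismatches."""
    return indel_penalty <= mismatch_budget(readlength, max_mismatches)


print(mismatch_budget(100))                       # 6
print(indels_reachable(100))                      # False
print(indels_reachable(100, 3, indel_penalty=2))  # True
```

The second call shows why the default indel penalty of 1000 effectively turns indel search off: no realistic mismatch budget reaches it.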

Splicing options for RNA-Seq

-s, --splicesites=STRING
Look for splicing involving known splice sites (in <STRING>.iit), at short or long distances
-N, --novelsplicing=INT
Look for novel splicing, not in known splice sites (if -s provided), within localsplicedist (-w flag) or with novelspliceprob (-x flag)
-w, --localsplicedist=INT
Definition of local novel splicing event (default 200000)
-e, --local-splice-penalty=INT
Penalty for a local splice (default 2). Counts against mismatches allowed
-E, --distant-splice-penalty=INT
Penalty for a distant splice (default 3). Counts against mismatches allowed
-k, --local-splice-endlength=INT
Minimum length at end required for local spliced alignments (default 15, min is 14)
-K, --distant-splice-endlength=INT
Minimum length at end required for distant spliced alignments (default 16, min is 14)
-J, --distant-splice-identity=FLOAT
Minimum identity at end required for distant spliced alignments (default 0.95)
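
Because the splice penalties count against the mismatches allowed, a spliced alignment leaves less room for actual mismatches. The Python sketch below models this as a simple sum, using the default penalties listed above. The additive scoring and the function names are assumptions made for illustration, not a description of GSNAP's internal scoring.

```python
def alignment_cost(mismatches, local_splices=0, distant_splices=0,
                   local_splice_penalty=2, distant_splice_penalty=3):
    """Total cost charged against max-mismatches (assumed to be additive)."""
    return (mismatches
            + local_splices * local_splice_penalty
            + distant_splices * distant_splice_penalty)


def within_budget(cost, max_mismatches):
    """True if the alignment's cost does not exceed the mismatches allowed."""
    return cost <= max_mismatches


# One mismatch plus one local splice costs 1 + 2 = 3.
cost = alignment_cost(mismatches=1, local_splices=1)
print(cost, within_budget(cost, 4), within_budget(cost, 2))  # 3 True False
```

Under this model, a read with a mismatch budget of 2 could never be reported with a distant splice at the default penalty of 3.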

Options for paired-end reads

-P, --pairmax=INT
Max total genomic length for paired reads (default 1000). Should increase for RNA-Seq reads.
-p, --pairlength=INT
Expected paired-end length (default 200)

Output options

-n, --npaths=INT
Maximum number of paths to print (default 100).
-Q, --quiet-if-excessive
If more than maximum number of paths are found, then nothing is printed.
-O, --ordered
Print output in same order as input (relevant only if there is more than one worker thread)
-S, --print-snps=INT
Print detailed information about SNPs in reads (works only if -V also selected) (0=no (default), 1=positions and labels)
-F, --failsonly
Print only failed alignments, those with no results
-f, --nofails
Exclude printing of failed alignments
-A, --format=STRING
Another format type, other than default. Currently implemented: sam
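
The combined effect of --npaths and --quiet-if-excessive can be sketched in Python as below. The sketch assumes that, without -Q, output is truncated to the first npaths paths; the description above gives only the maximum, so the choice of which paths are kept is an assumption. paths_to_print is a hypothetical name.

```python
def paths_to_print(paths, npaths=100, quiet_if_excessive=False):
    """Return the alignment paths that would be printed for one read."""
    if len(paths) > npaths:
        if quiet_if_excessive:
            # -Q: more than the maximum were found, so print nothing.
            return []
        # Assumption: otherwise keep only the first npaths paths.
        return paths[:npaths]
    return paths


hits = ["path%d" % k for k in range(5)]
print(paths_to_print(hits, npaths=3))                           # ['path0', 'path1', 'path2']
print(paths_to_print(hits, npaths=3, quiet_if_excessive=True))  # []
print(len(paths_to_print(hits)))                                # 5
```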

Help options

-v, --version
Show version
-?, --help
Show this help message


genome directory (equivalent to -D)


configuration file


Thomas D. Wu and Colin K. Watanabe


Report bugs to Thomas Wu <>. Copyright 2005 Genentech, Inc. All rights reserved.


gmap_setup(1), gmap(1)
