ssake - assembling millions of very short DNA sequences


Progressive assembly of millions of short DNA sequences by k-mer search through a prefix tree and 3' extension.


Fasta file containing all the [paired (-p 1) / unpaired (-p 0)] reads (required). Paired reads must now be separated by ":"
Fasta file containing sequences to use as seeds exclusively (specify only if different from read set, optional)
Minimum number of overlapping bases with the seed/contig during overhang consensus build up (default -m 16)
Minimum number of reads needed to call a base during an extension (default -o 3)
Minimum base ratio used to accept an overhang consensus base (default -r 0.7)
Trim up to -t base(s) on the contig end when all possibilities have been exhausted for an extension (default -t 0)
Paired-end reads used? (-p 1=yes, -p 0=no, default -p 0)
Runs in verbose mode (-v 1=yes, -v 0=no, default -v 0, optional)
Base name for your output files (optional)
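
To illustrate how the -o and -r thresholds could work together, the Python sketch below calls a consensus base from the bases that overlapping reads propose at one overhang position. It is a minimal sketch based only on the option descriptions above, not SSAKE's own code. The function name call_base and the use of inclusive (>=) comparisons for both thresholds are assumptions.

```python
from collections import Counter


def call_base(bases, min_reads=3, min_ratio=0.7):
    """Return the consensus base for one overhang position, or None.

    bases     -- bases proposed by the overlapping reads at this position
    min_reads -- mirrors -o (assumed: total reads covering the position)
    min_ratio -- mirrors -r (assumed: share of reads backing the top base)
    """
    # Too few reads cover this position: no base is called.
    if len(bases) < min_reads:
        return None
    # Most frequent base and the number of reads supporting it.
    base, count = Counter(bases).most_common(1)[0]
    # The top base must reach the minimum ratio to be accepted.
    if count / len(bases) < min_ratio:
        return None
    return base


print(call_base("AAAT"))  # 3 of 4 reads agree (0.75)
print(call_base("AATT"))  # tie at 0.5, below the ratio
print(call_base("AA"))    # only 2 reads, below the minimum
```

Under these assumptions the first call accepts "A", while the other two return None because the ratio or the read count falls short.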

============ Options below only considered with -p 1 ============

Mean distance expected/observed between paired-end reads (default -d 200, optional)
Error (%) allowed on mean distance e.g. -e 0.75 == distance +/- 75% (default -e 0.75, optional)
Minimum number of links (read pairs) to compute scaffold (default -k 2, optional)
Maximum link ratio between two best contig pairs *higher values lead to less accurate scaffolding* (default -a 0.70, optional)
Minimum contig size to track paired-end reads (default -z 50, optional)
Fasta file containing unpaired sequence reads (optional)
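
The paired-end thresholds can be read as simple arithmetic. The Python sketch below shows one interpretation: -d and -e define an accepted distance window, and -k and -a filter candidate contig pairs. The function names distance_window and accept_link are hypothetical, and defining the link ratio as second-best links divided by best links is an assumption, not a statement of how SSAKE computes it.

```python
def distance_window(mean=200, error=0.75):
    """Accepted distance range for a read pair: mean +/- (error * mean).

    mean mirrors -d and error mirrors -e, as described in the option list.
    """
    return mean * (1 - error), mean * (1 + error)


def accept_link(best_links, second_links, min_links=2, max_ratio=0.70):
    """Decide whether the best contig pair is supported well enough.

    min_links mirrors -k and max_ratio mirrors -a. The ratio used here
    (second_links / best_links) is an assumed definition.
    """
    # Not enough read pairs link the two contigs.
    if best_links <= 0 or best_links < min_links:
        return False
    # A runner-up with nearly as many links makes the choice ambiguous.
    return second_links / best_links <= max_ratio


print(distance_window())   # defaults: 200 +/- 75%
print(accept_link(10, 3))  # ratio 0.3
print(accept_link(10, 9))  # ratio 0.9
```

With the defaults, the window spans 50 to 350 bases. Under the assumed ratio definition, a pair with 10 links against a runner-up with 3 is accepted, while a runner-up with 9 is rejected as too ambiguous.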


