gp_cdndev

Language: en

Version: 111677 (mandriva - 01/05/08)

Section: 1 (User commands)

NAME

gp_cdndev - calculate the codon bias of sequence(s)

SYNOPSIS

gp_cdndev [options] <codon usage file> [inputfile] [outputfile]

OPTIONS

-o
Show bias for all ORFs read
-t
Show the total bias for the set of ORFs read
-b
Both of the above
-c file
Read the alternate genetic code from file
-v
Prints the version information.
-d
Prints lots of debugging information.
-h
Shows usage information.
codon usage file
file containing a certain codon usage distribution
inputfile
file to process; if not given, will use standard input
outputfile
file to write the data to; if not given, will use standard output

DESCRIPTION

Codon usage is related to the level of protein expression. It is possible to predict the expression of an ORF or a set of ORFs by comparing their codon usage to the codon usage of genes with known levels of protein expression. There are different methods to measure this codon bias; the one used by gp_cdndev is described by S. Karlin and J. Mrazek, 2000, J. Bact. 182:5328-5350. Basically, it is a sum of absolute differences between the codon frequencies of the reference set and the input set, weighted by the frequencies of the respective amino acids. Refer to the above paper for more details.
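As a rough illustration of the measure described above, the following Python sketch computes a weighted sum of absolute codon-frequency differences. It is not the gp_cdndev source: the names codon_bias, normalize and TOY_CODE are hypothetical, and the per-amino-acid normalization and the weighting by the amino acid frequencies of the input set are assumptions made for this example. Consult the paper for the exact definition.

```python
from collections import defaultdict

# Toy genetic code covering two amino acids only (an assumption to keep the
# example short; a real run would need the full 64-codon table).
TOY_CODE = {
    "GCA": "A", "GCC": "A", "GCG": "A", "GCT": "A",  # alanine
    "AAA": "K", "AAG": "K",                          # lysine
}


def normalize(counts, code):
    """Scale codon counts so that the codons of each amino acid sum to 1."""
    totals = defaultdict(float)
    for codon, n in counts.items():
        totals[code[codon]] += n
    return {
        codon: n / totals[code[codon]]
        for codon, n in counts.items()
        if totals[code[codon]] > 0
    }


def codon_bias(input_counts, reference_counts, code=TOY_CODE):
    """Sum |f - c| over all codons, weighted by input amino acid frequency."""
    total = sum(input_counts.values())
    if total == 0:
        return 0.0  # nothing to compare
    f = normalize(input_counts, code)      # input set, per amino acid
    c = normalize(reference_counts, code)  # reference set, per amino acid
    # Amino acid frequencies of the input set (assumed weighting).
    aa_freq = defaultdict(float)
    for codon, n in input_counts.items():
        aa_freq[code[codon]] += n / total
    bias = 0.0
    for codon, aa in code.items():
        # Codons absent from a set count as frequency 0.0.
        bias += aa_freq[aa] * abs(f.get(codon, 0.0) - c.get(codon, 0.0))
    return bias
```

In this sketch, two sets with identical codon usage give a bias of 0.0, and the value grows as the input set departs from the reference. Codons missing from the code table raise a KeyError rather than being silently ignored.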

First you will have to record the codon usage of your reference sequence set, for example using the gp_cusage(1) program.

The format of the codon table is as follows: each line contains a codon followed by its frequency (any trailing information on the line is skipped); empty lines and lines starting with a hash ('#') are ignored. The file does not have to contain information about all codons: it is enough to specify the codons that have a frequency greater than 0.0. Here is an example:

# comment -- this line will be ignored
GCC 1.35
# the codon GCC has the frequency of 1.35

Note that gp_cdndev does not check whether the sequence is a valid ORF or not. It just takes three nucleotides at a time, checks which amino acid they code for, and records the respective codon frequency.
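The table format and the triplet-by-triplet reading described above can be sketched in a few lines of Python. This is only an illustration, not how gp_cdndev itself is implemented: the function names parse_codon_table and count_codons are hypothetical, and stripping leading whitespace, uppercasing the input, and dropping an incomplete trailing triplet are assumptions of this sketch.

```python
from collections import Counter


def parse_codon_table(text):
    """Parse 'CODON FREQUENCY' lines; skip empty lines and '#' comments."""
    table = {}
    for line in text.splitlines():
        line = line.strip()  # assumption: leading whitespace is ignored
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # Anything after the first two fields is trailing information.
        table[fields[0].upper()] = float(fields[1])
    return table


def count_codons(sequence):
    """Count consecutive triplets from the first base; no ORF validation."""
    seq = sequence.upper()
    usable = len(seq) - len(seq) % 3  # drop an incomplete trailing triplet
    return Counter(seq[i:i + 3] for i in range(0, usable, 3))
```

For instance, parse_codon_table applied to the example table above yields a single entry mapping GCC to 1.35, and codons not listed in the table can be treated by the caller as having frequency 0.0.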

EXAMPLES

1. Highly expressed ribosomal sequences are stored in the file ribo.fasta; some unknown ORFs are stored in mystery.fasta. Now you'd like to calculate the codon bias of the sequences in mystery.fasta with respect to the set of ribosomal sequences. Here is how you do it:

a. Calculate the codon usage of ribosomal sequences and write it to a file called ribo.cdu:

gp_cusage ribo.fasta ribo.cdu

or

gp_cusage ribo.fasta > ribo.cdu

b. Calculate the bias of mystery.fasta:

gp_cdndev ribo.cdu mystery.fasta

2. Just like the example above, but you'd like to know the total bias of the set of sequences stored in mystery.fasta:

gp_cusage ribo.fasta > ribo.cdu

gp_cdndev -t ribo.cdu mystery.fasta

SEE ALSO

Genpak(1) gp_acc(1) gp_adjust(1) gp_cusage(1) gp_digest(1) gp_dimer(1) gp_findorf(1) gp_gc(1) gp_getseq(1) gp_map(1) gp_matrix(1) gp_mkmtx(1) gp_pattern(1) gp_primer(1) gp_qs(1) gp_randseq(1) gp_seq2prot(1) gp_slen(1) gp_tm(1) gp_trimer(1)

DIAGNOSTICS

All Genpak programs complain in situations where you would complain too, for example when they cannot find a sequence you gave them or when the sequence is not valid.

The Genpak programs do not write over existing files. I have found this feature very useful :-)

BUGS

I'm sure there are plenty left, so please mail me if you find them. I tried to clean up every bug I could find.

AUTHOR

January Weiner III <january@bioinformatics.org>