gp_cdndev
Language: en
Version: 111677 (mandriva - 01/05/08)
Section: 1 (User commands)
NAME
gp_cdndev - calculate the codon bias of sequence(s)
SYNOPSIS
gp_cdndev [options] <codon usage file> [inputfile] [outputfile]
OPTIONS
- -o
- Show bias for all ORFs read
- -t
- Show the total bias for the set of ORFs read
- -b
- Both of the above
- -c file
- Read the alternate genetic code from file
- -v
- Prints the version information.
- -d
- Prints lots of debugging information.
- -h
- Shows usage information.
- codon usage file
- file containing a certain codon usage distribution
- inputfile
- file to process; if not given, will use standard input
- outputfile
- file to write the data to; if not given, will use standard output
DESCRIPTION
Codon usage is related to the levels of protein expression. It is possible to predict the expression of an ORF or a set of ORFs by comparing the codon usage of those ORFs to the codon usage of genes with known levels of protein expression. There are different methods to measure this codon bias; the one used by gp_cdndev is described by S. Karlin and J. Mrazek, 2000, J. Bact. 182:5328-5350. Basically, it is a sum of absolute differences between the codon frequencies of the reference set and the input set, weighted by the frequencies of the respective amino acids. Refer to the above paper for more details.
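The measure described above can be sketched in a few lines of Python. This is only an illustration of "a sum of absolute differences weighted by amino acid frequencies", not the code used by gp_cdndev. Several points are assumptions: codon frequencies are normalized so that the synonymous codons of each amino acid sum to 1, the amino acid weights are taken from the input set, and GENETIC_CODE is a deliberately tiny, hypothetical table (a real calculation needs the full genetic code). The function names are invented for this sketch.

```python
from collections import defaultdict

# Partial genetic code for illustration only (assumption: a real run
# would use a complete codon -> amino acid table).
GENETIC_CODE = {
    "GCT": "A", "GCC": "A", "GCA": "A", "GCG": "A",
    "AAA": "K", "AAG": "K",
}

def amino_acid_totals(codon_counts, code):
    """Sum the codon counts belonging to each amino acid."""
    totals = defaultdict(float)
    for codon, n in codon_counts.items():
        if codon in code:
            totals[code[codon]] += n
    return totals

def per_amino_acid_frequencies(codon_counts, code):
    """Normalize counts so synonymous codons sum to 1 per amino acid."""
    totals = amino_acid_totals(codon_counts, code)
    return {
        codon: n / totals[code[codon]]
        for codon, n in codon_counts.items()
        if codon in code and totals[code[codon]] > 0
    }

def codon_bias(input_counts, reference_counts, code=GENETIC_CODE):
    """Weighted sum of absolute codon frequency differences."""
    f = per_amino_acid_frequencies(input_counts, code)
    c = per_amino_acid_frequencies(reference_counts, code)
    totals = amino_acid_totals(input_counts, code)
    grand_total = sum(totals.values())
    if grand_total == 0:
        return 0.0
    bias = 0.0
    for codon, aa in code.items():
        weight = totals.get(aa, 0.0) / grand_total  # amino acid frequency
        bias += weight * abs(f.get(codon, 0.0) - c.get(codon, 0.0))
    return bias

mystery = {"GCC": 3, "GCT": 1, "AAA": 2, "AAG": 2}
ribosomal = {"GCC": 1, "GCT": 1, "AAA": 1, "AAG": 3}
print(codon_bias(mystery, ribosomal))
```

A set compared with itself gives a bias of 0; the more the synonymous codon choices differ from the reference, the larger the value.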
First you will have to record the codon usage of your reference sequence set, for example using the gp_cusage(1) program.
The format of the codon table is as follows: each line contains a codon and its frequency (trailing information in each line is skipped); empty lines and lines starting with a hash ('#') are ignored. The file does not have to contain information about all codons: it is enough to specify the codons that have a frequency greater than 0.0. Here is an example:
# comment -- this line will be ignored
GCC 1.35 # the codon GCC has the frequency of 1.35
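To make the format rules concrete, here is a minimal Python sketch of a reader for such a table. It is not the parser used by gp_cdndev; the function name parse_codon_table is hypothetical, and skipping lines that lack a frequency field is an assumption, since the text does not say how malformed lines are treated.

```python
def parse_codon_table(text):
    """Parse 'CODON FREQUENCY [trailing text]' lines into a dict."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        # Empty lines and lines starting with '#' are ignored.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # assumption: lines without a frequency are skipped
        # Only the first two fields are used; the rest is trailing text.
        table[fields[0]] = float(fields[1])
    return table

example = """\
# comment -- this line will be ignored
GCC 1.35 # the codon GCC has the frequency of 1.35
"""
table = parse_codon_table(example)
print(table.get("GCC", 0.0))  # frequency read from the file
print(table.get("AAA", 0.0))  # codons not listed count as 0.0
```

Looking codons up with a default of 0.0 mirrors the rule that only codons with a non-zero frequency need to be listed.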
Note that gp_cdndev does not check whether the sequence is a valid ORF or not. It just takes three nucleotides, checks what they code, and records the respective codon frequency.
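The behaviour described in this note can be pictured with the following Python sketch, which reads a sequence three nucleotides at a time and counts the resulting codons without any ORF validation. It is an illustration only: the function name count_codons is hypothetical, and reading from the first nucleotide, uppercasing the input, and dropping one or two leftover nucleotides at the end are assumptions, not documented behaviour of gp_cdndev.

```python
from collections import Counter

def count_codons(sequence):
    """Count consecutive nucleotide triplets; no start/stop codon checks."""
    seq = "".join(sequence.split()).upper()  # drop whitespace and newlines
    usable = len(seq) - len(seq) % 3         # assumption: ignore leftovers
    return Counter(seq[i:i + 3] for i in range(0, usable, 3))

counts = count_codons("ATGGCCGCCTA")
print(counts["GCC"])
print(counts["ATG"])
```

Because nothing checks for a start or stop codon, a sequence given in the wrong reading frame is still counted, just with different triplets.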
EXAMPLES
1. Highly expressed ribosomal sequences are stored in the file ribo.fasta; some unknown ORFs are stored in mystery.fasta. Now you'd like to calculate the codon bias of the sequences in mystery.fasta with respect to the set of ribosomal sequences. Here is how you do it:
a. Calculate the codon usage of ribosomal sequences and write it to a file called ribo.cdu:
gp_cusage ribo.fasta ribo.cdu
or
gp_cusage ribo.fasta > ribo.cdu
b. Calculate the bias of mystery.fasta:
gp_cdndev ribo.cdu mystery.fasta
2. Just like the example above, but you'd like to know the total bias of the set of sequences stored in mystery.fasta:
gp_cusage ribo.fasta > ribo.cdu
gp_cdndev -t ribo.cdu mystery.fasta
SEE ALSO
Genpak(1) gp_acc(1) gp_adjust(1) gp_cusage(1) gp_digest(1) gp_dimer(1) gp_findorf(1) gp_gc(1) gp_getseq(1) gp_map(1) gp_matrix(1) gp_mkmtx(1) gp_pattern(1) gp_primer(1) gp_qs(1) gp_randseq(1) gp_seq2prot(1) gp_slen(1) gp_tm(1) gp_trimer(1)
DIAGNOSTICS
All Genpak programs complain in situations where you would also complain, such as when they cannot find a sequence you gave them or when the sequence is not valid.
The Genpak programs do not write over existing files. I have found this feature very useful :-)
BUGS
I'm sure there are plenty left, so please mail me if you find them. I tried to clean up every bug I could find.
AUTHOR
January Weiner III <january@bioinformatics.org>