gp_adjust

Language: en

Version: 111676 (mandriva - 01/05/08)

Section: 1 (User commands)

NAME

gp_adjust - adjust the codon usage of sequence(s)

SYNOPSIS

gp_adjust [options] <codon usage file> [inputfile] [outputfile]

OPTIONS

-u file
Read the genetic code used by the codon usage table and the output sequences from file
-s file
Read the genetic code used by the input sequences from file
-v
Prints the version information.
-d
Prints lots of debugging information.
-h
Shows usage information.
codon usage file
file containing a certain codon usage distribution
inputfile
file to process; if not given, will use standard input
outputfile
file to write the data to; if not given, will use standard output

DESCRIPTION

When cloning genes, it is sometimes necessary to adjust the codon usage of a certain gene to the codon usage of highly expressed genes in the organism we are transforming. gp_adjust looks at a table containing codon usage of a gene or set of genes, and then replaces the codons of the given sequence(s) by codons which have the highest frequency in the codon usage table.

The format of the codon table is as follows: each line contains a codon and its frequency (trailing information in each line is skipped); empty lines and lines starting with a hash ('#') are ignored. The file does not have to contain information about all codons: it is enough to specify the codons that have a frequency greater than 0.0. Here is an example:

  
 
  # comment -- this line will be ignored
  GCC 1.35
  # the codon GCC has the frequency of 1.35

Such a codon usage table can be produced easily by the gp_cusage(1) program.
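The table format described above can be read with a few lines of code. The following Python sketch is only an illustration and is not part of Genpak; the function name parse_codon_table is hypothetical, and it assumes that a comment line may be indented before the hash, as in the example.

```python
def parse_codon_table(text):
    """Parse 'CODON FREQUENCY [ignored...]' lines into a dict.

    Hypothetical helper, not Genpak code. Empty lines and lines whose
    first non-blank character is '#' are skipped; anything after the
    frequency on a line is ignored.
    """
    table = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        fields = stripped.split()
        # fields[0] is the codon, fields[1] its frequency;
        # a line with fewer than two fields raises IndexError.
        table[fields[0]] = float(fields[1])
    return table


example = """
# comment -- this line will be ignored
GCC 1.35
# the codon GCC has the frequency of 1.35
"""
print(parse_codon_table(example))  # → {'GCC': 1.35}
```

Codons that are missing from the file simply do not appear in the resulting dictionary, which mirrors the rule that only codons with a frequency greater than 0.0 need to be listed.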

Note that gp_adjust does not check whether the sequence is a valid ORF or not. It just takes three nucleotides at a time, checks which amino acid they code for, and substitutes the codon with the highest frequency in the table that codes for the same amino acid, so the encoded protein is preserved.
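The replacement step can be sketched in Python as follows. This is one possible reading of the behavior described in this page, not the actual gp_adjust source: the names adjust and STANDARD_CODE are hypothetical, only the standard genetic code is handled, and codons that are absent from the usage table are left unchanged.

```python
from itertools import product

# Standard genetic code, built in the conventional TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = dict(zip(("".join(c) for c in product(BASES, repeat=3)), AMINO))


def adjust(seq, usage, code=STANDARD_CODE):
    """Replace each codon by its most frequent synonym from `usage`.

    Hypothetical sketch: no ORF check is made, codons missing from
    `usage` are copied unchanged, and a trailing partial codon is kept.
    """
    # For every amino acid, remember the most frequent codon in the table.
    best = {}
    for codon, freq in usage.items():
        if codon not in code:
            continue
        aa = code[codon]
        if aa not in best or freq > usage[best[aa]]:
            best[aa] = codon

    whole = len(seq) - len(seq) % 3
    out = []
    for i in range(0, whole, 3):
        raw = seq[i:i + 3]
        codon = raw.upper()
        if codon in usage and codon in code:
            out.append(best[code[codon]])
        else:
            out.append(raw)
    out.append(seq[whole:])  # leftover nucleotides, if any
    return "".join(out)


usage = {"GCC": 1.35, "GCT": 0.40, "AAA": 2.0}
print(adjust("GCTAAAGCG", usage))  # → GCCAAAGCG
```

In the usage example, GCT is replaced by the more frequent alanine codon GCC, AAA is already the preferred lysine codon, and GCG stays as it is because it does not occur in the table.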

Warning: there is one caveat when you adapt input sequences from one organism to the codon usage of a second organism which has a different genetic code. If you make a codon usage table using gp_cusage, and there is a certain codon which does not occur in this table, any occurrence of this codon in the input sequence will not be replaced; it will remain as it was. And this might be a problem: if this particular codon has a different meaning in the two organisms, then you will have a non-neutral mutation!
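One way to spot this situation before running gp_adjust is to list the codons of the input sequence that are missing from the usage table and are translated differently by the two genetic codes. The Python sketch below is a hypothetical helper, not a Genpak program; the name risky_codons and the toy data are assumptions, and the MYCOPLASMA table assumes the Mycoplasma code (NCBI translation table 4), which differs from the standard code only in reading TGA as tryptophan.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = dict(zip(("".join(c) for c in product(BASES, repeat=3)), AMINO))
# Assumption: Mycoplasma code = standard code with TGA read as Trp (W).
MYCOPLASMA = {**STANDARD, "TGA": "W"}


def risky_codons(seq, usage, source_code, target_code):
    """Return codons that would stay unchanged yet change their meaning.

    Hypothetical check: a codon is reported when it is absent from
    `usage` (so it would not be replaced) and the two genetic codes
    translate it differently.
    """
    risky = set()
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3].upper()
        if codon in usage:
            continue
        if source_code.get(codon) != target_code.get(codon):
            risky.add(codon)
    return sorted(risky)


# Toy data: the table lacks TGA, a stop codon in the standard code.
usage = {"ATG": 1.0, "GCC": 1.35}
print(risky_codons("ATGGCCTGA", usage, STANDARD, MYCOPLASMA))  # → ['TGA']
```

With this toy table, the stop codon TGA of the input would be kept and then read as tryptophan under the second code, which is exactly the kind of non-neutral change the warning describes.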

EXAMPLES

1. To adjust the codon usage of sequences stored in the file myseqs.fasta to the codon usage table stored in highexpress.cdu, type

gp_adjust highexpress.cdu myseqs.fasta

2. You clone a gene from E. coli, which uses the standard genetic code, in M. pneumoniae, which uses the genetic code stored in a file called myco.cdn. You would like to optimize the gene from E. coli to adapt it to the codon usage of M. pneumoniae. The sequence of the E. coli gene is stored in the file ecoli.fasta. Sequences of some highly expressed genes from M. pneumoniae are stored in the file ribo_mp.fasta.

a. Produce the codon usage table for the M. pneumoniae genes, using the M. pneumoniae genetic code:

gp_cusage -c myco.cdn ribo_mp.fasta > ribo_mp.cdu

b. Produce the altered gene sequence of your gene, adapted to the codon usage and genetic code of M. pneumoniae, and store it in a file called ecoli_mod.fasta:

gp_adjust -u myco.cdn ribo_mp.cdu ecoli.fasta > ecoli_mod.fasta

c. Check whether the protein sequences of the old (ecoli.fasta) and new (ecoli_mod.fasta) genes are the same:

gp_seq2prot ecoli.fasta

gp_seq2prot -c myco.cdn ecoli_mod.fasta

SEE ALSO

Genpak(1), gp_acc(1), gp_cdndev(1), gp_cusage(1), gp_digest(1), gp_dimer(1), gp_findorf(1), gp_gc(1), gp_getseq(1), gp_map(1), gp_matrix(1), gp_mkmtx(1), gp_pattern(1), gp_primer(1), gp_qs(1), gp_randseq(1), gp_seq2prot(1), gp_slen(1), gp_tm(1), gp_trimer(1)

DIAGNOSTICS

All Genpak programs complain in situations where you would also complain, such as when they cannot find a sequence you gave them or the sequence is not valid. The Genpak programs do not write over existing files. I have found this feature very useful :-)

BUGS

I'm sure there are plenty left, so please mail me if you find them. I tried to clean up every bug I could find.

AUTHOR

January Weiner III <january@bioinformatics.org>