estim_pm
Language: en
Version: 111136 (mandriva - 01/05/08)
Section: 1 (User commands)
NAME
estim_pm - Parsimonious Markov model estimation tool.
SYNOPSIS
estim_pm arguments [options]
DESCRIPTION
estim_pm estimates a parsimonious Markov model from the input sequence(s) and computes the associated statistics. The stationary distribution (stationary law) is also computed. The resulting model can then be used to simulate sequences with the simul_m program.
ARGUMENTS
- sequence_file
- Either the name of a file containing a set of sequences in FASTA format, or the name of a file containing a list of filenames, each of which contains a set of sequences in FASTA format.
- -d --order=INTEGER
- Order of the Markov model.
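The order has a large effect on model size, which is the motivation for parsimonious models. As a rough illustration only, the sketch below counts the free transition probabilities of a full (non-parsimonious) Markov model of a given order. The function name is hypothetical and is not part of seq++.

```python
def full_markov_parameter_count(alphabet_size: int, order: int) -> int:
    """Free transition parameters of a full Markov model.

    There are alphabet_size ** order contexts, and each context carries
    a probability distribution with alphabet_size - 1 free parameters.
    """
    if alphabet_size < 2:
        raise ValueError("alphabet_size must be at least 2")
    if order < 0:
        raise ValueError("order must be non-negative")
    contexts = alphabet_size ** order
    return contexts * (alphabet_size - 1)


# DNA alphabet (4 tokens) at order 5, protein alphabet (20 tokens) at order 3.
for name, size, order in (("DNA", 4, 5), ("protein", 20, 3)):
    print(name, "order", order, "->", full_markov_parameter_count(size, order))
```

A full order-5 DNA model has 3072 free parameters and a full order-3 protein model has 152000. A parsimonious model reduces this by merging contexts.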
OPTIONS
- -p --phase=INTEGER
- Number of phases (default = 1).
- -a --alphabet=FILENAME
- A file describing the alphabet to use (DNA alphabet, default setting).
- -A --Alphabet=EXPRESSION
- An expression describing the alphabet to use: [number<10 of characters for each pattern]+[:]+[alphabet patterns list] (DNA alphabet, default setting).
- --dna
- Use DNA alphabet (1:AGCT, default setting).
- --protein
- Use amino acid alphabet (1:IVLFCMAGTWSYPHEQDNKR).
- -o --output=FILENAME
- Result file containing the parameters of the estimated parsimonious Markov model.
- --partition=FILENAME
- A file describing the partitions of the alphabet to use (all partitions, default setting).
- -b FLOAT
- Bayesian prior hyperparameter (1./alphabet_size, default setting).
- --penality FLOAT
- Penalty on the number of leaves (0, default setting).
- --oxml FILENAME
- Tree-shaped results file in XML format. Available only if the package was built with ./configure --enable-xml.
- -l --likelihood=FILENAME
- Compute the likelihood under the selected model on the sequences contained in FILENAME or on the sequences whose filenames are listed in FILENAME.
- -L --Likelihood
- Compute the likelihood under the selected model on the sequences specified by the sequence_file argument.
- -b --bic=FILENAME
- Compute the BIC under the selected model on the sequences contained in FILENAME or on the sequences whose filenames are listed in FILENAME.
- -B --Bic=FILENAME
- Compute the BIC under the selected model on the sequences specified by the sequence_file argument.
- --all
- Compute the total BIC/likelihood for all the given sequences.
- -v --version
- Display the version number and exit.
- -h --help
- Print this help and exit.
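The expression accepted by -A can be read as a single digit giving the pattern width, a colon, and the concatenated patterns, as in 1:AGCT. The sketch below splits such an expression into tokens under that reading. It is an interpretation of the format described above, not code taken from seq++, and the function name is hypothetical.

```python
from typing import List


def parse_alphabet_expression(expression: str) -> List[str]:
    """Split 'WIDTH:PATTERNS' into tokens of WIDTH characters each.

    Assumed reading of the -A format: WIDTH is a single digit (1 to 9)
    and PATTERNS is the concatenation of all alphabet patterns.
    """
    width_text, separator, patterns = expression.partition(":")
    if not separator:
        raise ValueError("missing ':' separator")
    if len(width_text) != 1 or width_text not in "123456789":
        raise ValueError("pattern width must be a single digit from 1 to 9")
    width = int(width_text)
    if not patterns or len(patterns) % width != 0:
        raise ValueError("pattern list length must be a multiple of the width")
    tokens = [patterns[i:i + width] for i in range(0, len(patterns), width)]
    if len(set(tokens)) != len(tokens):
        raise ValueError("duplicate pattern in alphabet")
    return tokens


print(parse_alphabet_expression("1:AGCT"))
print(len(parse_alphabet_expression("1:IVLFCMAGTWSYPHEQDNKR")))
```

With the two built-in alphabets quoted above, this yields 4 tokens for --dna and 20 tokens for --protein.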
EXAMPLES
Estimate a parsimonious Markov model of order 5 on the sequence files listed in file seq.list. The sequences contain tokens of an alphabet described in file sample.alpha. Write the estimated model to file model.desc.
- estim_pm seq.list -d 5 -a sample.alpha -o model.desc
Estimate a parsimonious Markov model of order 3 on the sequences contained in seq.faa. The sequences contain tokens of the amino acid alphabet. prot.part is the partition file (see next section). Write the XML description of the estimated model to file model.xml.
- estim_pm seq.faa -d 3 --partition prot.part --protein --oxml model.xml
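When estim_pm is driven from a script, it can help to assemble the argument list programmatically. The helper below is a hypothetical sketch (not part of seq++) that uses only the flags shown in the two examples above. It builds the command but does not run it.

```python
import shlex
from typing import List, Optional


def build_estim_pm_command(
    sequence_file: str,
    order: int,
    alphabet_file: Optional[str] = None,
    protein: bool = False,
    partition_file: Optional[str] = None,
    output: Optional[str] = None,
    oxml: Optional[str] = None,
) -> List[str]:
    """Return an argv list for estim_pm using flags from the examples."""
    argv = ["estim_pm", sequence_file, "-d", str(order)]
    if alphabet_file is not None:
        argv += ["-a", alphabet_file]
    if partition_file is not None:
        argv += ["--partition", partition_file]
    if protein:
        argv.append("--protein")
    if output is not None:
        argv += ["-o", output]
    if oxml is not None:
        argv += ["--oxml", oxml]
    return argv


first = build_estim_pm_command(
    "seq.list", 5, alphabet_file="sample.alpha", output="model.desc"
)
# Quote each argument so the printed line can be pasted into a shell.
print(" ".join(shlex.quote(arg) for arg in first))
```

The resulting list can be passed to subprocess.run(argv, check=True), provided estim_pm is installed and on the PATH.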
PARTITION
A partition of an alphabet is a set of token subsets, i.e. a division of the alphabet into disjoint subsets. There are three possibilities regarding the --partition option:
* to use the full set of possible partitions (automatically generated) of the alphabet (default setting).
* to use the full set of possible partitions (automatically generated) of a synonymous pseudo-alphabet: by declaring synonymous tokens, it is possible to group tokens as a single predictor so that the number of partitions is lower. In this case, a configuration file starting with the keyword "#Synonymous" and containing the lists of synonymous tokens is required.
Example:
#Synonymous
a t
g c
* to supply a selected set of partitions. In this case, the configuration file starts with "#Partition" on the first line, and each following line represents one partition as a list of token subsets delimited by "|", each subset consisting of alphabet tokens separated by spaces.
Example (DNA alphabet):
#Partition
a | g | c | t
a g | c | t
a c t | g
Example 2 (protein alphabet):
#Synonymous
A G
V L I
M
P F
W Y
D E
K R H
N Q C
S T
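Before passing a hand-written configuration file to --partition, it may be worth checking it. The sketch below parses the two layouts shown above and checks that every line of a "#Partition" file covers each alphabet token exactly once. It is a minimal illustration based only on the examples in this section. The function names are hypothetical, and the parser in seq++ may accept or reject inputs differently.

```python
from typing import Iterable, List, Tuple


def parse_partition_config(text: str) -> Tuple[str, list]:
    """Parse a '#Synonymous' or '#Partition' configuration text.

    Returns the keyword and either a list of synonym groups or a list
    of partitions (each partition being a list of token subsets).
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty configuration")
    keyword, body = lines[0], lines[1:]
    if keyword == "#Synonymous":
        return keyword, [line.split() for line in body]
    if keyword == "#Partition":
        return keyword, [
            [subset.split() for subset in line.split("|")] for line in body
        ]
    raise ValueError("first line must be #Synonymous or #Partition")


def is_partition_of(partition: List[List[str]], alphabet: Iterable[str]) -> bool:
    """True if the subsets are non-empty and cover each token exactly once."""
    tokens = [token for subset in partition for token in subset]
    return all(partition) and sorted(tokens) == sorted(alphabet)


dna_text = "#Partition\na | g | c | t\na g | c | t\na c t | g\n"
kind, partitions = parse_partition_config(dna_text)
for partition in partitions:
    print(partition, is_partition_of(partition, "agct"))
```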
For large alphabets or high orders, the set of possible partitions should be restricted to limit computation time.
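To give a sense of scale, the number of ways to partition a set of n tokens is the Bell number B(n). The sketch below computes it with the Bell triangle. It assumes that "all partitions" means every set partition of the alphabet (or of the synonymous groups), which is how this section reads but is not confirmed against the seq++ source.

```python
def bell_number(n: int) -> int:
    """Number of set partitions of n items, computed via the Bell triangle."""
    if n < 0:
        raise ValueError("n must be non-negative")
    row = [1]
    for _ in range(n):
        # Each new row starts with the last entry of the previous row;
        # each next entry adds the entry above-left in the previous row.
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
    return row[0]


# 4 DNA tokens, 9 synonymous protein groups, 20 amino acids.
for label, size in (("DNA", 4), ("protein groups", 9), ("protein", 20)):
    print(label, size, bell_number(size))
```

Under that assumption, the DNA alphabet has 15 partitions, the 20-token protein alphabet has 51724158235372, and the nine synonymous groups of Example 2 reduce this to 21147.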
AUTHORS
estim_pm is part of the seq++ package, developed by Vincent Miele <miele@genopole.cnrs.fr>, David Robelin <robelin@genopole.cnrs.fr>, Pierre-Yves Bourguignon <bourguignon@genopole.cnrs.fr>, Gregory Nuel <nuel@genopole.cnrs.fr> and Hugues Richard <richard@genopole.cnrs.fr>.
SEE ALSO
estim_m(1), estim_mtd(1), estim_vlm(1), simul_m(1), dist_m(1)
More information on seq++ is available at <http://stat.genopole.cnrs.fr/seqpp>.