TFBS::Matrix::PFM.3pm

Language: en

Version: 2008-01-24 (ubuntu - 24/10/10)

Section: 3 (Library functions)

NAME

TFBS::Matrix::PFM - class for raw position frequency matrix patterns

SYNOPSIS

creating a TFBS::Matrix::PFM object manually:
     my $matrixref = [ [ 12,  3,  0,  0,  4,  0 ],
                       [  0,  0,  0, 11,  7,  0 ],
                       [  0,  9, 12,  0,  0,  0 ],
                       [  0,  0,  0,  1,  1, 12 ]
                     ];  
     my $pfm = TFBS::Matrix::PFM->new(-matrix => $matrixref,
                                      -name   => "MyProfile",
                                      -ID     => "M0001"
                                     );
     # or
  
     my $matrixstring =
         "12 3 0 0 4 0\n0 0 0 11 7 0\n0 9 12 0 0 0\n0 0 0 1 1 12";
  
     my $pfm = TFBS::Matrix::PFM->new(-matrixstring => $matrixstring,
                                      -name         => "MyProfile",
                                      -ID           => "M0001"
                                     );
 
 
retrieving a TFBS::Matrix::PFM object from a database:

(See documentation of individual TFBS::DB::* modules to learn how to connect to different types of pattern databases and retrieve TFBS::Matrix::* objects from them.)

     my $db_obj = TFBS::DB::JASPAR2->new
                     (-connect => ["dbi:mysql:JASPAR2:myhost",
                                   "myusername", "mypassword"]);
     my $pfm = $db_obj->get_Matrix_by_ID("M0001", "PFM");
     # or
     my $pfm = $db_obj->get_Matrix_by_name("MyProfile", "PFM");
 
 
retrieving a list of individual TFBS::Matrix::PFM objects from a TFBS::MatrixSet object:

(See the TFBS::MatrixSet documentation to learn how to create objects for storage and manipulation of multiple matrices.)

     my @pfm_list = $matrixset->all_patterns(-sort_by=>"name");
 
 
convert a raw frequency matrix to other matrix types:
     my $pwm = $pfm->to_PWM(); # convert to position weight matrix
     my $icm = $pfm->to_ICM(); # convert to information content matrix
 
 

DESCRIPTION

TFBS::Matrix::PFM is a class whose instances are objects representing raw position frequency matrices (PFMs). A PFM is derived from N nucleotide patterns of fixed size, e.g. the set of sequences
     AGGCCT
     AAGCCT
     AGGCAT
     AAGCCT
     AAGCCT
     AGGCAT
     AGGCCT
     AGGCAT
     AGGTTT
     AGGCAT
     AGGCCT
     AGGCCT
 
 

will give the matrix:

     A:[ 12  3  0  0  4  0 ]
     C:[  0  0  0 11  7  0 ]
     G:[  0  9 12  0  0  0 ]
     T:[  0  0  0  1  1 12 ]
 
 

which contains the count of each nucleotide at each position in the sequence. (If you have a set of sequences as above and want to create a TFBS::Matrix::PFM object from them, have a look at the TFBS::PatternGen::SimplePFM module.)
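
The counting step described above can be sketched in plain Perl. This is only an illustration of how the example matrix arises from the twelve sequences; the helper name count_matrix is hypothetical and is not part of the TFBS API (use TFBS::PatternGen::SimplePFM for real work).

```perl
use strict;
use warnings;

# Hypothetical helper: count nucleotides per position for equal-length
# sequences. Returns a reference to four rows in A, C, G, T order.
sub count_matrix {
    my @seqs = @_;
    my $len  = length $seqs[0];
    my %row  = (A => 0, C => 1, G => 2, T => 3);
    my @pfm  = map { [ (0) x $len ] } 0 .. 3;
    for my $seq (@seqs) {
        die "sequence '$seq' has the wrong length\n"
            unless length($seq) == $len;
        my @letters = split //, uc $seq;
        for my $i (0 .. $len - 1) {
            my $r = $row{ $letters[$i] };
            die "unexpected letter '$letters[$i]' in '$seq'\n"
                unless defined $r;
            $pfm[$r][$i]++;
        }
    }
    return \@pfm;
}

my @sites = qw(AGGCCT AAGCCT AGGCAT AAGCCT AAGCCT AGGCAT
               AGGCCT AGGCAT AGGTTT AGGCAT AGGCCT AGGCCT);
my $matrixref = count_matrix(@sites);

# Print one row per nucleotide, space-delimited
print join(" ", @$_), "\n" for @$matrixref;
```

The resulting array of arrays has the same shape as the -matrix argument shown in the SYNOPSIS.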

PFMs are easily converted to other types of matrices, namely information content matrices and position weight matrices. A TFBS::Matrix::PFM object has the methods to_ICM and to_PWM, which do just that, returning TFBS::Matrix::ICM and TFBS::Matrix::PWM objects, respectively.

FEEDBACK

Please send bug reports and other comments to the author.

AUTHOR - Boris Lenhard

Boris Lenhard <Boris.Lenhard@cgb.ki.se>

APPENDIX

The rest of the documentation details each of the object methods. Internal methods are preceded with an underscore.

new

  Title   : new
  Usage   : my $pfm = TFBS::Matrix::PFM->new(%args)
  Function: constructor for the TFBS::Matrix::PFM object
  Returns : a new TFBS::Matrix::PFM object
  Args    : # you must specify exactly one of the following three:
  
            -matrix,      # reference to an array of arrays of integers
               #or
            -matrixstring,# a string containing four lines
                          # of tab- or space-delimited integers
               #or
            -matrixfile,  # the name of a file containing four lines
                          # of tab- or space-delimited integers
            #######
  
            -name,        # string, OPTIONAL
            -ID,          # string, OPTIONAL
            -class,       # string, OPTIONAL
            -tags         # an array reference, OPTIONAL
 Warnings  : Warns if the matrix provided has columns with different
             sums. Columns with different sums contradict the usual
             origin of matrix data and, unless you are absolutely sure
             that column sums _should_ be different, it would be wise to
             check your matrices.
 
 

column_sum

  Title   : column_sum
  Usage   : my $nr_sequences = $pfm->column_sum()
  Function: calculates the sum of elements of one column
            (the first one by default) which normally equals the
            number of sequences used to derive the PFM. 
  Returns : the sum of elements of one column (an integer)
  Args    : column number (starting from 1), OPTIONAL - you DO NOT
            need to specify it unless you are dealing with a matrix
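
As a rough illustration of what a column sum is and why the constructor warns about unequal sums, here is a sketch on a plain array of arrays. The helper names sum_column and columns_agree are hypothetical; they mimic the documented behaviour (1-based column number, first column by default) and are not the module's implementation.

```perl
use strict;
use warnings;

# Hypothetical helper: sum one column of a 4-row matrix.
# $col is 1-based and defaults to the first column.
sub sum_column {
    my ($matrix, $col) = @_;
    $col = 1 unless defined $col;
    my $sum = 0;
    $sum += $_->[ $col - 1 ] for @$matrix;
    return $sum;
}

# Hypothetical helper: true if every column has the same sum,
# which is what one expects when each sequence contributes one
# count to every position.
sub columns_agree {
    my ($matrix) = @_;
    my $width = scalar @{ $matrix->[0] };
    my $first = sum_column($matrix, 1);
    for my $col (2 .. $width) {
        return 0 if sum_column($matrix, $col) != $first;
    }
    return 1;
}

my $matrixref = [ [ 12, 3,  0,  0, 4,  0 ],
                  [  0, 0,  0, 11, 7,  0 ],
                  [  0, 9, 12,  0, 0,  0 ],
                  [  0, 0,  0,  1, 1, 12 ] ];

print "number of sequences: ", sum_column($matrixref), "\n";
print "column sums agree: ", (columns_agree($matrixref) ? "yes" : "no"), "\n";
```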
 
 

to_PWM

  Title   : to_PWM
  Usage   : my $pwm = $pfm->to_PWM()
  Function: converts a raw frequency matrix (a TFBS::Matrix::PFM object)
            to a position weight matrix. At present it assumes a uniform
            background distribution of nucleotide frequencies.
  Returns : a new TFBS::Matrix::PWM object
  Args    : none; in future releases, it should be able to accept
            a user-defined background probability of the four
            nucleotides
 
 

to_ICM

  Title   : to_ICM
  Usage   : my $icm = $pfm->to_ICM()
  Function: converts a raw frequency matrix (a TFBS::Matrix::PFM object)
            to an information content matrix. At present it assumes a uniform
            background distribution of nucleotide frequencies.
  Returns : a new TFBS::Matrix::ICM object
  Args    : -small_sample_correction # undef (default), 'schneider' or 'pseudocounts'
 
 

How a PFM is converted to ICM:

For a PFM element PFM[i,k], the probability without pseudocounts is estimated to be simply

   p[i,k] = PFM[i,k] / Z
 
 

where

- Z equals the column sum of the matrix, i.e. the number of motifs used to construct the PFM
- i is the column index (position in the motif)
- k is the row index (a letter in the alphabet; here k is one of A, C, G, T)

Here is how one normally calculates the pseudocount-corrected positional probability p'[i,k]:

   p'[i,k] = (PFM[i,k] + 0.25*sqrt(Z)) / (Z + sqrt(Z))
 
 

0.25 is for the flat distribution of nucleotides, and sqrt(Z) is the recommended pseudocount weight. In the general case,

   p'[i,k] = (PFM[i,k] + q[k]*B) / (Z + B)
 
 

where q[k] is the background distribution of the letter (nucleotide) k, and B an arbitrary pseudocount value or expression (for no pseudocounts B=0).
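
To make the formulas above concrete, the following sketch evaluates them for the fifth column of the example matrix (counts 4, 7, 0, 1). The function name column_probabilities and its argument order are hypothetical; the defaults (B = sqrt(Z), q[k] = 0.25) follow the text above.

```perl
use strict;
use warnings;

# Hypothetical helper: probabilities for one column of counts
# (A, C, G, T order). $pseudo is B in the general formula and
# defaults to sqrt(Z); $bg holds q[k] and defaults to 0.25 each.
sub column_probabilities {
    my ($counts, $pseudo, $bg) = @_;
    $bg ||= [ (0.25) x 4 ];
    my $z = 0;
    $z += $_ for @$counts;
    $pseudo = sqrt $z unless defined $pseudo;
    return [ map { ($counts->[$_] + $bg->[$_] * $pseudo) / ($z + $pseudo) }
             0 .. $#$counts ];
}

my @column5 = (4, 7, 0, 1);
my $raw     = column_probabilities(\@column5, 0);  # B = 0: p = PFM / Z
my $smooth  = column_probabilities(\@column5);     # B = sqrt(Z)

printf "%s raw=%.3f corrected=%.3f\n",
    (qw(A C G T))[$_], $raw->[$_], $smooth->[$_] for 0 .. 3;
```

Note that the correction moves the zero count for G away from zero while keeping the probabilities of the column summing to 1.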

For a given position i, the deviation from random distribution in bits is calculated as (Baldi and Brunak eq. 1.9 (2ed) or 1.8 (1ed)):

- for an arbitrary alphabet of A letters:

   D[i] = log2(A) + sum_for_all_k(p[i,k]*log2(p[i,k]))
 
 

- special case for nucleotides (A=4)

   D[i] = 2 + sum_for_all_k(p[i,k]*log2(p[i,k]))
 
 

D[i] equals the information content of the position i in the motif. To calculate the entire ICM, you have to calculate the contribution of each nucleotide at a position i to D[i], i.e.

   ICM[i,k] = p'[i,k] * D[i]
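
The whole conversion can be sketched in a few lines of plain Perl. This is a minimal illustration that assumes no pseudocounts (B = 0, so p' equals p) and a uniform background; terms with p = 0 are taken to contribute 0 to the sum. The name pfm_to_icm is hypothetical and this is not the code used by to_ICM.

```perl
use strict;
use warnings;

sub log2 { return log($_[0]) / log(2) }

# Hypothetical helper: information content matrix from a 4-row
# count matrix (A, C, G, T order), without pseudocounts.
sub pfm_to_icm {
    my ($pfm) = @_;
    my $width = scalar @{ $pfm->[0] };
    my @icm;
    for my $i (0 .. $width - 1) {
        my $z = 0;
        $z += $_->[$i] for @$pfm;              # column sum Z
        my @p = map { $_->[$i] / $z } @$pfm;   # p[i,k] = PFM[i,k] / Z
        my $d = 2;                             # log2(4) for nucleotides
        for my $pk (@p) {
            $d += $pk * log2($pk) if $pk > 0;  # skip p = 0 terms
        }
        $icm[$_][$i] = $p[$_] * $d for 0 .. 3; # ICM[i,k] = p[i,k] * D[i]
    }
    return \@icm;
}

my $matrixref = [ [ 12, 3,  0,  0, 4,  0 ],
                  [  0, 0,  0, 11, 7,  0 ],
                  [  0, 9, 12,  0, 0,  0 ],
                  [  0, 0,  0,  1, 1, 12 ] ];
my $icm = pfm_to_icm($matrixref);
printf "%s\n", join(" ", map { sprintf "%.3f", $_ } @$_) for @$icm;
```

A fully conserved position such as the first one carries the maximum of 2 bits, all of it assigned to the conserved nucleotide.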

draw_logo

  Title   : draw_logo
  Usage   : my $gd_image = $pfm->draw_logo()
  Function: draws a sequence logo; similar to the 
            method in TFBS::Matrix::ICM, but can automatically calculate
            error bars for drawing
  Returns : a GD image object (see documentation of GD module)
  Args    : many; PFM-specific options are:
            -small_sample_correction # One of 
                                     # "Schneider" - uses the correction 
                                     #   described by Schneider et al.
                                     #   (Schneider T. et al. (1986) J.Biol.Chem.)
                                     # "pseudocounts" - standard pseudocount 
                                     #   correction, more suitable for 
                                     #   PFMs with larger column sums
                                     # If the parameter is omitted, small
                                     # sample correction is not applied
 
            -draw_error_bars         # if true, adds error bars to each position
                                     # in the logo. To calculate the error bars,
                                     # it uses the -small_sample_correction
                                     # argument if explicitly set,
                                     # or "Schneider" by default
 For other args, see draw_logo entry in TFBS::Matrix::ICM documentation
 
 

add_PFM

  Title   : add_PFM
  Usage   : $pfm->add_PFM($another_pfm)
  Function: adds the values of $another_pfm matrix to $pfm
  Returns : reference to the updated $pfm object
  Args    : a TFBS::Matrix::PFM object
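
For illustration, element-wise addition of two count matrices can be sketched on plain arrays of arrays as below. The helper add_counts is hypothetical, and the width check is an assumption of this sketch; whether add_PFM performs such a check is not stated here.

```perl
use strict;
use warnings;

# Hypothetical helper: add the counts of $other to $target in place
# and return $target, mirroring the documented return value.
sub add_counts {
    my ($target, $other) = @_;
    die "matrices differ in width\n"
        unless @{ $target->[0] } == @{ $other->[0] };
    for my $k (0 .. $#$target) {
        $target->[$k][$_] += $other->[$k][$_] for 0 .. $#{ $target->[$k] };
    }
    return $target;
}

my $first  = [ [ 2, 0 ], [ 0, 1 ], [ 0, 1 ], [ 0, 0 ] ];
my $second = [ [ 1, 0 ], [ 1, 0 ], [ 0, 2 ], [ 0, 0 ] ];
my $sum    = add_counts($first, $second);
print join(" ", @$_), "\n" for @$sum;
```

Adding two matrices derived from N and M sequences gives column sums of N + M, so the result is still a consistent PFM.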
 
 

name

ID

class

matrix

length

revcom

rawprint

prettyprint

The above methods are common to all matrix objects. Please consult TFBS::Matrix to find out how to use them.