TFBS::Matrix::PWM.3pm

Version: 2008-01-24 (ubuntu - 24/10/10)

Section: 3 (Library functions)

NAME

TFBS::Matrix::PWM - class for position weight matrices of nucleotide patterns

SYNOPSIS

creating a TFBS::Matrix::PWM object manually:
     my $matrixref = [ [ 0.61, -3.16,  1.83, -3.16,  1.21, -0.06],
                       [-0.15, -2.57, -3.16, -3.16, -2.57, -1.83],
                       [-1.57,  1.85, -2.57, -1.34, -1.57,  1.14],
                       [ 0.31, -3.16, -2.57,  1.76,  0.24, -0.83]
                     ];
     my $pwm = TFBS::Matrix::PWM->new(-matrix => $matrixref,
                                      -name   => "MyProfile",
                                      -ID     => "M0001"
                                     );
     # or
 
     my $matrixstring = <<ENDMATRIX
      0.61 -3.16  1.83 -3.16  1.21 -0.06
     -0.15 -2.57 -3.16 -3.16 -2.57 -1.83
     -1.57  1.85 -2.57 -1.34 -1.57  1.14
      0.31 -3.16 -2.57  1.76  0.24 -0.83
     ENDMATRIX
     ;
     my $pwm = TFBS::Matrix::PWM->new(-matrixstring => $matrixstring,
                                      -name         => "MyProfile",
                                      -ID           => "M0001"
                                     );
 
 
retrieving a TFBS::Matrix::PWM object from a database:

(See documentation of individual TFBS::DB::* modules to learn how to connect to different types of pattern databases and retrieve TFBS::Matrix::* objects from them.)

     my $db_obj = TFBS::DB::JASPAR2->new
                     (-connect => ["dbi:mysql:JASPAR2:myhost",
                                   "myusername", "mypassword"]);
     my $pwm = $db_obj->get_Matrix_by_ID("M0001", "PWM");
     # or
     my $pwm = $db_obj->get_Matrix_by_name("MyProfile", "PWM");
 
 
retrieving list of individual TFBS::Matrix::PWM objects from a TFBS::MatrixSet object

(See the documentation of TFBS::MatrixSet to learn how to create objects for storage and manipulation of multiple matrices.)

     my @pwm_list = $matrixset->all_patterns(-sort_by=>"name");
 
 
scanning a nucleotide sequence with a matrix
     my $siteset = $pwm->search_seq(-file      =>"myseq.fa",
                                    -threshold => "80%");
 
 
scanning a pairwise alignment with a matrix
     my $site_pair_set = $pwm->search_aln(-file      =>"myalign.aln",
                                          -threshold => "80%",
                                          -cutoff    => "70%",
                                          -window    => 50);
 
 

DESCRIPTION

TFBS::Matrix::PWM is a class whose instances are objects representing position weight matrices (PWMs). A PWM is normally calculated from a raw position frequency matrix (see TFBS::Matrix::PFM for the explanation of position frequency matrices). For example, given the following position frequency matrix:
     A:[ 12     3     0     0     4     0  ]
     C:[  0     0     0    11     7     0  ]
     G:[  0     9    12     0     0     0  ]
     T:[  0     0     0     1     1    12  ]
 
 

The standard computational procedure is applied to convert it into the following position weight matrix:

     A:[ 0.61 -3.16  1.83 -3.16  1.21 -0.06]
     C:[-0.15 -2.57 -3.16 -3.16 -2.57 -1.83]
     G:[-1.57  1.85 -2.57 -1.34 -1.57  1.14]
     T:[ 0.31 -3.16 -2.57  1.76  0.24 -0.83]
 
 

which contains the ``weights'' associated with the occurrence of each nucleotide at the given position in a pattern.
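
The conversion procedure is not spelled out on this page. As an illustration only, the following sketch uses one common log-odds formulation: each count is padded with a pseudocount, turned into a probability, divided by a background probability, and log2-transformed. The uniform background of 0.25, the pseudocount of sqrt(N) per column, and the helper name pfm_to_pwm are assumptions made for this example, so its output should not be expected to reproduce the weights shown above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of TFBS): convert a position frequency
# matrix (rows A, C, G, T) into log2-odds weights.
# Assumptions: uniform background of 0.25 per nucleotide and a total
# pseudocount of sqrt(N) per column, split according to the background.
sub pfm_to_pwm {
    my ($pfm, $bg) = @_;
    $bg = 0.25 unless defined $bg;
    my $ncols = scalar @{ $pfm->[0] };
    my @pwm;
    for my $col (0 .. $ncols - 1) {
        # N = total count in this column
        my $n = 0;
        $n += $_->[$col] for @$pfm;
        my $pseudo = sqrt($n);
        for my $row (0 .. 3) {
            my $p = ($pfm->[$row][$col] + $bg * $pseudo) / ($n + $pseudo);
            $pwm[$row][$col] = log($p / $bg) / log(2);
        }
    }
    return \@pwm;
}

my $pfm = [ [ 12, 3,  0,  0, 4,  0 ],    # A
            [  0, 0,  0, 11, 7,  0 ],    # C
            [  0, 9, 12,  0, 0,  0 ],    # G
            [  0, 0,  0,  1, 1, 12 ] ];  # T

my $pwm   = pfm_to_pwm($pfm);
my @bases = qw(A C G T);
for my $row (0 .. 3) {
    my @cells = map { sprintf "%5.2f", $_ } @{ $pwm->[$row] };
    print "$bases[$row]:[ @cells ]\n";
}
```

The pseudocount keeps the weight finite for nucleotides that never occur at a position; without it the logarithm of zero would be undefined.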

A TFBS::Matrix::PWM object provides methods for searching nucleotide sequences, and pairwise alignments of nucleotide sequences, with the pattern it represents. Each search returns the set of sites found: a TFBS::SiteSet object for a single-sequence search, and a TFBS::SitePairSet object for an alignment search.
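
The search itself is implemented inside the module; what follows is only a conceptual sketch of what scanning a sequence with a PWM means. It slides a window of the matrix width along one strand of a sequence string, sums the weights of the observed nucleotides, and keeps the windows whose score reaches an absolute threshold. The helper name scan_forward, the example sequence, and the threshold value are made up for the illustration, and the real methods return TFBS::SiteSet objects rather than plain lists.

```perl
use strict;
use warnings;

# Row index of each nucleotide in the matrix (rows A, C, G, T).
my %ROW = (A => 0, C => 1, G => 2, T => 3);

# Hypothetical helper (not part of TFBS): score every window on the
# forward strand and return those scoring at least $min_score.
sub scan_forward {
    my ($pwm, $seq, $min_score) = @_;
    my $width = scalar @{ $pwm->[0] };
    my @hits;
    for my $start (0 .. length($seq) - $width) {
        my $word = uc substr($seq, $start, $width);
        next if $word =~ /[^ACGT]/;    # skip windows with ambiguity codes
        my $score = 0;
        $score += $pwm->[ $ROW{ substr($word, $_, 1) } ][$_] for 0 .. $width - 1;
        next if $score < $min_score;
        # coordinates reported 1-based and inclusive
        push @hits, { start => $start + 1, end => $start + $width,
                      score => $score,     seq => $word };
    }
    return @hits;
}

my $pwm = [ [ 0.61, -3.16,  1.83, -3.16,  1.21, -0.06],
            [-0.15, -2.57, -3.16, -3.16, -2.57, -1.83],
            [-1.57,  1.85, -2.57, -1.34, -1.57,  1.14],
            [ 0.31, -3.16, -2.57,  1.76,  0.24, -0.83] ];

my @hits = scan_forward($pwm, "CCAGATAGCC", 8.0);
printf "%s at %d-%d, score %.2f\n", @{$_}{qw(seq start end score)} for @hits;
```

A complete search would also score the reverse strand, for example by scanning with the reverse complement of the matrix.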

FEEDBACK

Please send bug reports and other comments to the author.

AUTHOR - Boris Lenhard

Boris Lenhard <Boris.Lenhard@cgb.ki.se>

APPENDIX

The rest of the documentation details each of the object methods. Internal methods are preceded with an underscore.

new

  Title   : new
  Usage   : my $pwm = TFBS::Matrix::PWM->new(%args)
  Function: constructor for the TFBS::Matrix::PWM object
  Returns : a new TFBS::Matrix::PWM object
  Args    : # you must specify exactly one of the following three:
 
            -matrix,      # reference to an array of arrays of numbers
               #or
            -matrixstring,# a string containing four lines
                          # of tab- or space-delimited numbers
               #or
            -matrixfile,  # the name of a file containing four lines
                          # of tab- or space-delimited numbers
            #######
 
            -name,        # string, OPTIONAL
            -ID,          # string, OPTIONAL
            -class,       # string, OPTIONAL
            -tags         # an array reference, OPTIONAL
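
The three matrix arguments carry the same information in different forms. As a rough illustration of how a -matrixstring value relates to the -matrix structure, the sketch below splits a four-line string on whitespace into an array of arrays. The helper name parse_matrix_string is hypothetical, and the constructor's own parsing may differ in detail (for example in how it validates its input).

```perl
use strict;
use warnings;

# Hypothetical helper (not part of TFBS): turn four lines of tab- or
# space-delimited numbers into a reference to an array of arrays.
sub parse_matrix_string {
    my ($string) = @_;
    my @rows;
    for my $line (split /\n/, $string) {
        next unless $line =~ /\S/;           # skip blank lines
        push @rows, [ split ' ', $line ];    # split on runs of whitespace
    }
    die "expected 4 rows (A, C, G, T), got " . scalar(@rows) . "\n"
        unless @rows == 4;
    my $width = scalar @{ $rows[0] };
    for my $row (@rows) {
        die "rows differ in length\n" unless @$row == $width;
    }
    return \@rows;
}

my $matrixstring = <<'ENDMATRIX';
 0.61 -3.16  1.83 -3.16  1.21 -0.06
-0.15 -2.57 -3.16 -3.16 -2.57 -1.83
-1.57  1.85 -2.57 -1.34 -1.57  1.14
 0.31 -3.16 -2.57  1.76  0.24 -0.83
ENDMATRIX

my $matrixref = parse_matrix_string($matrixstring);
printf "%d rows x %d columns\n",
    scalar @$matrixref, scalar @{ $matrixref->[0] };
```

The resulting $matrixref has the same shape as the hand-written array reference in the SYNOPSIS.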
 
 

search_seq

  Title   : search_seq
  Usage   : my $siteset = $pwm->search_seq(%args)
  Function: scans a nucleotide sequence with the pattern represented
            by the PWM
  Returns : a TFBS::SiteSet object
  Args    : # you must specify exactly one of the following three:
 
            -file,       # the name of a FASTA file (single sequence)
               #or
            -seqobj,     # a Bio::Seq object
                         # (more accurately, a Bio::PrimarySeq object or a
                         #  subclass thereof)
               #or
            -seqstring,  # a string containing the sequence
 
            -threshold,  # minimum score for the hit, either absolute
                         # (e.g. 11.2) or relative (e.g. "75%")
                         # OPTIONAL: default "80%"
 
            -subpart     # subpart of the sequence to search, given as
                         # -subpart => { start => 140,
                         #               end   => 180 }
                         # where start and end are coordinates in the
                         # sequence; the coordinate range is interpreted
                         # in the BioPerl tradition (1-based, inclusive)
                         # OPTIONAL: by default searches entire sequence
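
A relative threshold such as "80%" has to be translated into an absolute score before it can be compared with hit scores. One common convention, assumed here rather than taken from this page, places the percentage on the scale between the lowest and the highest score the matrix can produce. The helper name absolute_threshold is hypothetical.

```perl
use strict;
use warnings;
use List::Util qw(min max sum);

# Hypothetical helper (not part of TFBS): convert a threshold given as
# "NN%" into an absolute score; absolute thresholds pass through as is.
# Assumption: the percentage is measured between the minimum and the
# maximum score attainable with the matrix.
sub absolute_threshold {
    my ($pwm, $threshold) = @_;
    return $threshold unless $threshold =~ /^\s*([\d.]+)\s*%\s*$/;
    my $fraction = $1 / 100;
    my $width    = scalar @{ $pwm->[0] };
    my (@col_min, @col_max);
    for my $col (0 .. $width - 1) {
        my @weights = map { $_->[$col] } @$pwm;
        push @col_min, min(@weights);
        push @col_max, max(@weights);
    }
    my ($lowest, $highest) = (sum(@col_min), sum(@col_max));
    return $lowest + $fraction * ($highest - $lowest);
}

my $pwm = [ [ 0.61, -3.16,  1.83, -3.16,  1.21, -0.06],
            [-0.15, -2.57, -3.16, -3.16, -2.57, -1.83],
            [-1.57,  1.85, -2.57, -1.34, -1.57,  1.14],
            [ 0.31, -3.16, -2.57,  1.76,  0.24, -0.83] ];

my $relative = absolute_threshold($pwm, "80%");
my $absolute = absolute_threshold($pwm, 11.2);
printf "80%% corresponds to %.2f; 11.2 stays %.1f\n", $relative, $absolute;
```

Under this convention "100%" selects only the best possible match and "0%" accepts every window.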
 
 

search_aln

  Title   : search_aln
  Usage   : my $site_pair_set = $pwm->search_aln(%args)
  Function: Scans a pairwise alignment of nucleotide sequences
            with the pattern represented by the PWM: it reports only
            those hits that are present in equivalent positions of both
            sequences and exceed a specified threshold score in both, AND
            are found in regions of the alignment above the specified
            conservation cutoff value.
  Returns : a TFBS::SitePairSet object
  Args    : # you must specify exactly one of the following three:
 
            -file,       # the name of the alignment file in Clustal
                         # format
               #or
            -alignobj,   # a Bio::SimpleAlign object
                         # (or a subclass thereof)
               #or
            -alignstring,# a multi-line string containing the alignment
                         # in Clustal format
            #############
 
            -threshold,  # minimum score for the hit, either absolute
                         # (e.g. 11.2) or relative (e.g. "75%")
                         # OPTIONAL: default "80%"
 
            -window,     # size of the sliding window (in nucleotides)
                         # for calculating local conservation in the
                         # alignment
                         # OPTIONAL: default 50
 
            -cutoff      # conservation cutoff (%) for including the
                         # region in the results of the pattern search
                         # OPTIONAL: default "70%"
 
            -subpart     # subpart of the alignment to search, given as e.g.
                         # -subpart => { relative_to => 1,
                         #               start       => 140,
                         #               end         => 180 }
                         # where start and end are coordinates in the
                         # sequence indicated by relative_to (1 for the
                         # 1st sequence in the alignment, 2 for the 2nd)
                         # OPTIONAL: by default searches entire alignment
 
            -conservation
                         # conservation profile (a TFBS::ConservationProfile object)
                         # OPTIONAL: by default the conservation profile is
                         # computed internally on the fly (less efficient)
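
To make the roles of -window and -cutoff concrete, here is a simplified sketch of a sliding-window conservation profile over a pairwise alignment. It assumes that conservation means the percent identity of aligned columns within a window centred on each position, with gap columns counted as mismatches; the module's actual calculation is not described on this page and may differ. The helper name conservation_profile and the toy alignment are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of TFBS): percent identity in a window
# centred on each alignment column. Windows are truncated at the ends
# of the alignment; an odd window size is assumed (an even size behaves
# like the next odd size).
sub conservation_profile {
    my ($seq1, $seq2, $window) = @_;
    die "aligned sequences must have equal length\n"
        unless length($seq1) == length($seq2);
    my $len = length $seq1;

    # 1 where both sequences carry the same nucleotide, 0 otherwise
    my @match = map {
        my ($x, $y) = (uc substr($seq1, $_, 1), uc substr($seq2, $_, 1));
        ($x eq $y && $x ne '-') ? 1 : 0;
    } 0 .. $len - 1;

    my $half = int($window / 2);
    my @profile;
    for my $pos (0 .. $len - 1) {
        my $from = $pos - $half < 0        ? 0        : $pos - $half;
        my $to   = $pos + $half > $len - 1 ? $len - 1 : $pos + $half;
        my $same = 0;
        $same += $match[$_] for $from .. $to;
        push @profile, 100 * $same / ($to - $from + 1);
    }
    return @profile;
}

# Toy alignment with a diverged stretch in the middle.
my @profile = conservation_profile("ACGTACGTAC", "ACGTTTTTAC", 5);

# Columns (0-based) whose local conservation reaches a 70% cutoff.
my @kept = grep { $profile[$_] >= 70 } 0 .. $#profile;
print "conserved columns: @kept\n";
```

In this picture a hit would be reported only if it lies in columns that pass the cutoff, in addition to exceeding the score threshold in both sequences.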
 
 

name

ID

class

matrix

length

revcom

rawprint

prettyprint

The above methods are common to all matrix objects. Please consult TFBS::Matrix to find out how to use them.