Bio::DB::EUtilities.3pm

Language: en

Version: 2008-01-11 (mandriva - 01/05/08)

Section: 3 (Library functions)

NAME

Bio::DB::EUtilities - interface for handling web queries and data retrieval from Entrez Utilities at NCBI.

SYNOPSIS

   use Bio::DB::EUtilities;

   my $esearch = Bio::DB::EUtilities->new(-eutil      => 'esearch',
                                          -db         => 'pubmed',
                                          -term       => 'hutP',
                                          -usehistory => 'y');

   $esearch->get_response; # parse the response, fetch a cookie

   my $elink = Bio::DB::EUtilities->new(-eutil        => 'elink',
                                        -db           => 'protein',
                                        -dbfrom       => 'pubmed',
                                        -cookie       => $esearch->next_cookie,
                                        -cmd          => 'neighbor_history');

   $elink->get_response; # parse the response, fetch the next cookie

   my $efetch = Bio::DB::EUtilities->new(-cookie       => $elink->next_cookie,
                                         -retmax       => 10,
                                         -rettype      => 'fasta');

   print $efetch->get_response->content;

DESCRIPTION

WARNING: Please do NOT spam the Entrez web server with multiple requests. NCBI offers Batch Entrez for this purpose, now accessible here via epost!

This is a test interface to the Entrez Utilities at NCBI. The main purpose of this is to enable access to all of the NCBI databases available through Entrez and allow for more complex queries. It is likely that the API for this module as well as the documentation will change dramatically over time. So, novice users and neophytes beware!

The experimental base class is Bio::DB::GenericWebDBI, which, as the name implies, enables access to any web database that accepts parameters. It grew out of an idea to replace WebDBSeqI/NCBIHelper with a more general web database access tool, so one could retrieve sequence information, taxonomy, SNP, PubMed, and so on. However, it may ultimately prove more useful as a replacement for LWP::UserAgent when accessing NCBI-related web tools (Entrez Utilities, or EUtilities). Using the base class GenericWebDBI, one could also build web interfaces to other databases to access anything via CGI parameters.

Currently, you can access any database available through the NCBI interface:

   http://eutils.ncbi.nlm.nih.gov/
 
 

At this point, Bio::DB::EUtilities uses the EUtilities plugin modules somewhat like Bio::SeqIO. So, one would call the particular EUtility (epost, efetch, and so forth) upon instantiating the object using a set of parameters:

   my $esearch = Bio::DB::EUtilities->new(-eutil      => 'esearch',
                                          -db         => 'pubmed',
                                          -term       => 'dihydroorotase',
                                          -usehistory => 'y');
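
To make the link between these named parameters and the underlying web request concrete, here is a minimal sketch. It assumes each parameter maps one-to-one onto a CGI query parameter of the same name; the helper build_eutil_url and the base URL are illustrative assumptions, not part of Bio::DB::EUtilities, and the module's internals may differ.

```perl
use strict;
use warnings;

# Hypothetical helper (NOT part of Bio::DB::EUtilities): builds an
# E-utilities style request URL from a utility name and its parameters.
sub build_eutil_url {
    my ($eutil, %params) = @_;

    # Assumed base URL for the E-utilities CGI programs.
    my $base = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils';

    my @pairs;
    for my $key (sort keys %params) {
        # Percent-encode everything except unreserved characters
        # (adequate for the plain ASCII values used here).
        (my $value = $params{$key})
            =~ s/([^A-Za-z0-9\-_.~])/sprintf('%%%02X', ord($1))/ge;
        push @pairs, "$key=$value";
    }
    return "$base/$eutil.fcgi?" . join('&', @pairs);
}

my $url = build_eutil_url('esearch',
                          db         => 'pubmed',
                          term       => 'dihydroorotase',
                          usehistory => 'y');
print "$url\n";
```

The sketch only builds the URL and sends nothing, so it is safe to run without contacting NCBI.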
 
 

The default EUtility (when "eutil" is left out) is 'efetch'. For specifics on each EUtility, see their respective POD (**these are incomplete**) or the NCBI Entrez Utilities page:

   http://eutils.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html
 
 

At this time, retrieving the response is accomplished by using the method get_response (which also parses for cookies and other information, see below). This method returns an HTTP::Response object. The raw data is accessed by using the object method "content", like so:

   my $efetch = Bio::DB::EUtilities->new(-cookie       => $elink->next_cookie,
                                         -retmax       => 10,
                                         -rettype      => 'fasta');
 
 
   print $efetch->get_response->content;
 
 

Based on this, if one wanted to retrieve sequences or other raw data without directly using Bio* objects (for example, when retrieving genome sequences), one could use the proper EUtility object(s) and query(ies) and get the raw response back from NCBI through 'efetch'.

A great deal of the documentation here will likely end up in the form of a HOWTO at some future point, focusing on getting data into Bioperl objects.

Cookies

Some EUtilities ("epost", "esearch", or "elink") retain information on the NCBI server under certain settings. This information can be retrieved by using a cookie. Here, the idea of the 'cookie' is similar to the 'cookie' set on your computer when browsing the Web. XML data returned by these EUtilities, when applicable, is parsed for the cookie information (the 'WebEnv' and 'query_key' tags, to be specific). This information, along with other identifying data (such as the calling eutility, a description of the query, etc.), is stored as a Bio::DB::EUtilities::Cookie object in an internal queue. Cookies can be retrieved one at a time by using the next_cookie method or all at once in an array using get_all_cookies. Each cookie can then be 'fed', one at a time, to another EUtility object, thus enabling chained queries as demonstrated in the synopsis.
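
As a rough illustration of the parsing step described above, the following sketch pulls the two history values out of a reply. The XML is a made-up, trimmed esearch-style sample (the element names QueryKey and WebEnv are assumed to match what the server returns), extract_history is a hypothetical helper, and a regular expression stands in for the XML::Simple parsing the module itself uses.

```perl
use strict;
use warnings;

# Illustrative reply only: a trimmed, invented esearch-style document.
my $xml = <<'XML';
<eSearchResult>
  <Count>2</Count>
  <QueryKey>1</QueryKey>
  <WebEnv>EXAMPLE_WEBENV_STRING</WebEnv>
</eSearchResult>
XML

# Hypothetical helper (not the module's code): returns a hash ref with
# the WebEnv and query_key values, or undef if either one is missing.
sub extract_history {
    my ($content) = @_;
    my ($webenv)    = $content =~ m{<WebEnv>([^<]+)</WebEnv>};
    my ($query_key) = $content =~ m{<QueryKey>([^<]+)</QueryKey>};
    return unless defined $webenv && defined $query_key;
    return { webenv => $webenv, query_key => $query_key };
}

my $history = extract_history($xml);
print "WebEnv=$history->{webenv} query_key=$history->{query_key}\n";
```

In the module, these values travel inside a Bio::DB::EUtilities::Cookie object; the plain hash here only shows which pieces of the reply matter.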

For more information, see the POD documentation for Bio::DB::EUtilities::Cookie.

TODO

Resetting internal parameters is planned so one could feasibly reuse the objects once instantiated, for example when using this as a replacement for LWP::UserAgent to retrieve responses in many of the Bio::DB* NCBI-related modules.

File and filehandle support to be added.

Switch XML parsing in most EUtilities over to XML::SAX (XML::Simple is currently used).

Any feedback is welcome.

FEEDBACK


Mailing Lists

User feedback is an integral part of the evolution of this and other Bioperl modules. Send your comments and suggestions preferably to one of the Bioperl mailing lists. Your participation is much appreciated.

   bioperl-l@lists.open-bio.org               - General discussion
   http://www.bioperl.org/wiki/Mailing_lists  - About the mailing lists
 
 

Reporting Bugs

Report bugs to the Bioperl bug tracking system to help us keep track of the bugs and their resolution. Bug reports can be submitted via the web.

   http://bugzilla.open-bio.org/
 
 

AUTHOR

Email cjfields at uiuc dot edu

APPENDIX

The rest of the documentation details each of the object methods. Internal methods are usually preceded with a _

add_cookie

  Title   : add_cookie
  Usage   : $db->add_cookie($cookie)
  Function: adds an NCBI query cookie to the internal cookie queue
  Returns : none
  Args    : a Bio::DB::EUtilities::Cookie object
 
 

next_cookie

  Title   : next_cookie
  Usage   : $cookie = $db->next_cookie
  Function: return a cookie from the internal cookie queue
  Returns : a Bio::DB::EUtilities::Cookie object
  Args    : none
 
 

reset_cookies

  Title   : reset_cookies
  Usage   : $db->reset_cookies
  Function: resets (empties) the internal cookie queue
  Returns : none
  Args    : none
 
 

get_all_cookies

  Title   : get_all_cookies
  Usage   : @cookies = $db->get_all_cookies
  Function: retrieves all cookies from the internal cookie queue; this leaves
            the cookies in the queue intact 
  Returns : array of cookies (if wantarray) or first cookie
  Args    : none
 
 

get_cookie_count

  Title   : get_cookie_count
  Usage   : $ct = $db->get_cookie_count
  Function: returns the number of cookies in the internal queue
  Returns : integer 
  Args    : none
 
 

rewind_cookies

  Title   : rewind_cookies
  Usage   : $elink->rewind_cookies;
  Function: resets cookie index to 0 (starts over)
  Returns : None
  Args    : None
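
The cookie methods documented above can be pictured as an array plus a read index. The following is a minimal sketch under that assumption; the CookieQueue package is hypothetical and is not the module's actual implementation, and plain strings stand in for Bio::DB::EUtilities::Cookie objects.

```perl
use strict;
use warnings;

package CookieQueue;    # hypothetical, for illustration only

sub new {
    my ($class) = @_;
    return bless { cookies => [], index => 0 }, $class;
}

# Append a cookie to the end of the queue.
sub add_cookie {
    my ($self, $cookie) = @_;
    push @{ $self->{cookies} }, $cookie;
    return;
}

# Return the cookie at the read index and advance; undef when exhausted.
sub next_cookie {
    my ($self) = @_;
    return if $self->{index} >= @{ $self->{cookies} };
    return $self->{cookies}[ $self->{index}++ ];
}

# Start reading from the first cookie again; the queue is unchanged.
sub rewind_cookies { my ($self) = @_; $self->{index} = 0; return }

# Empty the queue entirely.
sub reset_cookies {
    my ($self) = @_;
    $self->{cookies} = [];
    $self->{index}   = 0;
    return;
}

# All cookies in list context, the first cookie otherwise.
sub get_all_cookies {
    my ($self) = @_;
    return wantarray ? @{ $self->{cookies} } : $self->{cookies}[0];
}

sub get_cookie_count { my ($self) = @_; return scalar @{ $self->{cookies} } }

package main;

my $queue = CookieQueue->new;
$queue->add_cookie($_) for qw(cookie_one cookie_two);

my $first  = $queue->next_cookie;    # cookie_one; the index moves on
my $second = $queue->next_cookie;    # cookie_two
$queue->rewind_cookies;              # back to the start, nothing removed
my $again  = $queue->next_cookie;    # cookie_one once more

printf "%s %s %s count=%d\n",
    $first, $second, $again, $queue->get_cookie_count;
```

Under this model, rewind_cookies only moves the read position, whereas reset_cookies discards the stored cookies.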
 
 

keep_cookies

  Title   : keep_cookies
  Usage   : $db->keep_cookies(1)
  Function: Flag to retain the internal cookie queue;
            this is normally emptied upon using get_response
  Returns : none
  Args    : Boolean - value that evaluates to TRUE or FALSE
 
 

parse_response

  Title   : parse_response
  Usage   : $db->parse_response($content)
  Function: parse out response for cookies and other goodies
  Returns : empty
  Args    : none
  Throws  : Not implemented (implemented in plugin classes)
 
 

get_response

  Title   : get_response
  Usage   : $response = $db->get_response
  Function: main method to submit a request and retrieve a response
  Returns : HTTP::Response object
  Args    : None
 
 

get_ids

  Title   : get_ids
  Usage   : $ids   = $elink->get_ids($db); # array ref of specific db ids
            @ids   = $esearch->get_ids(); # array
            $ids   = $esearch->get_ids(); # array ref
  Function: returns an array or array ref of unique IDs.
  Returns : array or array ref of ids 
  Args    : Optional : database string if elink used (required arg if searching
            multiple databases for related IDs)
            Currently implemented only for elink object with single linksets
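
The "array or array ref" behaviour described above is the usual Perl context-sensitive return. A minimal sketch follows; unique_ids is a hypothetical helper, not the module's code, and the IDs are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: de-duplicate IDs (keeping first-seen order) and
# return them as a list or an array ref, depending on calling context.
sub unique_ids {
    my @raw = @_;
    my %seen;
    my @ids = grep { !$seen{$_}++ } @raw;
    return wantarray ? @ids : \@ids;
}

my @ids = unique_ids(qw(101 202 101 303));    # list context: plain list
my $ids = unique_ids(qw(101 202 101 303));    # scalar context: array ref

print "list: @ids\n";
print "ref:  @{$ids}\n";
```

Callers that only need a count can take the list form and evaluate the array in scalar context, for example scalar(@ids).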
 
 

delay_policy

   Title   : delay_policy
   Usage   : $secs = $self->delay_policy
   Function: return number of seconds to delay between calls to remote db
   Returns : number of seconds to delay
   Args    : none
 
 
   NOTE: NCBI requests a delay of 3 seconds between requests.  This method
         implements that policy.
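
For readers who want to see what such a delay looks like in practice, here is a minimal sketch of a throttle that enforces a gap between successive calls. make_throttle is a hypothetical helper, not the module's implementation; the 3-second value comes from the NOTE above, and the loop only prints instead of sending requests.

```perl
use strict;
use warnings;
use Time::HiRes qw(time sleep);

# Hypothetical helper: returns a closure that blocks until at least
# $delay seconds have passed since the closure was last called.
sub make_throttle {
    my ($delay) = @_;
    my $last;    # time of the previous call, undefined before the first
    return sub {
        if (defined $last) {
            my $remaining = $delay - (time() - $last);
            sleep($remaining) if $remaining > 0;
        }
        $last = time();
        return;
    };
}

my $throttle = make_throttle(3);    # 3 seconds, per the NOTE above
for my $term (qw(hutP dihydroorotase)) {
    $throttle->();                  # first call returns immediately
    print "would now send the request for '$term'\n";
}
```

Keeping the delay as a parameter makes the sketch easy to try with a shorter interval while experimenting offline.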
 
 

get_entrezdbs

   Title   : get_entrezdbs
   Usage   : @dbs = $self->get_entrezdbs;
   Function: return list of all Entrez databases; convenience method
   Returns : array or array ref (based on wantarray) of databases 
   Args    : none
 
 

Private methods


_eutil

  Title   : _eutil
  Usage   : $db->_eutil;
  Function: sets eutil 
  Returns : eutil
  Args    : eutil