XML::SAX::ByRecord.3pm

Language: en

Version: 2009-06-11 (ubuntu - 24/10/10)

Section: 3 (Library functions)

NAME

XML::SAX::ByRecord - Record oriented processing of (data) documents

SYNOPSIS

     use XML::SAX::Machines qw( ByRecord ) ;
 
     my $m = ByRecord(
         "My::RecordFilter1",
         "My::RecordFilter2",
         ...
         {
             Handler => $h, ## optional
         }
     );
 
     $m->parse_uri( "foo.xml" );
 
 

DESCRIPTION

XML::SAX::ByRecord is a SAX machine that treats a document as a series of records. Everything before and after the records is emitted as-is, while the records are excerpted into mini-documents and run one at a time through the filter pipeline contained in ByRecord.

The output is a document that has exactly the same content before, after, and between the records as the input document, but with each record run through the filter pipeline. So if a document has 10 records in it, the per-record filter pipeline will see 10 sets of ( start_document, body of record, end_document ) events. An example is below.

This has several use cases:

Big, record oriented documents

Big documents can be treated a record at a time with various DOM oriented processors like XML::Filter::XSLT.

Streaming XML

Small sections of an XML stream can be run through a document processor without holding up the stream.

Record oriented style sheets / processors

Sometimes it's just plain easier to write a style sheet or SAX filter that applies to a single record at a time, rather than having to run through a series of records.

Topology

Here's how the innards look:
    +-----------------------------------------------------------+
    |                  An XML::SAX::ByRecord                    |
    |    Intake                                                 |
    |   +----------+    +---------+         +--------+  Exhaust |
  --+-->| Splitter |--->| Stage_1 |-->...-->| Merger |----------+----->
    |   +----------+    +---------+         +--------+          |
    |               \                            ^              |
    |                \                           |              |
    |                 +---------->---------------+              |
    |                   Events not in any records               |
    |                                                           |
    +-----------------------------------------------------------+
 
 

The "Splitter" is an XML::Filter::DocSplitter by default, and the "Merger" is an XML::Filter::Merger by default. The line that bypasses the "Stage_1 ..." filter pipeline carries all events that do not occur in a record. All events that occur in a record pass through the filter pipeline.
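To make the bypass path concrete, here is a minimal sketch with a recording filter as the only stage. My::Filter::Recorder is a hypothetical name used for illustration only, and the sketch assumes the default splitter treats each child of the root element as one record. Under that assumption the filter should see only the events inside the records, while the text between records travels along the bypass line:

```perl
use strict;
use warnings;

## Hypothetical filter, for illustration only: it notes what it sees.
package My::Filter::Recorder;
use base qw( XML::SAX::Base );

our @seen;    ## events observed by the per-record pipeline

sub start_document {
    my $self = shift;
    push @seen, "start_document";
    $self->SUPER::start_document( @_ );
}

sub characters {
    my ( $self, $data ) = @_;
    push @seen, "characters:$data->{Data}";
    $self->SUPER::characters( $data );
}

package main;
use XML::SAX::Machines qw( Pipeline ByRecord );

my $out;
## An instance is passed so that no module file needs to be loaded.
my $m = Pipeline( ByRecord( My::Filter::Recorder->new ), \$out );
$m->parse_string( "<root>a<rec>b</rec>c<rec>d</rec>e</root>" );

## List what reached the stage; "a", "c" and "e" should be absent.
print "$_\n" for @My::Filter::Recorder::seen;
```

The text outside the records still appears in $out, because the Merger recombines the bypassed events with the filtered records.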

Example

Here's a quick little filter to uppercase text content:
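     package My::Filter::Uc;
 
     use strict;
     use warnings;
 
     ## Inherit default SAX event forwarding from XML::SAX::Base
     use base qw( XML::SAX::Base );
 
     ## Uppercase the text of each characters event, then pass it on
     sub characters {
         my $self = shift;
         my ( $data ) = @_;
         $data->{Data} = uc $data->{Data};
         $self->SUPER::characters( @_ );
     }
 
     1;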
 
 

And here's a little machine that uses it:

     use XML::SAX::Machines qw( Pipeline ByRecord );
 
     my $out;   ## receives the serialized output
     my $m = Pipeline(
         ByRecord( "My::Filter::Uc" ),
         \$out,
     );
 
 

When fed a document like:

     <root> a
         <rec>b</rec> c
         <rec>d</rec> e
         <rec>f</rec> g
     </root>
 
 

the output looks like:

     <root> a
         <rec>B</rec> c
         <rec>D</rec> e
         <rec>F</rec> g
     </root>
 
 

and My::Filter::Uc received three sets of events:

     start_document
     start_element: <rec>
     characters:    'b'
     end_element:   </rec>
     end_document
 
     start_document
     start_element: <rec>
     characters:    'd'
     end_element:   </rec>
     end_document
 
     start_document
     start_element: <rec>
     characters:    'f'
     end_element:   </rec>
     end_document
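For readers who want to try this, the pieces above can be combined into a single script along the following lines. This is a sketch rather than the author's own test code: it passes a filter instance instead of the class name (so the package can live in the same file), hands a modified copy of the event data downstream, and assumes that parse_string is available on the machine in the same way as the parse_uri call shown in the SYNOPSIS.

```perl
use strict;
use warnings;

package My::Filter::Uc;
use base qw( XML::SAX::Base );

sub characters {
    my ( $self, $data ) = @_;
    ## Forward an uppercased copy; the incoming hash is left untouched.
    $self->SUPER::characters( { %$data, Data => uc $data->{Data} } );
}

package main;
use XML::SAX::Machines qw( Pipeline ByRecord );

my $out;
my $m = Pipeline( ByRecord( My::Filter::Uc->new ), \$out );

$m->parse_string( <<'XML' );
<root> a
    <rec>b</rec> c
    <rec>d</rec> e
    <rec>f</rec> g
</root>
XML

print $out, "\n";
```

Only the text inside the rec elements should come out uppercased; the a, c, e and g between the records never reach the filter.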
 
 

METHODS

new
     my $d = XML::SAX::ByRecord->new( @channels, \%options );
 
 

Longhand for calling the ByRecord function exported by XML::SAX::Machines.
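A minimal sketch of the longhand form follows. Using plain XML::SAX::Base as a do-nothing stage and XML::SAX::Writer as the Handler are choices made for this illustration, not requirements of the module, and parse_string is assumed to be available on the machine like the parse_uri call in the SYNOPSIS.

```perl
use strict;
use warnings;

use XML::SAX::Base;
use XML::SAX::ByRecord;
use XML::SAX::Writer;

my $out;
my $writer = XML::SAX::Writer->new( Output => \$out );

## One pass-through stage (illustrative), followed by the options hash.
my $d = XML::SAX::ByRecord->new(
    "XML::SAX::Base",
    { Handler => $writer },
);

$d->parse_string( "<root><rec>b</rec></root>" );
print $out, "\n";
```

With a pass-through stage the record should reach the writer unchanged, which makes this a convenient starting point before real filters are substituted.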

CREDIT

Proposed by Matt Sergeant, with advice from Kip Hampton and Robin Berjon.

Writing an aggregator

To be written. Essentially, an aggregator only needs to provide "start_manifold_processing" and "end_manifold_processing". See XML::Filter::Merger and its source code for a starting point.