KinoSearch1::Analysis::PolyAnalyzer.3pm

Language: en

Version: 2010-10-05 (fedora - 01/12/10)

Section: 3 (Library functions)

NAME

KinoSearch1::Analysis::PolyAnalyzer - multiple analyzers in series

SYNOPSIS

     my $analyzer = KinoSearch1::Analysis::PolyAnalyzer->new(
         language  => 'es',
     );
     
     # or...
     my $analyzer = KinoSearch1::Analysis::PolyAnalyzer->new(
         analyzers => [
             $lc_normalizer,
             $custom_tokenizer,
             $snowball_stemmer,
         ],
     );
 
 

DESCRIPTION

A PolyAnalyzer is a series of Analyzers (objects which inherit from KinoSearch1::Analysis::Analyzer), each of which will be called upon to "analyze" text in turn. You can either provide the Analyzers yourself or specify a supported language, in which case a PolyAnalyzer consisting of an LCNormalizer, a Tokenizer, and a Stemmer will be generated for you.

Supported languages:

     en => English,
     da => Danish,
     de => German,
     es => Spanish,
     fi => Finnish,
     fr => French,
     it => Italian,
     nl => Dutch,
     no => Norwegian,
     pt => Portuguese,
     ru => Russian,
     sv => Swedish,
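
As a rough illustration of the language shortcut described above, the following sketch builds the same three-stage chain by hand and passes it in through "analyzers". It assumes the sibling classes LCNormalizer, Tokenizer, and Stemmer can be constructed as shown, and that Stemmer accepts the same language code; the helper name build_spanish_analyzer is hypothetical. Consult each class's own documentation before relying on the exact arguments.

```perl
use strict;
use warnings;

# Hypothetical helper: assemble by hand the chain that language => 'es'
# is described as generating (lowercase, tokenize, stem).
sub build_spanish_analyzer {
    # Loaded lazily so this file still compiles where KinoSearch1 is absent.
    require KinoSearch1::Analysis::PolyAnalyzer;
    require KinoSearch1::Analysis::LCNormalizer;
    require KinoSearch1::Analysis::Tokenizer;
    require KinoSearch1::Analysis::Stemmer;

    my $lc_normalizer = KinoSearch1::Analysis::LCNormalizer->new;
    my $tokenizer     = KinoSearch1::Analysis::Tokenizer->new;

    # Assumption: Stemmer takes the same ISO code that PolyAnalyzer does.
    my $stemmer = KinoSearch1::Analysis::Stemmer->new( language => 'es' );

    # Order follows the description: normalize, then tokenize, then stem.
    return KinoSearch1::Analysis::PolyAnalyzer->new(
        analyzers => [ $lc_normalizer, $tokenizer, $stemmer ],
    );
}
```

Building the chain explicitly like this is mainly useful as a starting point when one stage needs to be swapped for a custom Analyzer.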
 
 

CONSTRUCTOR

new()

     my $analyzer = KinoSearch1::Analysis::PolyAnalyzer->new(
         language   => 'en',
     );
 
 

Construct a PolyAnalyzer object. If the parameter "analyzers" is specified, it will override "language" and no attempt will be made to generate a default set of Analyzers.
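
The precedence rule can be pictured with a small standalone sketch. This is not KinoSearch1's own code: resolve_chain is a hypothetical helper, the stages are represented by plain strings rather than real Analyzer objects, and rejecting an unlisted language code with die is an assumption made for the illustration.

```perl
use strict;
use warnings;

# Codes copied from the "Supported languages" list on this page.
my %SUPPORTED = map { $_ => 1 } qw( en da de es fi fr it nl no pt ru sv );

# Hypothetical helper: returns the names of the stages new() would end up with.
sub resolve_chain {
    my %args = @_;

    # An explicit list wins; "language" is not consulted at all.
    return @{ $args{analyzers} } if $args{analyzers};

    my $language = $args{language};
    die "Must specify either 'language' or 'analyzers'\n"
        unless defined $language;
    die "Unsupported language: $language\n"
        unless $SUPPORTED{$language};

    # Default chain, named by role rather than built as real objects.
    return ( 'LCNormalizer', 'Tokenizer', "Stemmer($language)" );
}

print join( ' -> ', resolve_chain( language => 'en' ) ), "\n";
print join( ' -> ',
    resolve_chain( language => 'en', analyzers => ['MyTokenizer'] ) ), "\n";
```

The first call prints the default three-stage chain for English; the second prints only MyTokenizer, because the supplied list replaces the defaults entirely.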

* language - Must be an ISO code from the list of supported languages.

* analyzers - Must be an arrayref. Each element in the array must inherit from KinoSearch1::Analysis::Analyzer. The order of the analyzers matters. Don't put a Stemmer before a Tokenizer (you can't stem whole documents or paragraphs, only individual words), or a Stopalizer after a Stemmer (stemmed words, e.g. "themselv", will not appear in a stoplist). In general, the sequence should be: normalize, tokenize, stopalize, stem.
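
The ordering advice above can be seen in a toy pipeline. Everything below is a stand-in written for this illustration: the four subs, the two-word stoplist, and the crude suffix-stripping "stemmer" are all hypothetical and are not KinoSearch1 or Snowball code.

```perl
use strict;
use warnings;

# Toy stoplist containing unstemmed words, as a real stoplist would.
my %stoplist = map { $_ => 1 } qw( the themselves );

# Toy stand-ins for the four stages.
sub normalize { return lc $_[0] }                # string -> string
sub tokenize  { return [ $_[0] =~ /\w+/g ] }     # string -> arrayref of tokens
sub stopalize { return [ grep { !$stoplist{$_} } @{ $_[0] } ] }

# Crude suffix stripper, just enough to turn "themselves" into "themselv".
sub stem {
    return [ map { my $t = $_; $t =~ s/(?:es|s)\z//; $t } @{ $_[0] } ];
}

my $text = 'The cats washed themselves';

# Recommended order: normalize, tokenize, stopalize, stem.
my $good = stem( stopalize( tokenize( normalize($text) ) ) );

# Stopalizer placed after the stemmer: "themselv" no longer matches the list.
my $bad = stopalize( stem( tokenize( normalize($text) ) ) );

print "stopalize then stem: @$good\n";    # cat washed
print "stem then stopalize: @$bad\n";     # cat washed themselv
```

In the second ordering the stop word survives because the stoplist holds "themselves" while the token has already been reduced to "themselv".
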

COPYRIGHT

Copyright 2005-2010 Marvin Humphrey

LICENSE, DISCLAIMER, BUGS, etc.

See KinoSearch1 version 1.00.