KinoSearch1::Analysis::Stopalizer.3pm

Language: en

Version: 2010-10-05 (fedora - 01/12/10)

Section: 3 (Library functions)

NAME

KinoSearch1::Analysis::Stopalizer - suppress a "stoplist" of common words

SYNOPSIS

     my $stopalizer = KinoSearch1::Analysis::Stopalizer->new(
         language => 'fr',
     );
     my $polyanalyzer = KinoSearch1::Analysis::PolyAnalyzer->new(
         analyzers => [ $lc_normalizer, $tokenizer, $stopalizer, $stemmer ],
     );
 
 

DESCRIPTION

A "stoplist" is a collection of "stopwords": words which are common enough to be of little value when determining search results. For example, so many documents in English contain "the", "if", and "maybe" that it may improve both performance and relevance to block them. In the following example, each stopword's token text is replaced with an empty string:

     # before
     @token_texts = ('i', 'am', 'the', 'walrus');
     
     # after
     @token_texts = ('',  '',   '',    'walrus');
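
The before/after transformation above can be sketched in plain Perl. This is only an illustration of the idea, not KinoSearch1's actual implementation: the helper suppress_stopwords() and the three-word stoplist are hypothetical names introduced for this example.

```perl
use strict;
use warnings;

# Hypothetical stoplist: stopwords as keys, values set to 1.
my %stoplist = map { $_ => 1 } qw( i am the );

# Hypothetical helper, not part of KinoSearch1: blank out any token
# whose text appears in the stoplist and leave other tokens unchanged.
sub suppress_stopwords {
    my ( $stoplist, @texts ) = @_;
    return map { exists $stoplist->{$_} ? '' : $_ } @texts;
}

my @token_texts = suppress_stopwords( \%stoplist, 'i', 'am', 'the', 'walrus' );

# Prints "|||walrus": three empty strings followed by the surviving token.
print join( '|', @token_texts ), "\n";
```

Note that in this sketch the list keeps its original length; suppressed entries are emptied, not removed, which matches the "after" listing above.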
 
 

CONSTRUCTOR

new

     my $stopalizer = KinoSearch1::Analysis::Stopalizer->new(
         language => 'de',
     );
     
     # or...
     my $stopalizer = KinoSearch1::Analysis::Stopalizer->new(
         stoplist => \%stoplist,
     );
 
 

new() takes two possible parameters, "language" and "stoplist". If "stoplist" is supplied, it will be used, overriding the behavior indicated by the value of "language".

* stoplist - must be a hashref, with stopwords as the keys of the hash and values set to 1.
* language - must be the two-letter ISO code for a language, such as 'de' or 'fr'. Loads a default stoplist supplied by Lingua::StopWords.
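
A custom stoplist in the required shape (stopwords as keys, values set to 1) can be built from a plain word list. The following is a minimal sketch; the word list is made up for illustration. It assumes tokens are lowercased before they reach the Stopalizer, as in the SYNOPSIS analyzer chain, so the keys are lowercase too.

```perl
use strict;
use warnings;

# Hypothetical custom stopwords, chosen only for illustration.
my @custom_words = qw( the if maybe yeah );

# Stopwords become the keys of the hash; each value is set to 1.
my %stoplist = map { $_ => 1 } @custom_words;

# Membership is then a plain hash lookup.
for my $word (qw( the walrus )) {
    printf "%s: %s\n", $word, $stoplist{$word} ? 'stopword' : 'kept';
}

# The hashref would then be passed to the constructor:
#   KinoSearch1::Analysis::Stopalizer->new( stoplist => \%stoplist );
```

Because a supplied "stoplist" overrides "language", a hash built this way replaces the default list entirely; it is not merged with it.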

SEE ALSO

Lingua::StopWords

COPYRIGHT

Copyright 2005-2010 Marvin Humphrey

LICENSE, DISCLAIMER, BUGS, etc.

See KinoSearch1 version 1.00.