KinoSearch::Analysis::Stopalizer.3pm
Language: en
Version: 2010-05-02 (fedora - 01/12/10)
Section: 3 (Library functions)
NAME
KinoSearch::Analysis::Stopalizer - suppress a "stoplist" of common words
SYNOPSIS
    my $stopalizer = KinoSearch::Analysis::Stopalizer->new(
        language => 'fr',
    );
    my $polyanalyzer = KinoSearch::Analysis::PolyAnalyzer->new(
        analyzers => [ $lc_normalizer, $tokenizer, $stopalizer, $stemmer ],
    );
DESCRIPTION
A "stoplist" is a collection of "stopwords": words which are common enough to be of little value when determining search results. For example, so many documents in English contain "the", "if", and "maybe" that it may improve both performance and relevance to block them.

    # before
    @token_texts = ('i', 'am', 'the', 'walrus');
    # after
    @token_texts = ('', '', '', 'walrus');
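The before/after transformation can be imitated in plain Perl. What follows is only a sketch of the idea, not KinoSearch's implementation: the helper name blank_stopwords and the three-word stoplist are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical stoplist for illustration; real stoplists are much longer.
my %stoplist = map { $_ => 1 } qw( i am the );

# Hypothetical helper, not part of KinoSearch: replace every token text
# found in the stoplist with an empty string and leave the rest untouched.
sub blank_stopwords {
    my ( $stoplist, @token_texts ) = @_;
    return map { exists $stoplist->{$_} ? '' : $_ } @token_texts;
}

my @before = ( 'i', 'am', 'the', 'walrus' );
my @after  = blank_stopwords( \%stoplist, @before );
```

The sketch blanks stopwords rather than deleting them, so the number of tokens is unchanged, as in the example above.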
CONSTRUCTOR
new
    my $stopalizer = KinoSearch::Analysis::Stopalizer->new(
        language => 'de',
    );
    # or...
    my $stopalizer = KinoSearch::Analysis::Stopalizer->new(
        stoplist => \%stoplist,
    );
new() takes two possible parameters, "language" and "stoplist". If "stoplist" is supplied, it will be used, overriding the behavior indicated by the value of "language".
- stoplist - must be a hashref, with stopwords as the keys of the hash and values set to 1.
- language - must be the ISO code for a language. Loads a default stoplist supplied by Lingua::StopWords.
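A custom stoplist in the shape described above (stopwords as keys, values set to 1) might be built from a plain word list as follows. The helper name make_stoplist is hypothetical, and lowercasing the words is an assumption that fits the SYNOPSIS, where $lc_normalizer runs before $stopalizer.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of KinoSearch: turn a list of words into
# a hashref with the stopwords as keys and every value set to 1.
sub make_stoplist {
    my @words = @_;
    # Assumption: tokens reach the stopalizer already lowercased,
    # so the keys are lowercased to match.
    my %stoplist = map { lc($_) => 1 } @words;
    return \%stoplist;
}

my $stoplist = make_stoplist(qw( The if Maybe ));
```

The resulting hashref could then be passed to new() as the "stoplist" parameter, for example stoplist => $stoplist.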
SEE ALSO
Lingua::StopWords
COPYRIGHT
Copyright 2005-2009 Marvin Humphrey
LICENSE, DISCLAIMER, BUGS, etc.
See KinoSearch version 0.165.