Lingua::Stem::EnBroken.3pm

Language: en

Other versions (same language)

2007-06-23 (mandriva - 01/05/08)

Version: 2007-10-23 (debian - 07/07/09)

Section: 3 (Library functions)

Contents

NAME
SYNOPSIS
DESCRIPTION
CHANGES
METHODS
NOTES
SEE ALSO
AUTHOR
COPYRIGHT
BUGS
TODO

NAME

Lingua::Stem::EnBroken - Porter's stemming algorithm for 'generic' English

SYNOPSIS

     use Lingua::Stem::EnBroken;
     my $stems = Lingua::Stem::EnBroken::stem({
         -words      => $word_list_reference,
         -locale     => 'en',
         -exceptions => $exceptions_hash,
     });

DESCRIPTION

This routine MIS-applies the Porter Stemming Algorithm to its parameters, returning the stemmed words. It is an intentionally broken version of Lingua::Stem::En for people who need backward compatibility with Lingua::Stem 0.30 and 0.40. Do not use it unless you are one of those people.

It is derived from the C program "stemmer.c" as found in freewais and elsewhere, which contains these notes:

    Purpose:    Implementation of the Porter stemming algorithm documented
                in: Porter, M.F., "An Algorithm For Suffix Stripping,"
                Program 14 (3), July 1980, pp. 130-137.
    Provenance: Written by B. Frakes and C. Cox, 1986.

I have re-interpreted areas that use Frakes and Cox's "WordSize" function. My version may misbehave on short words starting with "y", but I can't think of any examples.

The step numbers correspond to those of Frakes and Cox, and are probably in Porter's article (which I've not seen). Porter's algorithm still has rough spots (e.g. current/currency, -ings words), which I've not attempted to cure, although I have added support for the British -ise suffix.

CHANGES

  2003.09.28 -  Documentation fix

  2000.09.14 -  Forked from the Lingua::Stem::En.pm module to provide
                a backward compatibly broken version for people needing
                consistent behavior with 0.30 and 0.40 more than accurate
                stemming.

METHODS

stem({ -words => \@words, -locale => 'en', -exceptions => \%exceptions });

Stems a list of passed words using the rules of US English. Returns a reference to an anonymous array containing the stemmed words.

Example:

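   use Lingua::Stem::EnBroken;
   my @words      = qw(stemming stemmed stems);   # words to stem
   my %exceptions = ();                           # optional word => stem overrides
   my $stemmed_words = Lingua::Stem::EnBroken::stem({
       -words      => \@words,
       -locale     => 'en',
       -exceptions => \%exceptions,
   });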
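To make the calling convention concrete without depending on the installed module, here is a minimal stand-in sketch. The function toy_stem is hypothetical. It accepts the same hash reference of -words, -locale and -exceptions and returns a reference to an anonymous array in input order. It consults the exceptions hash first, assuming each entry maps a word to the stem that should be returned for it. Otherwise it strips one trailing "s". That suffix rule is only a placeholder: it is neither Porter's algorithm nor this module's intentionally broken variant.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical stand-in mirroring the calling convention of
# Lingua::Stem::EnBroken::stem(); it is NOT the module's stemmer.
sub toy_stem {
    my ($parm_ref) = @_;
    my ($words, $locale, $exceptions) = ([], 'en', {});   # $locale is accepted but unused here

    # Unpack the dash-prefixed named parameters, rejecting unknown ones.
    foreach my $key (keys %$parm_ref) {
        my $value = $parm_ref->{$key};
        if    ($key eq '-words')      { $words      = $value }
        elsif ($key eq '-locale')     { $locale     = $value }
        elsif ($key eq '-exceptions') { $exceptions = $value }
        else  { croak "toy_stem(): unknown parameter '$key'" }
    }

    my @stems;
    foreach my $word (@$words) {
        # Assumption: an exceptions entry maps a word to its forced stem.
        if (exists $exceptions->{$word}) {
            push @stems, $exceptions->{$word};
            next;
        }
        # Placeholder rule (not Porter): drop one trailing "s".
        (my $stem = $word) =~ s/s\z//;
        push @stems, $stem;
    }
    return \@stems;    # reference to an anonymous array, same order as input
}

my @words      = qw(cats dogs news);
my %exceptions = (news => 'news');
my $stemmed    = toy_stem({ -words => \@words, -locale => 'en', -exceptions => \%exceptions });
print join(' ', @$stemmed), "\n";    # prints "cat dog news"
```

Returning an array reference rather than a flat list keeps the result aligned with the input list. It also lets callers pass large word lists around without copying them.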

stem_caching({ -level => 0|1|2 });

Sets the level of stem caching.

'0' means 'no caching'. This is the default level.

'1' means 'cache per run'. This caches stemming results during a single
call to 'stem'.

'2' means 'cache indefinitely'. This caches stemming results until
either the process exits or the 'clear_stem_cache' method is called.

clear_stem_cache;

Clears the cache of stemmed words.
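The three caching levels can be illustrated with a small self-contained sketch. Everything in it is hypothetical: set_caching, clear_cache, cached_stem_words, the %cache hash and the trivial _toy_stem_one rule are illustrative names, not the module's internals. It models only the behaviour described above. Level 0 never caches. Level 1 reuses results within one call and discards them when the call finishes. Level 2 keeps entries until the cache is cleared explicitly.

```perl
use strict;
use warnings;

# Hypothetical model of the documented caching levels; not the module's code.
my $caching_level = 0;    # 0 = no caching (default), 1 = per run, 2 = indefinite
my %cache;                # word => stem
my $computed = 0;         # counts real stemming work, for illustration only

sub set_caching { $caching_level = shift }
sub clear_cache { %cache = () }

# Placeholder stemming rule (not Porter): drop one trailing "s".
sub _toy_stem_one {
    my ($word) = @_;
    $computed++;
    (my $stem = $word) =~ s/s\z//;
    return $stem;
}

sub cached_stem_words {
    my @stems;
    foreach my $word (@_) {
        # Serve from the cache only when caching is enabled.
        if ($caching_level && exists $cache{$word}) {
            push @stems, $cache{$word};
            next;
        }
        my $stem = _toy_stem_one($word);
        $cache{$word} = $stem if $caching_level;
        push @stems, $stem;
    }
    # Below level 2, cached results live only for the duration of this call.
    clear_cache() if $caching_level < 2;
    return \@stems;
}

set_caching(1);
cached_stem_words(qw(cats cats));    # the second "cats" comes from the cache
set_caching(2);
cached_stem_words(qw(dogs));
cached_stem_words(qw(dogs));         # still cached from the previous call
print "$computed\n";                 # prints 2
```

Level 2 trades memory for speed: the cache grows with every distinct word until the cache is cleared or the process exits.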

NOTES

This code is almost entirely derived from the Porter 2.1 module written by Jim Richardson.

AUTHOR

   Jim Richardson, University of Sydney
   jimr@maths.usyd.edu.au or http://www.maths.usyd.edu.au:8000/jimr.html

   Integration in Lingua::Stem by
   Benjamin Franz, FreeRun Technologies,
   snowhare@nihongo.org or http://www.nihongo.org/snowhare/

Linux Certif

All the documentation on Linux LPI certification
