Text::English.3pm
Language: en
Version: 2005-04-10 (debian - 07/07/09)
Section: 3 (Library functions)
NAME
Text::English - Porter's stemming algorithm
SYNOPSIS
use Text::English;
@stems = Text::English::stem( @words );
DESCRIPTION
This routine applies the Porter Stemming Algorithm to its parameters, returning the stemmed words. It is derived from the C program "stemmer.c" as found in freewais and elsewhere, which contains these notes:
Purpose: Implementation of the Porter stemming algorithm documented in: Porter, M.F., "An Algorithm For Suffix Stripping," Program 14 (3), July 1980, pp. 130-137.
Provenance: Written by B. Frakes and C. Cox, 1986.
I have re-interpreted areas that use Frakes and Cox's "WordSize" function. My version may misbehave on short words starting with "y", but I can't think of any examples.
The step numbers correspond to Frakes and Cox, and are probably in Porter's article (which I've not seen). Porter's algorithm still has rough spots (e.g. current/currency, words ending in -ings), which I've not attempted to cure, although I have added support for the British -ise suffix.
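To give a feel for what a single numbered step does, here is a minimal sketch of Step 1a of Porter's published algorithm (plural stripping) as a standalone Perl subroutine. The name step_1a is hypothetical and chosen only for this illustration. This is not the code inside Text::English, and the full algorithm applies several further steps after this one.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration only; NOT part of Text::English.
# Implements Porter's Step 1a rules, tried in order (first match wins):
#   SSES -> SS,  IES -> I,  SS -> SS,  S -> (nothing)
sub step_1a {
    my ($word) = @_;
    $word = lc $word;
    return $word if $word =~ s/sses$/ss/;   # caresses -> caress
    return $word if $word =~ s/ies$/i/;     # ponies   -> poni
    return $word if $word =~ /ss$/;         # caress   -> caress (unchanged)
    $word =~ s/s$//;                        # cats     -> cat
    return $word;
}

print join( ' ', map { step_1a($_) } qw(caresses ponies caress cats) ), "\n";
# prints: caress poni caress cat
```

Note that the result of a step need not be an English word ("poni"); a stem only has to be the same for related word forms.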
NOTES
This is version 0.1. I would welcome feedback, especially improvements to the punctuation-stripping step.
AUTHOR
Ian Phillipps <ian@unipalm.pipex.com>
COPYRIGHT
Copyright Public IP Exchange Ltd (PIPEX). Available for use under the same terms as perl.