Lingua::Ident.3pm

Language: en

Version: 2006-11-11 (mandriva - 01/05/08)

Section: 3 (Library functions)

Contents

NAME
SYNOPSIS
DESCRIPTION
RETURN VALUE
WARNINGS
AUTHOR
LICENSE
SEE ALSO

NAME

Lingua::Ident -- Statistical language identification

SYNOPSIS

  use Lingua::Ident;
  $i    = Lingua::Ident->new("filename 1", ..., "filename n");
  $lang = $i->identify("text to classify");

DESCRIPTION

This module implements a statistical language identifier.

The filename arguments to the constructor must refer to files containing tables of n-gram probabilities for languages, one table per language. These tables can be generated using the trainlid(1) utility program.
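To give a rough idea of what identification from n-gram probability tables means, here is a minimal sketch. It is not the module's implementation: the in-memory `%tables` hash, the probabilities in it, the `$FLOOR` value, and the `score` and `toy_identify` helpers are all hypothetical, and the file format written by trainlid(1) is not shown. The sketch scores a text by summing the log probabilities of its overlapping bigrams under each table and picks the language with the highest score.

```perl
use strict;
use warnings;

# Hypothetical in-memory bigram tables: P(bigram | language).
# Real tables come from trainlid(1); these numbers are made up.
my %tables = (
    en => { 'th' => 0.030, 'he' => 0.028, 'in' => 0.020, 'er' => 0.018 },
    de => { 'ch' => 0.030, 'en' => 0.035, 'er' => 0.032, 'ei' => 0.020 },
);

# Probability assumed for bigrams missing from a table.
my $FLOOR = 1e-4;

# Sum the log probabilities of every overlapping bigram in $text.
sub score {
    my ($text, $table) = @_;
    my $sum = 0;
    for my $i (0 .. length($text) - 2) {
        my $bigram = substr($text, $i, 2);
        $sum += log($table->{$bigram} // $FLOOR);
    }
    return $sum;
}

# Return the label of the table that gives the text the highest score.
sub toy_identify {
    my ($text) = @_;
    my %score = map { $_ => score(lc $text, $tables{$_}) } keys %tables;
    my ($best) = sort { $score{$b} <=> $score{$a} } keys %score;
    return $best;
}

print toy_identify("the other"), "\n";
print toy_identify("ich denke"), "\n";
```

Working in log space avoids numeric underflow when many small probabilities are multiplied together.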

RETURN VALUE

The identify() method returns the value specified in the _LANG field of the probabilities table of the language to which the text most likely belongs (see "WARNINGS").

This value should preferably be a POSIX locale name constructed from an ISO 639 2-letter language code, possibly extended by an ISO 3166 2-letter country code and a character set identifier. Example: de_DE.iso88591.
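If the tables follow this recommendation, the returned value can be split into its parts. The following is a minimal sketch; `parse_lang` is a hypothetical helper, not part of Lingua::Ident, and because the _LANG field is only recommended to be a locale name, the helper returns nothing for values that do not match the pattern.

```perl
use strict;
use warnings;

# Split a locale-style name into language, optional country,
# and optional character set. Returns nothing if the name does not match.
sub parse_lang {
    my ($name) = @_;
    my ($lang, $country, $charset) =
        $name =~ /\A([a-z]{2})(?:_([A-Z]{2}))?(?:\.([A-Za-z0-9-]+))?\z/
        or return;
    return { language => $lang, country => $country, charset => $charset };
}

my $parts = parse_lang("de_DE.iso88591");
printf "%s %s %s\n", @{$parts}{qw(language country charset)};
```

For a bare language code such as "en", the country and charset entries are undefined.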

WARNINGS

Because Lingua::Ident is based on statistics, it cannot be 100% accurate. More precisely, Dunning (see below) reports that his implementation achieves 92% accuracy with 50K of training text for 20-character strings when discriminating between English and Spanish. This implementation should be as accurate as Dunning's. However, both the size and the quality of the training text affect accuracy.

The current implementation doesn't use a threshold to determine whether the most probable language has a high enough probability. If you try to classify a text in a language for which there is no probability table, identify() still returns one of the languages it knows, which is necessarily incorrect.
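For illustration, a threshold check could look like the sketch below. It assumes per-language log-probability scores are available from some other scoring step; identify(), as documented here, returns only the label. The `pick_with_threshold` helper, the example scores, and the cutoff value are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Lingua::Ident.
# $scores:  hash ref mapping language label => log-probability of the text
# $length:  length of the text in characters
# $min_avg: lowest acceptable average log-probability per character
sub pick_with_threshold {
    my ($scores, $length, $min_avg) = @_;
    my ($best) = sort { $scores->{$b} <=> $scores->{$a} } keys %$scores;
    return unless defined $best && $length > 0;
    # A low per-character average suggests that no available table
    # fits the text, so refuse to answer instead of guessing.
    my $avg = $scores->{$best} / $length;
    return $avg >= $min_avg ? $best : undef;
}

# Made-up scores for a 20-character text.
my $lang = pick_with_threshold({ en => -40, es => -75 }, 20, -3);
print defined $lang ? "$lang\n" : "unknown\n";
```

A suitable cutoff depends on the training data and the n-gram order, so it would have to be tuned on held-out text.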

AUTHOR

Lingua::Ident was developed by Michael Piotrowski <mxp@dynalabs.de>.

LICENSE

This program is free software; you may redistribute it and/or modify it under the same terms as Perl itself.
