wordlist2dawg

Language: en

Version: 256826 (debian - 07/07/09)

Section: 1 (User commands)

NAME

wordlist2dawg - convert a wordlist to a DAWG dictionary file for tesseract

SYNOPSIS

wordlist2dawg frequent_words_list freq-dawg

wordlist2dawg words_list word-dawg

DESCRIPTION

This manual page briefly documents the wordlist2dawg command.

wordlist2dawg is part of the process of training tesseract for a new language. Tesseract uses three dictionary files for each language. Two of them are encoded as a Directed Acyclic Word Graph (DAWG), and the third is a plain UTF-8 text file. To make the DAWG dictionary files, you first need a wordlist for your language, formatted as a UTF-8 text file with one word per line. Split the wordlist into two sets, the frequent words and the rest of the words, and then run wordlist2dawg once on each set, as shown in the synopsis, to produce the two DAWG files.
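The step of splitting the wordlist into frequent words and the remaining words could be sketched in shell as below. This is only an illustration under stated assumptions: the manual page does not say how frequency is determined, so the sketch assumes the input is already ordered with the most frequent word first, and the input file name all_words.txt, the helper split_wordlist, and the cutoff of 3 are all hypothetical. The output names match the ones used on the wordlist2dawg command lines.

```shell
#!/bin/sh
# Sketch: split a word list into the two inputs expected by wordlist2dawg.
# Assumptions (not stated in the manual page): the input is ordered
# most-frequent-first, and a fixed cutoff decides what counts as "frequent".

split_wordlist() {
    # $1: UTF-8 word list, one word per line, most frequent first (assumed)
    # $2: number of leading words to treat as frequent (hypothetical cutoff)
    input=$1
    cutoff=$2
    # The first $cutoff lines become the frequent words.
    head -n "$cutoff" "$input" > frequent_words_list
    # Everything after the cutoff becomes the rest of the words.
    tail -n "+$((cutoff + 1))" "$input" > words_list
}

# Tiny demonstration input so the sketch runs on its own.
printf '%s\n' the of and cat dog > all_words.txt
split_wordlist all_words.txt 3

# Build the DAWG files only when the tool is actually installed.
if command -v wordlist2dawg >/dev/null 2>&1; then
    wordlist2dawg frequent_words_list freq-dawg
    wordlist2dawg words_list word-dawg
fi
```

For a real language the input would hold many thousands of words, and the cutoff would be chosen to suit the language and corpus rather than fixed at a small number.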

tesseract is a commercial-quality OCR engine originally developed at HP between 1985 and 1995. In 1995 it was among the top three engines evaluated by UNLV. It was open-sourced by HP and UNLV in 2005.

SEE ALSO

feh(1), convert(1), mftraining(1), cntraining(1), unicharset_extractor(1), tesseract(1).

AUTHOR

tesseract was written by Ray Smith.

This manual page was written by Jeffrey Ratcliffe <Jeffrey.Ratcliffe@gmail.com>, for the Debian project (but may be used by others).
