Language: en

Version: 256826 (debian - 07/07/09)

Section: 1 (User commands)


tesseract - command line OCR tool


wordlist2dawg is part of the process of training tesseract for a new language. Tesseract uses three dictionary files for each language: two are encoded as a Directed Acyclic Word Graph (DAWG), and the third is a plain UTF-8 text file. To make the DAWG dictionary files, you first need a wordlist for your language, formatted as a UTF-8 text file with one word per line. Split the wordlist into two sets, the frequent words and the rest of the words, and then use wordlist2dawg to make the DAWG files:

wordlist2dawg frequent_words_list freq-dawg

wordlist2dawg words_list word-dawg
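
The splitting step above could be sketched as follows. This is only an illustration under stated assumptions: the input file name corpus_words.txt, its sample contents, and the cut-off top_n are hypothetical, and ranking words by how often they occur in a sample is just one possible way to decide which words count as "frequent". Only standard POSIX tools are used; the wordlist2dawg calls themselves are left as comments.

```shell
#!/bin/sh
# Sketch: split a raw word list into "frequent" words and "the rest".
# File names, sample data, and the cut-off are assumptions for illustration.
set -eu
LC_ALL=C
export LC_ALL

workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical sample input: one word per line, repeats allowed.
printf '%s\n' the cat the dog the cat bird > corpus_words.txt

top_n=2   # hypothetical number of words to treat as "frequent"

# Count each word, then rank by count (descending), ties alphabetically.
sort corpus_words.txt | uniq -c | sort -k1,1nr -k2,2 | awk '{print $2}' > ranked.txt

# The first top_n ranked words become the frequent list, the remainder the rest.
head -n "$top_n" ranked.txt > frequent_words_list
tail -n +"$((top_n + 1))" ranked.txt > words_list

# The two lists can then be passed to wordlist2dawg as shown above:
#   wordlist2dawg frequent_words_list freq-dawg
#   wordlist2dawg words_list word-dawg
```

Because the second list is taken from the remainder of the ranking, no word appears in both files, which matches the "frequent words" versus "rest of the words" split described above.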


This manual page briefly documents the wordlist2dawg command.

tesseract is a commercial-quality OCR engine originally developed at HP between 1985 and 1995. In 1995, this engine was among the top 3 evaluated by UNLV. It was open-sourced by HP and UNLV in 2005.


feh(1), convert(1), mftraining(1), cntraining(1), unicharset_extractor(1), tesseract(1).


tesseract was written by Ray Smith.

This manual page was written by Jeffrey Ratcliffe for the Debian project (but may be used by others).
