djvu2hocr
Language: en
Version: 05/24/2010 (ubuntu - 24/10/10)
Section: 1 (User commands)
NAME
djvu2hocr - DjVu to hOCR converter
SYNOPSIS
- djvu2hocr [option...] djvu-file
- djvu2hocr {--version | --help | -h}
DESCRIPTION
- djvu2hocr converts hidden text from a DjVu file to the hOCR[1] format.
OPTIONS
Text segmentation options
--word-segmentation=simple
- Use the same word segmentation as found in the DjVu file.
This is the default.
--word-segmentation=uax29
- Use the Unicode Text Segmentation[2] algorithm to break lines into words, possibly fixing the word segmentation found in the DjVu file.
Other options
--version
- Output version information and exit.
-h, --help
- Display help and exit.
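As an illustration of how the options above combine with the synopsis, here is a minimal Python sketch that builds and runs a djvu2hocr command line. The helper names `build_command` and `convert` and the file name `book.djvu` are hypothetical. The sketch also assumes that djvu2hocr writes the hOCR document to standard output, which this page does not state explicitly.

```python
import subprocess

# The two values documented under "Text segmentation options".
SEGMENTATION_MODES = ("simple", "uax29")


def build_command(djvu_file, word_segmentation="simple"):
    """Return the argument list for one djvu2hocr run (options first, file last)."""
    if word_segmentation not in SEGMENTATION_MODES:
        raise ValueError(f"unknown word segmentation: {word_segmentation!r}")
    return ["djvu2hocr", f"--word-segmentation={word_segmentation}", djvu_file]


def convert(djvu_file, word_segmentation="simple"):
    """Run djvu2hocr and return whatever it printed, as bytes.

    Assumption: the hOCR output goes to standard output.
    """
    result = subprocess.run(
        build_command(djvu_file, word_segmentation),
        check=True,            # raise CalledProcessError on a non-zero exit
        capture_output=True,   # collect stdout and stderr
    )
    return result.stdout
```

Calling `convert("book.djvu", "uax29")` would then request the Unicode Text Segmentation behaviour; omitting the second argument keeps the documented default, `simple`.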
PORTABILITY
djvu2hocr uses a custom extension to hOCR to retain characters that cannot be directly represented in an HTML/XML document. For example, the control character BEL (^G, U+0007) is converted into the following HTML chunk: <span class="djvu_char" title="#x07"> </span>
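A consumer of the output may want to turn such spans back into the original characters. The following Python sketch shows one way to do that. It assumes every span has exactly the shape shown in the BEL example (same attribute order, title of the form "#x" followed by hexadecimal digits); real output may vary, and the function name `restore_djvu_chars` is hypothetical.

```python
import re

# Matches the span shape from the BEL example above. Attribute order and
# spacing are assumptions taken from that single example.
DJVU_CHAR = re.compile(
    r'<span class="djvu_char" title="#x([0-9A-Fa-f]+)">[^<]*</span>'
)


def restore_djvu_chars(hocr_text):
    """Replace each djvu_char span with the character its title encodes."""
    return DJVU_CHAR.sub(lambda m: chr(int(m.group(1), 16)), hocr_text)


sample = 'ding<span class="djvu_char" title="#x07"> </span>dong'
restored = restore_djvu_chars(sample)
```

A regular expression is enough for this narrow case; a full HTML parser would be the safer choice if the spans can appear with other attributes or nesting.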
SEE ALSO
djvu(1)
AUTHOR
Jakub Wilk <jwilk@jwilk.net>
- Author.
COPYRIGHT
Copyright © 2009, 2010 Jakub Wilk
NOTES
- 1. hOCR
- http://docs.google.com/View?docid=dfxcv4vc_67g844kf
- 2. Unicode Text Segmentation
- http://unicode.org/reports/tr29/