hocr2pdf

Version: 253819 (debian - 07/07/09)

Section: 1 (User commands)

Contents

NAME
SYNOPSIS
DESCRIPTION
OPTIONS
EXAMPLES
SEE ALSO
HOMEPAGE
AUTHOR

NAME

hocr2pdf - hOCR to PDF converter of the ExactImage library

SYNOPSIS

hocr2pdf -i|--input FILE -o|--output FILE [-n|--no-image] [-r|--resolution RESOLUTION] [-s|--sloppy-text] [-t|--text] < HOCR_FILE

hocr2pdf --help

DESCRIPTION

ExactImage is a fast C++ image processing library. Unlike ImageMagick, it operates natively in several color spaces and bit depths, resulting in much lower memory and computational requirements. Some optimized algorithms run in 1/20 of the time ImageMagick requires, and displaying large images can take as little as 1/10 of the time the "display" program takes.

hocr2pdf is a command line front-end for the image processing library. It creates precisely laid out, searchable PDF files from hOCR (annotated HTML) input obtained from an OCR system.

OPTIONS

-i|--input FILE: Input image filename.
-o|--output FILE: Output PDF filename.
-n|--no-image: Do not place the image over the text.
-r|--resolution RESOLUTION: Override the resolution.
-s|--sloppy-text: Sloppily place text, group words, do not draw single glyphs.
-t|--text: Extract text, including trying to remove hyphens.
-h|--help: Show summary of options.

EXAMPLES

Creating a Searchable PDF from hOCR input

The hOCR input (annotated HTML) must be provided on STDIN; the image data is read from the file named by the -i (--input) argument. For example:

$ hocr2pdf -i scan.tiff -o test.pdf < cuneiform-out.hocr

By default, the text layer is hidden behind the actual image data. The image can be omitted with -n (--no-image), so that only the text recognized by the OCR system is visible, e.g. for debugging or to save storage space:

$ hocr2pdf -i scan.tiff -n -o test.pdf < cuneiform-out.hocr
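When several scans need converting, the invocation from this section can be wrapped in a loop. The following is only a sketch under stated assumptions: the function name hocr_batch, the HOCR2PDF override variable and the naming convention (scan.tiff paired with scan.hocr) are hypothetical and not part of hocr2pdf itself.

```shell
#!/bin/sh
# Sketch: convert every image that has a matching hOCR file next to it.
# hocr_batch and HOCR2PDF are hypothetical names introduced for illustration.
hocr_batch() {
    for img in "$@"; do
        base="${img%.*}"        # strip the image extension, e.g. scan.tiff -> scan
        hocr="${base}.hocr"     # assumed convention: hOCR file shares the base name
        if [ ! -f "$hocr" ]; then
            echo "skipping $img: $hocr not found" >&2
            continue
        fi
        # hOCR goes to STDIN, the image is passed with -i, the PDF is named with -o
        "${HOCR2PDF:-hocr2pdf}" -i "$img" -o "${base}.pdf" < "$hocr"
    done
}
```

Calling hocr_batch page1.tiff page2.tiff would then write page1.pdf and page2.pdf, skipping any image without a corresponding .hocr file.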

Too many gaps between letters in individual words

This can happen when the OCR data is imprecise or when justified text has large gaps. ExactImage includes a special mode, activated with the command line argument -s (--sloppy-text), that groups the glyphs between whitespace into words, which can help PDF viewers produce better results when copying and pasting text:

$ hocr2pdf -i scan.tiff -s -o test.pdf < cuneiform-out.hocr

HOMEPAGE

More information about hocr2pdf and the ExactImage project can be found at <http://www.exactcode.de/site/open_source/exactimage/>.

AUTHOR

ExactImage was written by ExactCODE GmbH <http://www.exactcode.de/>.

This manual page was written by Daniel Baumann <daniel@debian.org>, for the Debian project (but may be used by others).
