ocrodjvu

Language: en

Version: 05/24/2010 (ubuntu - 24/10/10)

Section: 1 (User commands)

Contents

NAME
SYNOPSIS
DESCRIPTION
OPTIONS
ENVIRONMENT
SEE ALSO
AUTHOR
COPYRIGHT
NOTES

NAME

ocrodjvu - OCR for DjVu files

SYNOPSIS

ocrodjvu {-o | --save-bundled} output-djvu-file [option...] djvu-file
ocrodjvu {-i | --save-indirect} index-djvu-file [option...] djvu-file
ocrodjvu --save-script script-file [option...] djvu-file
ocrodjvu --in-place [option...] djvu-file
ocrodjvu --dry-run [option...] djvu-file
ocrodjvu {--version | --help | -h | --list-engines | --list-languages}

DESCRIPTION

ocrodjvu is a wrapper for OCR systems that allows you to perform OCR on DjVu files.

The following OCR engines are supported:

• OCRopus[1] (internally, ocrodjvu calls ocroscript's recognize (or rec-tess) command, so that ultimately Tesseract acts as the OCR backend);

• Cuneiform for Linux[2].

OPTIONS

OCR engine options

--engine=engine-id

Use this OCR engine. The default is 'ocropus' (OCRopus).

--list-engines

Print list of available OCR engines.

Options controlling output

It is mandatory to use exactly one of the following options:

-o, --save-bundled=output-djvu-file

Save OCR results as a bundled multi-page document into output-djvu-file.

-i, --save-indirect=index-djvu-file

Save OCR results as an indirect multi-page document. Use index-djvu-file as the index file name; put the component files into the same directory. The directory must exist and be writable.

--save-script=script-file

Save a djvused script with OCR results into script-file.

--in-place

Save OCR results in place. (Use this option to retain compatibility with ocrodjvu < 0.2.)

--dry-run

Don't change any files, throw OCR results away.
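
The rule that exactly one output option must be given can be sketched as a small Python helper that assembles an ocrodjvu command line. The function name build_command and its parameters are hypothetical and not part of ocrodjvu; only the option names come from this page. The helper builds an argument list and does not run anything.

```python
def build_command(djvu_file, *, bundled=None, indirect=None, script=None,
                  in_place=False, dry_run=False):
    """Return an argv list for ocrodjvu with exactly one output option.

    Hypothetical helper for illustration; it is not part of ocrodjvu.
    """
    chosen = []
    if bundled is not None:
        chosen.append("--save-bundled=" + bundled)
    if indirect is not None:
        chosen.append("--save-indirect=" + indirect)
    if script is not None:
        chosen.append("--save-script=" + script)
    if in_place:
        chosen.append("--in-place")
    if dry_run:
        chosen.append("--dry-run")
    # The man page requires exactly one of the output options.
    if len(chosen) != 1:
        raise ValueError("exactly one output option is required")
    return ["ocrodjvu", chosen[0], djvu_file]


print(build_command("book.djvu", bundled="book-ocr.djvu"))
```

The resulting list could be passed to subprocess.run on a system where ocrodjvu is installed.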

Text segmentation options

-t lines, --details=lines

Record location of every line. Don't record locations of particular words or characters. This is the default for OCRopus 0.2.

-t words, --details=words

Record location of every line and every word. Don't record locations of particular characters. This is the default for OCRopus ≥ 0.3.1 and for Cuneiform. This option is ineffective with OCRopus 0.2.

-t chars, --details=chars

Record location of every line, every word and every character. This option is ineffective with OCRopus 0.2.

--word-segmentation=simple

Consider each non-empty sequence of non-whitespace characters a single word. This is the default, despite being linguistically incorrect.
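
As a rough illustration of the simple rule, the following Python sketch splits a line into maximal runs of non-whitespace characters. It relies on Python's own definition of whitespace, which is an assumption; ocrodjvu's actual implementation may differ in detail. The function name simple_words is hypothetical.

```python
def simple_words(line):
    """Split a line into maximal runs of non-whitespace characters."""
    # str.split() with no arguments splits on runs of whitespace
    # and never yields empty strings.
    return line.split()


print(simple_words("Don't  stop (e.g. here)."))
```

Punctuation stays attached to the neighbouring characters ("(e.g." and "here)." are each one word), which is why the rule is described as linguistically incorrect.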

--word-segmentation=uax29

Use the Unicode Text Segmentation[3] algorithm to break lines into words.
This option breaks the assumption of some DjVu tools that words are separated by spaces, and is therefore not recommended.

Other options

--clear-text

Remove existing hidden text if present in the pages not selected for OCR. (Use this option to retain compatibility with ocrodjvu < 0.2.)

--ocr-only

Don't save pages that were not processed.

--language=language-id

Set recognition language. language-id is typically an ISO 639-2 three-letter code. For OCRopus, the default is 'eng' (English), unless the tesslanguage environment variable is set. For other OCR engines, the default is always 'eng'.
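
The default-language rule described above can be sketched in Python as follows. This is an illustration only, not ocrodjvu's code: the function name default_language is hypothetical, and the engine identifier 'cuneiform' is an assumption (this page only names 'ocropus').

```python
import os


def default_language(engine, environ=None):
    """Return the language used when --language is not given (sketch)."""
    environ = os.environ if environ is None else environ
    if engine == "ocropus":
        # For OCRopus, the tesslanguage variable overrides the 'eng' default.
        return environ.get("tesslanguage", "eng")
    # For other engines the default is always 'eng'.
    return "eng"


print(default_language("ocropus", {"tesslanguage": "deu"}))
print(default_language("cuneiform", {"tesslanguage": "deu"}))
```

An explicit --language option would take precedence over either default.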

--list-languages

Print list of available languages for the currently selected OCR engine.

--render=mask

Render only masks of page images. This is the default.

--render=foreground

Render only foreground layers of page images.

--render=all

Render all layers of page images. This option is necessary to OCR DjVu files with invalid foreground/background separation.

-p, --pages=page-range

Specifies pages to process. page-range is a comma-separated list of sub-ranges. Each sub-range is either a single page (e.g. 17) or a contiguous range of pages (e.g. 37-42). Pages are numbered from 1. The default is to process all pages.
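
To make the page-range syntax concrete, here is a minimal Python sketch that expands such a specification into a list of page numbers. It is not ocrodjvu's parser; the name parse_page_range is hypothetical, and rejecting reversed ranges or page numbers below 1 is an assumption, since this page does not say how such input is handled.

```python
def parse_page_range(spec):
    """Expand a spec such as '17,37-42' into a list of page numbers."""
    pages = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            # Contiguous sub-range, e.g. '37-42'.
            first, last = (int(x) for x in part.split("-", 1))
        else:
            # Single page, e.g. '17'.
            first = last = int(part)
        # Assumption: pages start at 1 and ranges must not be reversed.
        if first < 1 or last < first:
            raise ValueError(f"invalid sub-range: {part!r}")
        pages.extend(range(first, last + 1))
    return pages


print(parse_page_range("17,37-42"))  # [17, 37, 38, 39, 40, 41, 42]
```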

-j, --jobs=n

Start up to n OCR processes.

-D, --debug

To ease debugging, don't delete intermediate files.

--version

Output version information and exit.

-h, --help

Display help and exit.

ENVIRONMENT

The following environment variables affect ocrodjvu:

tesslanguage

Recognition language for Tesseract.
(Use of this variable is deprecated in favor of the --language option.)

TMPDIR

Directory for temporary files. The default is /tmp.

AUTHOR

Jakub Wilk <jwilk@jwilk.net>

Author.

COPYRIGHT

NOTES

1. OCRopus
   http://ocropus.googlecode.com/

2. Cuneiform for Linux
   http://launchpad.net/cuneiform-linux

3. Unicode Text Segmentation
   http://unicode.org/reports/tr29/
