fastrconf

Language: en

Version: fastr configuration 2.04 (mandriva - 01/05/08)

Section: 1 (User commands)

NAME

fastrconf - how to set runtime parameters

DESCRIPTION

Upon startup, fastr looks for a configuration file given by the -C switch. If there is no -C switch, the default configuration file is ./fastr.conf.
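The lookup rule above can be sketched as follows. This is only an illustration, not fastr's own code: the function name resolve_config_path is hypothetical, and the sketch assumes that the file name follows -C as a separate argument.

```python
DEFAULT_CONFIG = "./fastr.conf"  # default named in the text above


def resolve_config_path(argv):
    """Return the file name given after -C, or the default configuration file.

    Assumption: the file name is the argument immediately following -C.
    """
    for i, arg in enumerate(argv):
        if arg == "-C" and i + 1 < len(argv):
            return argv[i + 1]
    return DEFAULT_CONFIG
```

With this sketch, resolve_config_path(["-C", "medic.conf"]) selects medic.conf, while an empty argument list falls back to ./fastr.conf.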

The configuration file is composed of fields and values of three categories:

1.
File names. Names of the main resource files, name of the log file, and name of the unification files.
2.
Switches. Parameters for customizing the formats of the input and output files, or for customizing the parameters of the processor.
3.
Dimensions. These values tune the application for optimal memory use. If a value is too small, processing may fail; in that case, an error is reported indicating which value must be increased.

FILE NAMES

Message file
Name of the file containing the error messages and other language-dependent resources such as the names of the fields of the configuration file. (Here, language-dependent means related to the language of the user, not to the language under study.)
Dictionary file
Name of the file containing the compiled dictionary. This name is the prefix of six files with the following suffixes:
BTR B-Tree used for accessing the dictionary of single words. See the Fields B-Tree levels and B-Tree node size for defining the dimensions of the B-Tree.
IDE File linking the identifiers of the grammar rules (see the Path Label path in the fastrlang section) to their internal key.
LEM Dictionary file of single words (lemmas).
REF File linking the lemmas to their internal key.
LIN File linking the terms to their lexical anchor.
RUL Grammar file of multi-word terms.
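As an illustration of how one prefix expands into the compiled dictionary files, the sketch below builds one name per suffix listed above. The helper dictionary_files is hypothetical, and the "." separator between prefix and suffix is an assumption: this page does not say how the suffix is attached.

```python
# Suffixes taken from the list above.
DICTIONARY_SUFFIXES = ("BTR", "IDE", "LEM", "REF", "LIN", "RUL")


def dictionary_files(prefix, sep="."):
    """Return one file name per suffix for the given dictionary prefix.

    Assumption: prefix and suffix are joined with `sep` ("." by default).
    """
    return [f"{prefix}{sep}{suffix}" for suffix in DICTIONARY_SUFFIXES]
```

Under that assumption, the prefix medic would yield medic.BTR, medic.IDE, medic.LEM, medic.REF, medic.LIN, and medic.RUL.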
Language file
Name of the language file containing the linguistic parameters of the application: mainly morphology, frequent words, and meta-grammar (syntactic transformations). More information about the language file is given in the fastrlang section.
Log file
Name of the log file.
External unifier
Name of the external process called for external modification by starred metarules (see the Meta-rules in the fastrlang section).
Unification file
Name of the file used for sending data to the external process called for external modification by starred metarules (see the Meta-rules in the fastrlang section).

SWITCHES

Parsing trace
If the value is 1, a very verbose trace is left in the log file (see the File name Log file).
Break on compile errors
If the value is 1, compilation of the language file or rule file stops each time an error is encountered.
Agreement checking
If the value is 1, unification is performed after successful rewriting. Otherwise, no unification is made and every successful rewriting is reported as a success.
Number preprocessing
If the value is 1, numbers are considered as specific words and assigned a predefined category (see the Feature Numeral in the fastrlang section).
Necessary lexicalization
If the value is 1, rules whose lexical anchor is not in the current sentence are not parsed (see the Path Lexicalization path in the fastrlang section).
Minimal arity
The arity of a term is the number of daughter nodes of the root node (see the Context-Free Skeleton of Term Rules in the fastrdata section). The value of this field is the minimal arity of terms which are used as indexes. For example, setting the minimal arity to 2 prevents single-word terms from being used as indexes.
Maximal arity
The value of this field is the maximal arity of terms which are used as indexes.
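The combined effect of Minimal arity and Maximal arity can be pictured as a simple range filter. The sketch below is illustrative only: the names arity_allowed and select_indexes are hypothetical, both bounds are assumed to be inclusive, and the arities in the usage note are made up for the example.

```python
def arity_allowed(arity, minimal, maximal):
    """True if a term of this arity may be used as an index.

    Assumption: both bounds are inclusive.
    """
    return minimal <= arity <= maximal


def select_indexes(term_arities, minimal, maximal):
    """Keep only the terms whose arity lies within the configured bounds."""
    return [term for term, arity in term_arities.items()
            if arity_allowed(arity, minimal, maximal)]
```

For instance, with the hypothetical arities {"cell": 1, "blood cell": 2} and a minimal arity of 2, only "blood cell" is kept, which matches the single-word example above.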
Disk dictionary
If the value is 1, the compiled dictionary of single words (suffix LEM) and multi-word terms (suffix RUL) is not loaded into memory.
Reduced dictionary
If the value is 1, the dictionary is compressed. Access times are slightly longer.
Memory Rules loading
If the value is 1, the memory rules of the language file are loaded into memory and systematically used for parsing (see Memory rules in the fastrlang section).
Metarules loading
If the value is 1, the meta-rules of the language file are loaded into memory and systematically used for parsing (see Meta-rules in the fastrlang section).
Input format
The value of the input format is an integer n, where n can be 0, 1, 2, 3, or 4.
0 The standard text format.
1 The output format 1 is expected as input. This format is useful for re-indexing.
2 A text format where each line begins with a 6-digit number followed by a tab.
3 Tagged text format.
4 Partially tagged text format.
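A line in input format 2 can be split into its number and its text as sketched below. This is an illustration rather than fastr's own reader; parse_numbered_line is a hypothetical name, and returning None for a malformed line is an assumption made for the example.

```python
import re

# Six digits, one tab, then the rest of the line (input format 2).
NUMBERED_LINE = re.compile(r"^(\d{6})\t(.*)$")


def parse_numbered_line(line):
    """Return (line_number, text) for a format-2 line, or None if malformed."""
    match = NUMBERED_LINE.match(line.rstrip("\n"))
    if match is None:
        return None
    return int(match.group(1)), match.group(2)
```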
Output format
The value of the output format is an integer n, where n can be 0 or 1.
0 The standard text format.
1 A 4-field output format: sentence number, term identifier, term occurrence, variation (a meta-rule identifier or 0 for non-variant occurrences).
004948 24232 fractions of cells XX,22,FPerm
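The sample line above can be split into its four fields as sketched below. The field separator is not specified on this page, so the sketch assumes whitespace-separated fields in which only the term occurrence may contain spaces and the variation is the last token. IndexRecord and parse_index_record are hypothetical names.

```python
from typing import NamedTuple


class IndexRecord(NamedTuple):
    sentence_number: str
    term_id: str
    occurrence: str
    variation: str  # meta-rule identifier, or "0" for a non-variant occurrence


def parse_index_record(line):
    """Split one format-1 output line into its four fields.

    Assumption: the first two tokens and the last token are single words;
    everything in between is the term occurrence.
    """
    tokens = line.split()
    if len(tokens) < 4:
        raise ValueError(f"expected at least 4 fields, got {len(tokens)}")
    return IndexRecord(tokens[0], tokens[1], " ".join(tokens[2:-1]), tokens[-1])
```

Applied to the sample line, this gives sentence number 004948, term identifier 24232, occurrence "fractions of cells", and variation XX,22,FPerm.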
Output with input text
If the value is 0, the input text is not echoed to the standard output. If the value is 1, the input text is always echoed to the standard output; if the value is 2, it is only echoed if the current sentence contains at least one index.
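The three values of this switch amount to the decision sketched below. The function should_echo is a hypothetical name, and rejecting any other value is an assumption: this page does not say how fastr handles values outside 0, 1, and 2.

```python
def should_echo(mode, index_count):
    """Decide whether the current sentence is echoed to the standard output.

    mode 0: never; mode 1: always; mode 2: only if at least one index was found.
    """
    if mode == 0:
        return False
    if mode == 1:
        return True
    if mode == 2:
        return index_count >= 1
    raise ValueError(f"unsupported value for 'Output with input text': {mode}")
```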
Output with header
Indexing is preceded by a header and followed by statistics on the indexing.
First line number
The value of this field is an integer: the number assigned to the first line.
Automatic rule ordering
If the value is 1, rule compiling is automatically followed by a re-sequencing of the term rules for optimization purposes.
Automatic linking
If the value is 1, rule compiling is automatically followed by rule linking. This option is mandatory in case of embedded rules (see Term Rules in the fastrdata section).
Multiple lexicalization
If the value is 1, rules whose lexical items are not in the current sentence are not parsed. The lexical items of a term are its content words: words whose category is a major category (see the Field Major category in the fastrlang section).
Batch processing
If the value is 1, fastr runs in command mode. If the value is 0, fastr runs in interactive mode and a menu is displayed.
Suffix stripping
If the value is 1, a very crude version of stemming is performed as follows:
1. The last Suffix stripping size characters are removed from each word.
2. Two words are considered as identical if the remaining strings are included one in another.
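The two steps above can be sketched as follows. This is one reading of the description, not fastr's implementation: "included one in another" is interpreted here as substring inclusion, and the names strip_suffix and crude_match are hypothetical.

```python
def strip_suffix(word, size):
    """Remove the last `size` characters of `word` (step 1)."""
    return word[:max(0, len(word) - size)]


def crude_match(first, second, size):
    """True if one stripped word is contained in the other (step 2).

    Assumption: inclusion means substring inclusion. Note that a word no
    longer than `size` strips to the empty string, which matches anything.
    """
    a, b = strip_suffix(first, size), strip_suffix(second, size)
    return a in b or b in a
```

With a stripping size of 2, "cells" becomes "cel" and "cellular" becomes "cellul", so the two words are treated as identical under this reading.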
Suffix stripping size
The value of this field is an integer n, where n is the number of ending characters removed in case of crude stemming (see Suffix stripping).
Inflectional morphology
If the value is 1, fastr performs a morphological analysis of the words in the text. Should be turned off for lemmatized corpora.
Derivational morphology
Controls the type of morphological analysis performed by fastr.
0 Never analyzes the words.
1 Always tries to find the roots of the words.
2 Attempts morphological analysis only if no morphological link is provided in the dictionary for the current lemma.
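The three modes can be summarized by the decision function sketched below. The name should_analyze and the boolean has_dictionary_link are hypothetical; the flag stands for "a morphological link is provided in the dictionary for the current lemma".

```python
def should_analyze(mode, has_dictionary_link):
    """Decide whether derivational analysis is attempted for the current lemma.

    mode 0: never; mode 1: always; mode 2: only when the dictionary
    provides no morphological link for the lemma.
    """
    if mode == 0:
        return False
    if mode == 1:
        return True
    if mode == 2:
        return not has_dictionary_link
    raise ValueError(f"unsupported value for 'Derivational morphology': {mode}")
```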
Morphological linkage
If the value is 1, fastr exploits the morphological links in the lexicon for linking each lemma to its derivational family. Should be turned on for morpho-syntactic variant extraction.
Morphological self-root
If the value is 1, fastr automatically assigns a morphological link to <self>.
Semantic linkage
If the value is 1, fastr exploits the semantic links in the lexicon for linking each lemma to its semantic family. Should be turned on for semantic variant extraction.
Semantic self-linkage
If the value is 1, fastr automatically assigns semantic links to <self>.
Automatic lowercasing
If the value is 1, fastr automatically converts words to lowercase.
Automatic segmentation
If the value is 1, fastr automatically cuts sentences into words. Otherwise, segmentation occurs only at space characters.

DIMENSIONS

All the fields classified as Dimensions have an integer value. These values are used for optimizing memory management.
B-Tree levels
The depth of the B-Tree (see the Field Dictionary file, suffix BTR).
B-Tree node size
The number of nodes in a hyper-node of the B-Tree (see the Field Dictionary file, suffix BTR).
B-Tree updating (compil)
Rate of B-Tree updating during compiling.
I/O buffer size (text)
Size of the input buffer.
I/O buffer size (lemmas)
Size of the buffer used for accessing the single word dictionary (see the Field Dictionary file, suffix LEM).
I/O buffer size (rules)
Size of the buffer used for accessing the term dictionary (see the Field Dictionary file, suffix RUL).
Size of stack
Size of the rewriting stack.
Max words in sentence
Maximal number of words read in a parsed sentence.
Max refs in sentence
Maximal number of word references in a parsed sentence.
Max rules
Maximal number of rules loaded in a parsed sentence.
Max memory rules
Maximal number of memory rules in the language file.
Max terms
Maximal number of nodes composing the rules loaded in a parsed sentence.
Max metarules
Maximal number of meta-rules in the language file.
Max lemmas
Maximal number of lemmas found in a parsed sentence.
Max dag nodes
Maximal number of nodes composing the feature structures of the rules loaded in a parsed sentence.
Max nodes copied
Maximal number of nodes copied for unification purposes.
Max features
Maximal number of features in the language file.
Max suffixes
Maximal number of suffixes in the language file.
Max terms in a rule
Maximal number of nodes composing a rule.
Max term strings
Maximal number of terms within a grammar.
Max term links
Maximal number of term links in a process of acquisition.