fastrconf
Language: en
Version: fastr configuration 2.04 (mandriva - 01/05/08)
Section: 1 (User Commands)
NAME
fastrconf - how to set runtime parameters
DESCRIPTION
Upon startup, fastr looks for a configuration file given by the -C switch. If there is no -C switch, the default configuration file is ./fastr.conf. The configuration file is composed of fields and values in three categories:
- 1. File names. Names of the main resource files, the log file, and the unification files.
- 2. Switches. Parameters for customizing the formats of the input and output files, or the parameters of the processor.
- 3. Dimensions. Values for tuning the application's memory usage. If a value is too small, processing may fail; the error reported then indicates which value must be increased.
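The lookup rule for the configuration file (the -C switch, otherwise ./fastr.conf) can be sketched as follows. This is an illustration only: it assumes the switch is written as "-C <file>" with the file name as the next argument, which this page does not specify, and `config_path` is a hypothetical helper, not part of fastr.

```python
DEFAULT_CONFIG = "./fastr.conf"

def config_path(argv):
    """Return the configuration file named by -C, else the default.

    Assumption: the switch is spelled "-C <file>" (separate argument);
    other spellings such as "-Cfile" are not handled in this sketch.
    """
    for i, arg in enumerate(argv):
        if arg == "-C" and i + 1 < len(argv):
            return argv[i + 1]
    # No usable -C switch: fall back to ./fastr.conf
    return DEFAULT_CONFIG
```

For example, `config_path(["-C", "my.conf"])` yields "my.conf", and `config_path([])` yields "./fastr.conf".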
FILE NAMES
- Message file
- Name of the file containing the error messages and other language-dependent resources such as the names of the fields of the configuration file. (Here, language-dependent means related to the language of the user, not to the language under study.)
- Dictionary file
- Name of the file containing the compiled dictionary. This name is the common prefix of the following files, which are distinguished by their suffixes:
- BTR B-Tree used for accessing the dictionary of single words. See the fields B-Tree levels and B-Tree node size for the dimensions of the B-Tree.
- IDE File linking the identifiers of the grammar rules (see the Path Label path in the fastrlang section) to their internal key.
- LEM Dictionary file of single words (lemmas).
- REF File linking the lemmas to their internal key.
- LIN File linking the terms to their lexical anchor.
- RUL Grammar file of multi-word terms.
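Since one dictionary name serves as the prefix of several files, the derived names can be listed from the prefix. This sketch assumes the suffix is appended as a file extension (prefix + "." + suffix); the exact joining rule and letter case are not given on this page, and `dictionary_files` and `missing_files` are hypothetical helpers.

```python
from pathlib import Path

# Suffixes listed on this page. Joining them as ".BTR", ".IDE", ...
# is an assumption about how the file names are built.
DICTIONARY_SUFFIXES = {
    "BTR": "B-Tree over the single-word dictionary",
    "IDE": "grammar-rule identifiers -> internal keys",
    "LEM": "single-word dictionary (lemmas)",
    "REF": "lemmas -> internal keys",
    "LIN": "terms -> lexical anchors",
    "RUL": "grammar of multi-word terms",
}

def dictionary_files(prefix):
    """Map each suffix to the file name derived from the dictionary prefix."""
    return {sfx: f"{prefix}.{sfx}" for sfx in DICTIONARY_SUFFIXES}

def missing_files(prefix):
    """List the derived file names that do not exist on disk."""
    return [name for name in dictionary_files(prefix).values()
            if not Path(name).exists()]
```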
- Language file
- Name of the language file containing the linguistic parameters of the application: mainly morphology, frequent words, and meta-grammar (syntactic transformations). More information about the language file is given in the fastrlang section.
- Log file
- Name of the log file.
- External unifier
- Name of the external process called for external modification by starred meta-rules (see Meta-rules in the fastrlang section).
- Unification file
- Name of the file used for sending data to the external process called for external modification by starred meta-rules (see Meta-rules in the fastrlang section).
SWITCHES
- Parsing trace
- If the value is 1, a very verbose trace is left in the log file (see the File name Log file).
- Break on compile errors
- If the value is 1, compilation of the language file or the rule file is stopped each time an error is encountered.
- Agreement checking
- If the value is 1, unification is performed after successful rewriting. Otherwise, no unification is made and every successful rewriting is reported as a success.
- Number preprocessing
- If the value is 1, numbers are treated as special words and assigned a predefined category (see the Feature Numeral in the fastrlang section).
- Necessary lexicalization
- If the value is 1, rules whose lexical anchor is not in the current sentence are not parsed (see the Path Lexicalization path in the fastrlang section).
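The filtering performed by Necessary lexicalization can be sketched as below. The `Rule` record with an `anchor` attribute and the function `rules_to_parse` are hypothetical simplifications; fastr's internal rule representation is not described here.

```python
from dataclasses import dataclass

@dataclass
class Rule:
    # Hypothetical minimal rule record: an identifier and its lexical anchor.
    ident: str
    anchor: str

def rules_to_parse(rules, sentence_lemmas, necessary_lexicalization):
    """Return the rules that should be parsed for one sentence.

    With the switch set to 1, a rule is kept only when its lexical anchor
    occurs among the sentence's lemmas; with 0, every rule is kept.
    """
    if not necessary_lexicalization:
        return list(rules)
    present = set(sentence_lemmas)
    return [r for r in rules if r.anchor in present]
```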
- Minimal arity
- The arity of a term is the number of daughter nodes of the root node (see the Context-Free Skeleton of Term Rules in the fastrdata section). The value of this field is the minimal arity of terms which are used as indexes. For example, setting the minimal arity to 2 prevents single-word terms from being used as indexes.
- Maximal arity
- The value of this field is the maximal arity of terms which are used as indexes.
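The arity test implied by Minimal arity and Maximal arity can be sketched as follows. The `Node` tree type is a hypothetical stand-in for a term's context-free skeleton; consistent with the example above, a single-word term is modeled as a root with one daughter, so its arity is 1.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    # Hypothetical tree node standing in for a term's context-free skeleton.
    label: str
    daughters: list = field(default_factory=list)

def arity(root):
    """Arity of a term: the number of daughter nodes of its root node."""
    return len(root.daughters)

def usable_as_index(root, minimal_arity, maximal_arity):
    """True when the term's arity lies within [minimal_arity, maximal_arity]."""
    return minimal_arity <= arity(root) <= maximal_arity
```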
- Disk dictionary
- If the value is 1, the compiled dictionary of single words (suffix LEM) and multi-word terms (suffix REG) is not loaded into memory.
- Reduced dictionary
- If the value is 1, the dictionary is compressed, at the cost of slightly longer access times.
- Memory Rules loading
- If the value is 1, the memory rules of the language file are loaded into memory and systematically used for parsing (see Memory rules in the fastrlang section).
- Metarules loading
- If the value is 1, the meta-rules of the language file are loaded into memory and systematically used for parsing (see Meta-rules in the fastrlang section).
- Input format
- The value of the input format is an integer n, where n can be 0, 1, 2, 3, or 4.
- 0 The standard text format.
- 1 Output format 1 is expected as input. This format is useful for re-indexing.
- 2 A text format where each line begins with a 6-digit number followed by a tab.
- 3 Tagged text format.
- 4 Partially tagged text format.
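Input format 2 can be produced from, and read back into, plain text lines as sketched below. The zero padding and the starting number (default 1) are assumptions; the page only states that each line begins with a 6-digit number followed by a tab. Both functions are hypothetical helpers.

```python
def to_input_format_2(lines, first=1):
    """Prefix each text line with a zero-padded 6-digit number and a tab.

    Assumption: numbering starts at `first` and increases by 1 per line;
    numbers above 999999 would no longer fit in 6 digits.
    """
    return [f"{n:06d}\t{line}" for n, line in enumerate(lines, start=first)]

def from_input_format_2(line):
    """Split a format-2 line back into (number, text)."""
    number, sep, text = line.partition("\t")
    if not sep or len(number) != 6 or not number.isdigit():
        raise ValueError(f"not an input-format-2 line: {line!r}")
    return int(number), text
```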
- Output format
- The value of the output format is an integer n, where n can be 0 or 1.
- 0 The standard text format.
- 1 A 4-field output format: sentence number, term identifier, term occurrence, variation (a meta-rule identifier or 0 for non-variant occurrences).
- Example: 004948 24232 fractions of cells XX,22,FPerm
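A line of output format 1, such as the example above, can be split into its four fields as sketched here. The field separator is not documented on this page, so the sketch splits on whitespace and treats the first two tokens as the sentence number and term identifier, the last token as the variation, and everything in between as the term occurrence. The term identifier is kept as a string because its type is not specified.

```python
from typing import NamedTuple

class IndexLine(NamedTuple):
    sentence: int
    term_id: str
    occurrence: str
    variation: str  # meta-rule identifier, or "0" for a non-variant occurrence

def parse_output_format_1(line):
    """Parse one line of output format 1 (whitespace-separated sketch)."""
    tokens = line.split()
    if len(tokens) < 4:
        raise ValueError(f"expected 4 fields, got: {line!r}")
    return IndexLine(int(tokens[0]), tokens[1],
                     " ".join(tokens[2:-1]), tokens[-1])
```

Splitting on whitespace collapses repeated spaces inside the occurrence; if the real separator is a tab, splitting on "\t" would preserve them.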
- Output with input text
- If the value is 0, the input text is not echoed to the standard output. If the value is 1, the input text is always echoed to the standard output; if the value is 2, it is only echoed if the current sentence contains at least one index.
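The three values of Output with input text amount to the following decision per sentence. `echo_sentence` is a hypothetical helper that restates the rule above; the index count is whatever the indexer found for the current sentence.

```python
def echo_sentence(mode, index_count):
    """Decide whether the current input sentence is echoed to standard output.

    mode 0: never; mode 1: always; mode 2: only if the sentence
    contains at least one index.
    """
    if mode == 0:
        return False
    if mode == 1:
        return True
    if mode == 2:
        return index_count > 0
    raise ValueError(f"unsupported value for Output with input text: {mode}")
```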
- Output with header
- If the value is 1, indexing is preceded by a header and followed by statistics on the indexing.
- First line number
- The value of this field is an integer: the number assigned to the first line.
- Automatic rule ordering
- If the value is 1, rule compiling is automatically followed by a re-sequencing of the term rules for optimization purposes.
- Automatic linking
- If the value is 1, rule compiling is automatically followed by rule linking. This option is mandatory in case of embedded rules (see Term Rules in the fastrdata section).
- Multiple lexicalization
- If the value is 1, rules whose lexical items are not in the current sentence are not parsed. The lexical items of a term are its content words: words whose category is a major category (see the Field Major category in the fastrlang section).
- Batch processing
- If the value is 1, fastr runs in command mode. If the value is 0, fastr runs in interactive mode and a menu is displayed.
- Suffix stripping
- If the value is 1, a very crude version of stemming is performed as follows:
- 1. The last n characters are removed from each word, where n is the value of the field Suffix stripping size.
- 2. Two words are considered identical if one of the remaining strings is included in the other.
- Suffix stripping size
- The value of this field is an integer n, where n is the number of ending characters removed in case of crude stemming (see Suffix stripping).
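The crude stemming of Suffix stripping can be sketched as below. This reads "included" as substring containment; a prefix test would be an equally plausible reading. In this literal sketch, a word no longer than the stripping size is reduced to the empty string and therefore matches every word. `crude_stem` and `same_word` are hypothetical names.

```python
def crude_stem(word, size):
    """Remove the last `size` characters of a word (Suffix stripping size)."""
    # Guard: word[:-0] would be the empty string, so size 0 keeps the word.
    return word[:-size] if size > 0 else word

def same_word(a, b, size):
    """Crude match: after stripping, one string is included in the other."""
    sa, sb = crude_stem(a, size), crude_stem(b, size)
    return sa in sb or sb in sa
```

For example, with a size of 2, "fractions" and "fraction" reduce to "fractio" and "fracti", and the second is included in the first, so the two words match.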
- Inflectional morphology
- If the value is 1, fastr performs a morphological analysis of the words in the text. Should be turned off for lemmatized corpora.
- Derivational morphology
- Controls the type of morphological analysis performed by fastr.
- 0 Never analyzes the words.
- 1 Always tries to find the roots of the words.
- 2 Attempts morphological analysis only if no morphological link is provided in the dictionary for the current lemma.
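The three modes of Derivational morphology reduce to one decision per lemma, sketched here. Whether a lemma "has a morphological link" stands for the lookup in the dictionary; the function name is hypothetical.

```python
def attempt_derivational_analysis(mode, has_morphological_link):
    """Return True when root finding should be attempted for the current lemma.

    mode 0: never; mode 1: always; mode 2: only when the dictionary
    provides no morphological link for the lemma.
    """
    if mode == 0:
        return False
    if mode == 1:
        return True
    if mode == 2:
        return not has_morphological_link
    raise ValueError(f"unsupported value for Derivational morphology: {mode}")
```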
- Morphological linkage
- If the value is 1, fastr exploits the morphological links in the lexicon for linking each lemma to its derivational family. Should be turned on for morpho-syntactic variant extraction.
- Morphological self-root
- If the value is 1, fastr automatically assigns a morphological link to <self>.
- Semantic linkage
- If the value is 1, fastr exploits the semantic links in the lexicon for linking each lemma to its semantic family. Should be turned on for semantic variant extraction.
- Semantic self-linkage
- If the value is 1, fastr automatically assigns semantic links to <self>.
- Automatic lowercasing
- If the value is 1, fastr automatically converts words to lowercase.
- Automatic segmentation
- If the value is 1, fastr automatically cuts sentences into words. Otherwise, words are only separated at space characters.
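The interplay of Automatic lowercasing and Automatic segmentation can be sketched as follows. The regular expression is a hypothetical stand-in for fastr's own segmentation rules, which are not described on this page; only the switch-off behavior (splitting at space characters) comes from the text above.

```python
import re

# Hypothetical segmentation rule: runs of word characters (with internal
# hyphens or apostrophes), or single punctuation marks.
TOKEN = re.compile(r"\w+(?:[-']\w+)*|[^\w\s]")

def segment(sentence, automatic_segmentation, automatic_lowercasing):
    """Split a sentence into words according to the two switches."""
    if automatic_lowercasing:
        sentence = sentence.lower()
    if automatic_segmentation:
        return TOKEN.findall(sentence)
    # Switch off: words are separated at space characters only.
    return [w for w in sentence.split(" ") if w]
```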
DIMENSIONS
All the fields classified as Dimensions have an integer value. These values are used for optimizing memory management.
- B-Tree levels
- The depth of the B-Tree (see the field Dictionary file, suffix BTR).
- B-Tree node size
- The number of nodes in a hyper-node of the B-Tree (see the field Dictionary file, suffix BTR).
- B-Tree updating (compil)
- Rate at which the B-Tree is updated during compiling.
- I/O buffer size (text)
- Size of the input buffer.
- I/O buffer size (lemmas)
- Size of the buffer used for accessing the single word dictionary (see the Field Dictionary file, suffix LEM).
- I/O buffer size (rules)
- Size of the buffer used for accessing the term dictionary (see the Field Dictionary file, suffix REG).
- Size of stack
- Size of the rewriting stack.
- Max words in sentence
- Maximal number of words read in a parsed sentence.
- Max refs in sentence
- Maximal number of word references in a parsed sentence.
- Max rules
- Maximal number of rules loaded in a parsed sentence.
- Max memory rules
- Maximal number of memory rules in the language file.
- Max terms
- Maximal number of nodes composing the rules loaded in a parsed sentence.
- Max metarules
- Maximal number of meta-rules in the language file.
- Max lemmas
- Maximal number of lemmas found in a parsed sentence.
- Max dag nodes
- Maximal number of nodes composing the feature structures of the rules loaded in a parsed sentence.
- Max nodes copied
- Maximal number of nodes copied for unification purposes.
- Max features
- Maximal number of features in the language file.
- Max suffixes
- Maximal number of suffixes in the language file.
- Max terms in a rule
- Maximal number of nodes composing a rule.
- Max term strings
- Maximal number of terms within a grammar.
- Max term links
- Maximal number of term links during an acquisition process.
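The failure mode described for Dimensions (an error naming the value that must be increased) can be sketched as a check of actual counts against the configured limits. The dictionaries keyed by field name, the exception class, and the message wording are all hypothetical; fastr's actual error messages come from its message file.

```python
class DimensionError(RuntimeError):
    """Raised when a count exceeds its configured Dimensions value."""

def check_dimensions(limits, usage):
    """Raise if any usage count exceeds its configured limit.

    `limits` and `usage` map field names such as "Max words in sentence"
    to integers; the error names the field whose value must be increased.
    """
    for name, used in usage.items():
        limit = limits.get(name)
        if limit is not None and used > limit:
            raise DimensionError(
                f"{name}: needed {used}, configured {limit}; increase this value")
```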