fastrlang
Language: en
Version: fastr language 2.04 (mandriva - 01/05/08)
Section: 1 (User commands)
NAME
fastrlang - how to set linguistic parameters
DESCRIPTION
At startup, fastr compiles the language file named in the Language file field of the configuration file (see the fastrconf section). The language file is composed of fields and values in seven categories:
1. Features. Names of labels used in the definition of linguistic properties.
2. Paths. Paths for accessing specific linguistic parameters in the definition of linguistic properties.
3. Characters. Characters used for segmenting the input text (punctuation, etc.).
4. Morphology. Definition of the inflectional and derivational system.
5. Frequent words. Inflected words stored in memory during parsing.
6. Memory rules. Syntactic rules stored in memory during parsing.
7. Meta-rules. Syntactic transformations operating on term rules.
FEATURES
- Categories
- The list of part of speech categories which are used in the definition of linguistic properties. The four specific categories Unknown category, Numeral, Punctuation, and Final punctuation may not be listed here.
- Unknown category
- Category automatically assigned to unknown words.
- Major categories
- Categories of words that cannot be deleted in variants in the case of multiple lexicalization (see the Field Multiple lexicalization in the fastrconf section).
- Numeral
- Category automatically assigned to numerals if the switch Number preprocessing of the configuration file is on (see the Field Number preprocessing in the fastrconf section).
- Punctuation
- Category automatically assigned to punctuation characters (see the Characters Punctuation).
- Final punctuation
- Category automatically assigned to final punctuation characters (see the Characters Final punctuation).
- Features
- Symbols used for creating paths and thereby feature structures in the definition of linguistic properties.
- Singular
- The value of the Number path of singular words (see the Path Number path).
- Plural
- The value of the Number path of plural words (see the Path Number path).
PATHS
- Category path
- The value of this path is the part of speech category.
- Label path
- The value of this path is the label of a non-terminal leaf in a term rule (see the Constraints of Term Rules in the fastrdata section). This value is reported on output when a rule is correctly parsed.
- Lexicalization path
- The value of this path is the label of the lexical anchor in the Context-Free Skeleton of a term rule (see the Constraints of Term Rules in the fastrdata section).
- Lemma path
- The value of this path is the lemma of a single word in a single-word rule, and the lemma of a lexical leaf in a term rule or a meta-rule (see the Constraints of Term Rules or Meta-Rules in the fastrdata section).
- Inflection path
- The value of this path is the inflection number characterizing the morphological inflections of a lemma (see the Suffixes of Morphology).
- Derivation path
- The value of this path is the derivation number characterizing the morphological derivations of a lemma (see the Suffixes of Morphology).
- Form path
- The value of this path is the form of an inflected word in a text (typography, whether it is followed by a hyphen or a quote, etc.).
- Number path
- The value of this path is the number (singular or plural) of an inflected word (see the Features Singular and Plural).
- Reference path
- The value of this path is the unique numerical identifier of a lemma.
- Meta label path
- The value of this path is the label of a meta-rule. This label is used for speeding up the application of meta-rules by letting rules select meta-rules (see the Constraints of Meta-Rules in the fastrdata section).
- Auxiliary stem path
- The value of this path is the list of the auxiliary stems of a lemma.
- Root path
- This path is used to carry the link to the root of a derived word.
- Reg expression path
- The value of this path is the regular expression preceding a leaf node in a Context-Free Skeleton of a meta-rule (see the Context-Free Skeleton of Meta-rules).
- Semantic path(s)
- These paths are used to carry the links to semantically related words.
- Self path
- This path is used to carry the link of a word to itself.
CHARACTERS
- Non alpha-numerical
- Non-alphanumerical characters allowed within the string of a word. All other non-alphanumerical characters (outside the ranges a-z, A-Z, and 0-9) are treated as word delimiters.
- Punctuation
- Punctuation characters (see the Feature Punctuation).
- Final punctuation
- Punctuation characters that are treated as sentence delimiters (see the Characters Punctuation).
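The segmentation rules above can be sketched in Python. This is a minimal illustration under stated assumptions, not fastr's own tokenizer: the three character sets (`WORD_EXTRA`, `PUNCTUATION`, `FINAL_PUNCTUATION`) and the function name `segment` are hypothetical, and a real language file declares its own characters. ASCII letters and digits, plus the declared non-alphanumerical characters, form words; punctuation becomes a separate token; final punctuation also closes the current sentence; any other character only separates words.

```python
import string

# Hypothetical character classes; a real language file declares its own.
WORD_EXTRA = set("-'")          # "Non alpha-numerical": allowed inside words
PUNCTUATION = set(",;:")        # "Punctuation"
FINAL_PUNCTUATION = set(".!?")  # "Final punctuation": also ends the sentence
ALNUM = set(string.ascii_letters + string.digits)  # a-z, A-Z, 0-9


def segment(text: str) -> list[list[str]]:
    """Split text into sentences, each a list of word and punctuation tokens."""
    sentences: list[list[str]] = []
    tokens: list[str] = []
    word = ""
    for ch in text:
        if ch in ALNUM or ch in WORD_EXTRA:
            word += ch
            continue
        if word:  # any other character ends the current word
            tokens.append(word)
            word = ""
        if ch in PUNCTUATION or ch in FINAL_PUNCTUATION:
            tokens.append(ch)  # punctuation is kept as its own token
        if ch in FINAL_PUNCTUATION:
            sentences.append(tokens)  # final punctuation closes the sentence
            tokens = []
    if word:
        tokens.append(word)
    if tokens:  # trailing text without final punctuation
        sentences.append(tokens)
    return sentences
```

With these assumed sets, `segment("L'arbre est grand, tres grand. Il pousse!")` keeps `L'arbre` as one word and returns two sentences, the first ending with `.` and the second with `!`.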
INFLECTIONAL MORPHOLOGY
The definition of the inflectional system of a natural language in fastr consists of three parts:
1. Inflected categories. The list of the inflected part of speech categories with their number of inflections (see the Feature Categories).
2. Inflection features. For each inflected part of speech category, a feature structure is provided for each inflection.
3. Inflectional suffixes. For each inflected part of speech category, and for each inflectional paradigm characterized by an inflection number (see the Path Inflection path), a suffix is provided for each inflection.
- Inflected categories
- The list of part of speech categories having inflections (part of speech categories without inflections conventionally receive an inflection number equal to 1). Each inflected category is followed by its number of inflections in parentheses.
For example:
    N(2) A(4) V(6).
means that words with part of speech N, A, and V have 2, 4, and 6 inflections, respectively.
- Inflection features
- For each inflected category, and for each inflection, a feature structure is provided. This feature structure is systematically unified with the corresponding inflected lemma.
For example:
    V[ 1] <head agreement tense> = pastParticiple
          <head agreement gender> = feminine
          <head agreement number> = singular.
    V[ 2] <head agreement tense> = pastParticiple
          <head agreement gender> = feminine
          <head agreement number> = plural.
are the feature structures corresponding to the first two inflections of words with part of speech category V.
- Inflectional suffixes
- Each lemma whose part of speech category is inflected (see Inflected categories) receives an inflection number (see the Path Inflection path). This number corresponds to a list of inflectional suffixes, one suffix per inflection.
For example:
    V[ 7] e ee es ees ant er
are the inflectional suffixes corresponding to the six inflections of words with part of speech category V and inflection number 7.
A suffix is composed of an optional prefix and a string. A prefix has the form ?n, where n is an integer in the range 0-9. A suffix prefixed with ?n is appended to the nth auxiliary stem (see the Path Auxiliary stem path). A suffix without a prefix is appended to the main stem.
The following meta-characters are available for describing suffixes:
- * A suffix equal to * corresponds to a nonexistent inflection.
- ! If a suffix begins with !, the last letter of the stem (or of the nth auxiliary stem, if the suffix is prefixed by ?n) is duplicated.
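The inflected-category declaration and the suffix rules can be illustrated with a short Python sketch. The function names are hypothetical, and two points are assumptions the page does not settle: ?n is read as a 0-based index into the lemma's list of auxiliary stems, and * is only recognized when it is the whole suffix. The main stem is passed in directly, since how fastr derives it from a lemma is not described here.

```python
import re
from typing import Optional


def parse_categories(decl: str) -> dict[str, int]:
    """Parse an Inflected categories declaration such as 'N(2) A(4) V(6).'."""
    return {cat: int(n) for cat, n in re.findall(r"(\w+)\((\d+)\)", decl)}


def apply_suffix(stem: str, aux_stems: list[str], suffix: str) -> Optional[str]:
    """Build one inflected form from a suffix; None means no such inflection."""
    if suffix == "*":  # '*' marks a nonexistent inflection
        return None
    base = stem
    prefix = re.match(r"\?(\d)", suffix)
    if prefix:  # ?n: attach to auxiliary stem n (ASSUMPTION: 0-based index)
        base = aux_stems[int(prefix.group(1))]
        suffix = suffix[prefix.end():]
    if suffix.startswith("!"):  # '!': duplicate the last letter of the chosen stem
        base += base[-1]
        suffix = suffix[1:]
    return base + suffix


def inflect(stem: str, aux_stems: list[str], suffixes: list[str]) -> list[Optional[str]]:
    """Apply every suffix of a paradigm, one per inflection, in order."""
    return [apply_suffix(stem, aux_stems, s) for s in suffixes]
```

With the paradigm V[ 7] from the example and a hypothetical stem `aim`, `inflect("aim", [], "e ee es ees ant er".split())` yields `aime`, `aimee`, `aimes`, `aimees`, `aimant`, `aimer`; the six forms match the count given for V by `parse_categories("N(2) A(4) V(6).")["V"]`.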
DERIVATIONAL MORPHOLOGY
The definition of the derivational system of a natural language in fastr is very similar to the definition of the inflectional system. Derived categories are enumerated, derivational feature structures are provided together with a history, and finally, a list of suffixes is associated with each derivational paradigm.
FREQUENT WORDS
To improve the efficiency of the parser, frequent (inflected) words can be placed in the language file so that they are accessed first. The syntax of frequent-word rules is similar to the syntax of single-word rules (see Single-Word Rules in the fastrdata section), except that inflectional features should also be provided.
For example:
    Word 'la' : <reference> = 13 <cat> = Dd <head agreement gender> = feminine <head agreement number> = singular.
is the rule for the French definite determiner 'la' (feminine singular).
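As an illustration of how path equations build a nested feature structure, the following Python sketch parses a frequent-word rule whose values are atomic (it does not handle path-to-path equations such as <N1 head> = <N2 head>) and consults the frequent words before a main lexicon. `parse_word_rule`, `lookup`, and the dictionary representation are hypothetical; fastr's internal data structures are not described on this page.

```python
import re
from typing import Optional


def set_path(fs: dict, path: list[str], value: str) -> None:
    """Store value at a path such as ['head', 'agreement', 'gender']."""
    node = fs
    for feature in path[:-1]:
        node = node.setdefault(feature, {})
    node[path[-1]] = value


def parse_word_rule(rule: str) -> tuple[str, dict]:
    """Parse "Word 'w' : <path> = value ..." with atomic values only."""
    m = re.match(r"\s*Word\s+'([^']*)'\s*:(.*)", rule, re.S)
    if m is None:
        raise ValueError(f"not a Word rule: {rule!r}")
    word, body = m.group(1), m.group(2).strip().rstrip(".")
    fs: dict = {}
    for path, value in re.findall(r"<([^>]+)>\s*=\s*('[^']*'|[^\s<]+)", body):
        set_path(fs, path.split(), value.strip("'"))
    return word, fs


def lookup(form: str, frequent: dict, lexicon: dict) -> Optional[dict]:
    """Consult the frequent words first, then fall back to the main lexicon."""
    if form in frequent:
        return frequent[form]
    return lexicon.get(form)
```

Applied to the rule for 'la', this sketch yields `{"reference": "13", "cat": "Dd", "head": {"agreement": {"gender": "feminine", "number": "singular"}}}`, with every value kept as a string.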
MEMORY RULES
Memory rules are non-lexicalized rules that are active regardless of the lexical items found in the current sentence. Because they are systematically activated, their computational cost can be high, so they should be used with caution. The syntax of memory rules is similar to the syntax of term rules (see Term Rules in the fastrdata section), except that no lexical anchor is required.
For example:
    Rule N1 -> N2 P3 N4: <P3 lemma> = 'of' <N1 head> = <N2 head> <N1 label> = 'NofN'.
is a rule for extracting sequences with a Noun-'of'-Noun structure.
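A memory rule is not triggered by any lexical item, so conceptually it is tried at every position of every sentence. The Python sketch below hard-codes the N 'of' N rule above to make that cost visible. The `Token` type and `match_nofn` function are hypothetical, the category names N and P are taken from the example, and the first noun's lemma stands in for the head shared through <N1 head> = <N2 head>. It is not how fastr compiles or applies rules.

```python
from typing import NamedTuple


class Token(NamedTuple):
    form: str
    cat: str    # part of speech category, e.g. N or P as in the example rule
    lemma: str


def match_nofn(sentence: list[Token]) -> list[tuple[int, int, str, str]]:
    """Find every N2 P3 N4 sequence whose P3 lemma is 'of'.

    Returns (start, end, label, head): label mirrors <N1 label> = 'NofN',
    and head is N2's lemma, standing in for <N1 head> = <N2 head>.
    """
    matches = []
    # Not anchored to any word: every position of every sentence is tested.
    for i in range(len(sentence) - 2):
        n2, p3, n4 = sentence[i:i + 3]
        if (n2.cat, p3.cat, n4.cat) == ("N", "P", "N") and p3.lemma == "of":
            matches.append((i, i + 3, "NofN", n2.lemma))
    return matches
```

Overlapping matches are all reported: in "rate of loss of function" the sketch finds both "rate of loss" and "loss of function".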
META-RULES
Meta-rules are transformations that apply to term rules in order to produce term variant rules. Each meta-rule is dedicated to a specific term structure and to a specific type of linguistic transformation.
- Default meta-rules
- In order to minimize the application of meta-rules, meta-rules only apply to term rules whose meta-label value is equal to their own meta-label value (see the Path Meta label path). For the sake of simplicity, term rules are provided with a default meta-label which is a function of their arity (the number of daughter nodes of their root node).
For example:
    [2] 'XX' [3] 'XXX' [4] 'XXXX' [5] 'XXXXX'
automatically assigns the meta-labels XX, XXX, XXXX, and XXXXX to term rules whose arity is 2, 3, 4, and 5, respectively.
The syntax of meta-rules is given in the fastrdata section.
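The default meta-label mechanism can be sketched as a filter. In this hypothetical Python model, a term rule without an explicit meta-label gets one from its arity using the table in the example, and only meta-rules carrying the same label are considered. The `TermRule` and `MetaRule` types are illustrative, and the precedence of an explicit label over the default is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Default meta-labels keyed by arity, mirroring "[2] 'XX' [3] 'XXX' [4] 'XXXX' [5] 'XXXXX'".
DEFAULT_META_LABELS = {2: "XX", 3: "XXX", 4: "XXXX", 5: "XXXXX"}


@dataclass
class TermRule:
    name: str
    daughters: list[str]              # daughter nodes of the root node
    meta_label: Optional[str] = None  # explicit label (ASSUMED to override the default)


@dataclass
class MetaRule:
    name: str
    meta_label: str


def effective_meta_label(rule: TermRule) -> Optional[str]:
    """Explicit meta-label if present, else the default for the rule's arity."""
    if rule.meta_label is not None:
        return rule.meta_label
    return DEFAULT_META_LABELS.get(len(rule.daughters))


def applicable_meta_rules(rule: TermRule, meta_rules: list[MetaRule]) -> list[MetaRule]:
    """Keep only the meta-rules whose meta-label equals the term rule's."""
    label = effective_meta_label(rule)
    return [m for m in meta_rules if m.meta_label == label]
```

A binary term rule is thus only matched against meta-rules labeled XX, and a term rule whose arity has no default label (for instance 6) selects none.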