fastrlang

Language: en

Version: fastr language 2.04 (mandriva - 01/05/08)

Section: 1 (User commands)

NAME

fastrlang - how to set linguistic parameters

DESCRIPTION

Upon startup, fastr compiles a language file given by the field Language file of the configuration file (see the fastrconf section).

The language file is composed of fields and values of seven categories:

1. Features. Names of labels used in the definition of linguistic properties.
2. Paths. Paths for accessing specific linguistic parameters in the definition of linguistic properties.
3. Characters. Characters used for segmenting the input text (punctuation marks, etc.).
4. Morphology. Definition of the inflectional and derivational system.
5. Frequent words. Inflected words stored in memory during parsing.
6. Memory rules. Syntactic rules stored in memory during parsing.
7. Meta-rules. Syntactic transformations operating on term rules.

FEATURES

Categories
The list of part of speech categories which are used in the definition of linguistic properties. The four specific categories Unknown category, Numeral, Punctuation, and Final punctuation may not be listed here.
Unknown category
Category automatically assigned to unknown words.
Major categories
Categories of the words that cannot be deleted in variants in case of multiple lexicalization (see the Field Multiple lexicalization in the fastrconf section).
Numeral
Category automatically assigned to numerals if the switch Number preprocessing of the configuration file is on (see the Field Number preprocessing in the fastrconf section).
Punctuation
Category automatically assigned to punctuation characters (see the Characters Punctuation).
Final punctuation
Category automatically assigned to final punctuation characters (see the Characters Final punctuation).
Features
Symbols used for creating paths and thereby feature structures in the definition of linguistic properties.
Singular
The value of the Number path of singular words (see the Path Number path).
Plural
The value of the Number path of plural words (see the Path Number path).

PATHS

Category path
The value of this path is the part of speech category.
Label path
The value of this path is the label of a non-terminal leaf in a term rule (see the Constraints of Term Rules in the fastrdata section). This value is reported on output when a rule is correctly parsed.
Lexicalization path
The value of this path is the label of the lexical anchor in the Context-Free Skeleton of a term rule (see the Constraints of Term Rules in the fastrdata section).
Lemma path
The value of this path is the lemma of a single word in a single-word rule, and the lemma of a lexical leaf in a term rule or a meta-rule (see the Constraints of Term Rules or Meta-Rules in the fastrdata section).
Inflection path
The value of this path is the inflection number characterizing the morphological inflections of a lemma (see the Suffixes of Morphology).
Derivation path
The value of this path is the derivation number characterizing the morphological derivations of a lemma (see the Suffixes of Morphology).
Form path
The value of this path is the form of an inflected word in a text (typography, whether it is followed by a hyphen or a quote, etc.).
Number path
The value of this path is the number (singular or plural) of an inflected word (see the Features Singular and Plural).
Reference path
The value of this path is the unique numerical identifier of a lemma.
Meta label path
The value of this path is the label of a meta-rule. This label is used for speeding up the application of meta-rules by letting rules select meta-rules (see the Constraints of Meta-Rules in the fastrdata section).
Auxiliary stem path
The value of this path is the list of the auxiliary stems of a lemma.
Root path
This path is used to carry the link to the root of a derived word.
Reg expression path
The value of this path is the regular expression preceding a leaf node in a Context-Free Skeleton of a meta-rule (see the Context-Free Skeleton of Meta-rules).
Semantic path(s)
These paths are used to carry the links to semantically related words.
Self path
This path is used to carry the link of a word to itself.

CHARACTERS

Non alpha-numerical
Non-alpha-numerical characters allowed within the string of a word. All other non-alpha-numerical characters (outside the ranges a-z, A-Z, and 0-9) are considered as word delimiters.
Punctuation
Punctuation characters (see the Feature Punctuation).
Final punctuation
Punctuation characters which are considered as sentence delimiters (see the Characters Punctuation).
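
The segmentation described by these fields can be sketched in Python as follows. This is an illustration, not fastr's implementation: the function name segment and its parameters extra_word_chars and punctuation are hypothetical, and the sketch assumes that punctuation characters become one-character tokens while other delimiters are simply dropped.

```python
def segment(text, extra_word_chars="-'", punctuation=",;:.!?"):
    """Split text into word tokens and punctuation tokens.

    Assumptions (not taken from fastr itself):
    - a-z, A-Z, 0-9 and the characters in `extra_word_chars` (standing in
      for the "Non alpha-numerical" field) may appear inside a word;
    - every other character is a word delimiter;
    - delimiters listed in `punctuation` are kept as tokens of their own,
      all other delimiters (spaces, etc.) are dropped.
    """
    tokens, current = [], []
    for ch in text:
        if (ch.isascii() and ch.isalnum()) or ch in extra_word_chars:
            current.append(ch)
            continue
        # Any other character ends the word being built.
        if current:
            tokens.append("".join(current))
            current = []
        if ch in punctuation:
            tokens.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


tokens = segment("state-of-the-art parser, v2.")
```

With the default arguments, the hyphen is allowed inside a word, so state-of-the-art stays in one piece, while the comma and the full stop are emitted as separate tokens.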

INFLECTIONAL MORPHOLOGY

The definition of the inflectional system of a natural language in fastr consists of three parts:
1. Inflected categories. The list of the inflected part of speech categories with their number of inflections (see the Feature Categories).
2. Inflection features. For each inflected part of speech category, a feature structure is provided for each inflection.
3. Inflectional suffixes. For each inflected part of speech category, and for each inflectional paradigm characterized by an inflection number (see the Path Inflection path) a suffix is provided for each inflection.
Inflected categories
The list of part of speech categories having inflections (the part of speech categories without inflections conventionally receive an inflection number equal to 1). Each inflected category is followed by a number of inflections between parentheses.
For example:
 
     N(2) A(4) V(6).
 
 
means that words with part of speech N, A, and V respectively have 2, 4, and 6 inflections.
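
As an illustration only, a declaration of this shape can be read into a table with a few lines of Python. The function name parse_inflected_categories is hypothetical, and the pattern assumes category names made of word characters.

```python
import re


def parse_inflected_categories(field):
    """Map each category name to its declared number of inflections.

    Assumption: each entry looks like NAME(COUNT), entries are separated
    by whitespace, and the trailing full stop ends the list.
    """
    return {name: int(count)
            for name, count in re.findall(r"(\w+)\((\d+)\)", field)}


inflection_counts = parse_inflected_categories("N(2) A(4) V(6).")
```

Here inflection_counts["V"] is 6, matching the six inflections declared for category V.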
Inflection features
For each inflected category, and for each inflection, a feature structure is provided. This feature structure is systematically unified with the corresponding inflected lemma.
For example:
 
     V[ 1]
         <head agreement tense> = pastParticiple
         <head agreement gender> = feminine
         <head agreement number> = singular.
     V[ 2]
         <head agreement tense> = pastParticiple
         <head agreement gender> = feminine
         <head agreement number> = plural.
 
 
are the feature structures corresponding to the first two inflections of words with part of speech category V.
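
The unification step can be pictured with a small Python sketch. It is a simplification under stated assumptions, not fastr's algorithm: feature structures are modelled as nested dictionaries, shared (reentrant) paths are not handled, and the names unify, V1 and lemma_features are hypothetical.

```python
def unify(a, b):
    """Unify two feature structures given as nested dicts.

    Returns the merged structure, or None when two atomic values clash.
    Neither argument is modified at the top level.
    """
    result = dict(a)
    for key, b_val in b.items():
        if key not in result:
            result[key] = b_val
            continue
        a_val = result[key]
        if isinstance(a_val, dict) and isinstance(b_val, dict):
            merged = unify(a_val, b_val)
            if merged is None:
                return None
            result[key] = merged
        elif a_val != b_val:
            # Conflicting atomic values: unification fails.
            return None
    return result


# Feature structure of inflection V[ 1] from the example above.
V1 = {"head": {"agreement": {"tense": "pastParticiple",
                             "gender": "feminine",
                             "number": "singular"}}}

# Hypothetical features carried by a lemma of category V.
lemma_features = {"cat": "V"}

inflected = unify(lemma_features, V1)
clash = unify({"head": {"agreement": {"number": "plural"}}}, V1)
```

The first call succeeds and yields a structure carrying both the category and the agreement features; the second returns None because plural conflicts with singular.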
Inflectional suffixes
Each lemma whose part of speech category is inflected (see Inflected categories) receives an inflection number (see the Path Inflection path). This number corresponds to a list of inflectional suffixes, one suffix per inflection.
For example:
 
     V[ 7] e  ee  es  ees  ant  er 
 
 
are the inflectional suffixes corresponding to the six inflections of words with part of speech category V and with inflection number 7.


A suffix is composed of an optional prefix and a string. A prefix has the form ?n, where n is an integer in the range 0-9. A suffix prefixed with ?n is appended to the nth auxiliary stem (see the Path Auxiliary stem path). A suffix without a prefix is appended to the main stem.


The following meta-characters are available for describing suffixes:

* A suffix equal to * corresponds to a nonexistent inflection.
! If a suffix begins with !, the last letter of the stem (or of the nth auxiliary stem, if the suffix is prefixed by ?n) is duplicated.
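
The suffix conventions above can be summarized by the following Python sketch. It is an illustration under assumptions, not fastr's code: the function name inflect is hypothetical, n is taken to index the list of auxiliary stems directly starting from 0, the ! marker is assumed to follow the optional ?n prefix, and stems are assumed to be non-empty. The stems used in the examples are hypothetical as well.

```python
def inflect(main_stem, suffix, auxiliary_stems=()):
    """Build one inflected form from a stem and a suffix specification.

    Returns None when the suffix is "*" (nonexistent inflection).
    Assumptions: "?n" selects auxiliary_stems[n] (0-based), and "!"
    comes after the optional "?n" prefix.
    """
    if suffix == "*":
        return None
    stem = main_stem
    if len(suffix) > 1 and suffix[0] == "?" and suffix[1].isdigit():
        # Switch from the main stem to the selected auxiliary stem.
        stem = auxiliary_stems[int(suffix[1])]
        suffix = suffix[2:]
    if suffix.startswith("!"):
        # Duplicate the last letter of whichever stem was selected.
        stem += stem[-1]
        suffix = suffix[1:]
    return stem + suffix


# The six suffixes of V[ 7] applied to a hypothetical main stem.
forms = [inflect("aim", s) for s in "e ee es ees ant er".split()]
```

For instance, inflect("stop", "!ed") doubles the final letter of the stem before appending the suffix, and inflect("go", "?0t", ["wen"]) builds the form on the auxiliary stem instead of the main stem.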

DERIVATIONAL MORPHOLOGY

The definition of the derivational system of a natural language in fastr is very similar to the definition of the inflectional system. Derived categories are enumerated, derivational feature structures are provided together with a history, and finally, a list of suffixes is associated with each derivational paradigm.

FREQUENT WORDS

To improve the efficiency of the parser, frequent (inflected) words can be placed in the language file so that they are accessed first.
The syntax of the frequent word rules is similar to the syntax of single-word rules (see Single-Word Rules in the fastrdata section), except that inflectional features should be provided.
For example:
 
     Word 'la' :
         <reference> = 13
         <cat> = Dd
         <head agreement gender> = feminine
         <head agreement number> = singular.
 
 
is the rule for the French definite determiner 'la' (feminine singular).

MEMORY RULES

Memory rules are non-lexicalized rules which are active whatever the lexical items found in the current sentence. Because they are systematically activated, the computational cost of these rules can be significant. They should be used with caution.
The syntax of the memory rules is similar to the syntax of term rules (see Term Rules in the fastrdata section), except that no lexical anchor is required.
For example:
 
     Rule N1 -> N2 P3 N4:
         <P3 lemma> = 'of'
         <N1 head> = <N2 head>
         <N1 label> = 'NofN'.
 
 
is a rule for extracting sequences with a Noun-'of'-Noun structure.

META-RULES

Meta-rules are transformations which apply to term rules in order to produce term variant rules. Each meta-rule is dedicated to a specific term structure and to a specific type of linguistic transformation.
Default meta-rules
In order to minimize the application of meta-rules, meta-rules only apply to term rules whose meta-label value is equal to their own meta-label value (see the Path Meta label path). For the sake of simplicity, term rules are provided with a default meta-label which is a function of their arity (the number of daughter nodes of their root node).
For example:
 
     [2]     'XX'
     [3]     'XXX'
     [4]     'XXXX'
     [5]     'XXXXX'
 
 
automatically assigns the meta-labels XX, XXX, XXXX, or XXXXX to the term rules whose arity is 2, 3, 4, or 5.
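
The selection of meta-rules through meta-labels can be sketched in Python as follows. The data layout is hypothetical: term rules and meta-rules are modelled as dictionaries, the keys daughters, meta_label and name are invented for the illustration, and so are the meta-rule names. It is also an assumption that an explicit meta-label on a term rule takes precedence over the default one.

```python
# Default meta-labels by arity, as in the example above.
DEFAULT_META_LABELS = {2: "XX", 3: "XXX", 4: "XXXX", 5: "XXXXX"}


def meta_label_of(term_rule):
    """Return the rule's explicit meta-label, else the default for its arity."""
    explicit = term_rule.get("meta_label")
    if explicit is not None:
        return explicit
    arity = len(term_rule["daughters"])
    return DEFAULT_META_LABELS.get(arity)


def select_meta_rules(term_rule, meta_rules):
    """Keep only the meta-rules whose meta-label equals the rule's meta-label."""
    label = meta_label_of(term_rule)
    return [m for m in meta_rules if m["meta_label"] == label]


# Hypothetical data: one term rule of arity 3 and two meta-rules.
term_rule = {"daughters": ["N2", "P3", "N4"]}
meta_rules = [{"name": "MetaA", "meta_label": "XX"},
              {"name": "MetaB", "meta_label": "XXX"}]

selected = select_meta_rules(term_rule, meta_rules)
```

Only the meta-rule labelled XXX is retained for this term rule, so the other meta-rule is never tried against it.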

The syntax of meta-rules is given in the fastrdata section.