HCT::Lang::LexerDriver.3pm

Language: en

Version: 2009-08-08 (fedora - 01/12/10)

Section: 3 (Library functions)

NAME

HCT::Lang::LexerDriver - Lexer driver for languages under the HCT system.

DESCRIPTION

Lexical analysis, or scanning, is the process in which the stream of characters making up the source program is read from left to right and grouped into tokens. Tokens are sequences of characters with a collective meaning. A programming language usually has only a small number of token classes: constants (integer, double, char, string, etc.), operators (arithmetic, relational, logical), punctuation, and reserved words.

                        error messages
                              |
 source language -> [ LEXICAL ANALYZER ] -> token stream

The lexical analyzer takes a source program as input and produces a stream of tokens as output. It recognizes particular instances of tokens, called lexemes. A lexeme is the actual character sequence forming a token; the token is the general class that a lexeme belongs to. Some tokens have exactly one lexeme (e.g., the > character); others have many lexemes (e.g., integer constants).
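The distinction between tokens and lexemes can be illustrated with a minimal standalone sketch. It does not use "HCT::Lang::LexerDriver"; the token classes, their patterns, and the "tokenize" function are assumptions made up for this example.

```perl
use strict;
use warnings;

# Hypothetical token classes for illustration; not taken from HCT.
my @TOKEN_CLASSES = (
    [ INTEGER    => qr/\d+/          ],
    [ IDENTIFIER => qr/[A-Za-z_]\w*/ ],
    [ GREATER    => qr/>/            ],
);

# Split a line into [token class, lexeme] pairs, scanning left to right.
sub tokenize {
    my ($line) = @_;
    my @tokens;
    pos($line) = 0;
    while (pos($line) < length $line) {
        next if $line =~ /\G\s+/gc;    # skip whitespace between lexemes
        my $matched = 0;
        for my $class (@TOKEN_CLASSES) {
            my ($name, $re) = @$class;
            if ($line =~ /\G($re)/gc) {
                push @tokens, [ $name, $1 ];
                $matched = 1;
                last;
            }
        }
        die "Unexpected character at position " . pos($line) . "\n"
            unless $matched;
    }
    return @tokens;
}

my @tokens = tokenize('count > 42');
print join(' ', map { "$_->[0]($_->[1])" } @tokens), "\n";
# prints: IDENTIFIER(count) GREATER(>) INTEGER(42)
```

Here GREATER has exactly one lexeme, while INTEGER and IDENTIFIER each cover many.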

CONSTRUCTOR

new ()
Creates a new "HCT::Lang::LexerDriver" object.

DESTRUCTOR

DESTROY ()
Destroys the "HCT::Lang::LexerDriver" object.

METHODS

input ([FH])
Returns the input object created by "HCT::Std::IO". If FH is set, creates a new input object.
skip_whitespace ()
Skips whitespace; it simply removes leading whitespace.
match (EXPR)
If EXPR matches, returns the matched value and sets the new line position. Otherwise, goes back to the previous position and returns "undef".
linepos ([POS])
If POS is set, changes the current line position. Returns the current line position.

Important: this method does not work directly with the current line position ("pos"), but with a variable that stores the previous position value.

linelen ()
Returns current line length.
linenum ()
Returns current line number.
is_eof ()
Returns true at end of file (input), false otherwise.
is_eol ()
Returns true at end of line, false otherwise.
move ()
Returns whether the lexer can move forward. If the current line has not ended (the line ends when "pos" equals "len"), returns "TRUE"; otherwise fetches new lines while the line is empty. If the file is finished, returns "FALSE".
is_emptyline ()
Returns "TRUE" if the current line is empty.
get_next_line ()
Gets a new line and returns "TRUE" if a new line was received, or "FALSE" on "EOF".
scan (PARSER)
The scanner has encoded within it information on the possible sequences of characters that can be contained within any of the tokens it handles. Returns a token.
stop ()
Returns "EMPTY_TOKEN" to stop the parser. Can be overridden if needed; see the CDL lexer.
callback ()
Returns the parser handler. It is updated as soon as "scan" is called.
tracer ()
Returns the tracing object created by "HCT::Lang::LexerTracer".
next_token ()
Tries to find the corresponding token. Returns true if a token was found, or false when finished.
match_pattern ()
checkup_available_tokens ()
do_reserve ()
Reserves tokens by calling "reserve_tokens".
reserve_patterns ()
Creates a list of patterns in "patterns". Each element has the following fields: token name, type, and one pattern. That is, a single element is created for each new pattern. After creation, the list is sorted by type and pattern.
reserve (TYPE, TOKENS)
Creates new token objects using "HCT::Lang::Token" and stores them in "tokens".
get_token (NAME)
Takes a token name and returns the token object from "tokens".
shift_token ()
Shifts and returns a token from the token stack.
push_token ()
Pushes a new token onto the stack and returns true.
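
The position handling described for "match" and "linepos" can be sketched as follows. This is a minimal standalone illustration, not the module's actual implementation: the "SketchLexer" class and its "line" and "linepos" fields are hypothetical, and only the documented behavior (advance on success, keep the previous position on failure) is modeled.

```perl
use strict;
use warnings;

package SketchLexer;    # hypothetical class, not HCT::Lang::LexerDriver

sub new {
    my ($class, $line) = @_;
    return bless { line => $line, linepos => 0 }, $class;
}

# Get or set the saved position, kept separately from Perl's pos().
sub linepos {
    my ($self, $pos) = @_;
    $self->{linepos} = $pos if defined $pos;
    return $self->{linepos};
}

# Try to match $re at the saved position. On success, advance the saved
# position and return the matched text; on failure, leave it unchanged
# and return undef (in scalar context).
sub match {
    my ($self, $re) = @_;
    pos($self->{line}) = $self->{linepos};
    if ($self->{line} =~ /\G($re)/gc) {
        $self->{linepos} = pos($self->{line});
        return $1;
    }
    return;
}

package main;

my $lexer = SketchLexer->new('let x');
my $miss  = $lexer->match(qr/\d+/);       # no digits at position 0
my $hit   = $lexer->match(qr/[a-z]+/);    # matches "let"
printf "miss=%s hit=%s linepos=%d\n",
    defined $miss ? $miss : 'undef', $hit, $lexer->linepos;
# prints: miss=undef hit=let linepos=3
```

Keeping the saved position in its own field means a failed attempt never disturbs where the next attempt starts.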

VIRTUAL METHODS

skip_comment ()
Virtual method that skips comments. Returns true if the current position should be skipped.
skip_this ()
Virtual method that skips other content at the current position. Returns true or false.
reserve_tokens ()
Virtual method that reserves tokens.
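
As a rough illustration of how a virtual method such as "skip_comment" might be overridden in a subclass, consider the sketch below. Both classes, their fields, and the "remaining" helper are hypothetical stand-ins invented for this example; the real base class and its internals may differ.

```perl
use strict;
use warnings;

package SketchDriver;    # hypothetical stand-in for a lexer driver base class

sub new {
    my ($class, $line) = @_;
    return bless { line => $line, linepos => 0 }, $class;
}

# Default hook: nothing is treated as a comment.
sub skip_comment { return 0 }

# Return the part of the line that is left after comment skipping.
sub remaining {
    my ($self) = @_;
    $self->skip_comment;
    return substr $self->{line}, $self->{linepos};
}

package HashCommentLexer;    # hypothetical subclass
our @ISA = ('SketchDriver');

# Override: a '#' at the current position comments out the rest of the line.
sub skip_comment {
    my ($self) = @_;
    return 0 unless substr($self->{line}, $self->{linepos}, 1) eq '#';
    $self->{linepos} = length $self->{line};
    return 1;
}

package main;

print "base:     '", SketchDriver->new('# note')->remaining, "'\n";
print "subclass: '", HashCommentLexer->new('# note')->remaining, "'\n";
# prints:
# base:     '# note'
# subclass: ''
```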

PROBLEMS

Identifiers and keywords

The lexer state solution involves coordination between the lexer and the parser. In particular, the parser must tell the lexer whether, in "this context", a keyword or an identifier is expected. There are several problems with coordinating the lexer's state with the parser's context. One of the most frequently noted is that it makes (multiple token) lookahead more difficult. A related problem occurs if the context is miscommunicated between the parser and the lexer: the lexer may return a keyword when only identifiers are expected, or return a keyword as an identifier when it was supposed to be treated as a keyword.
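
The dependence on parser context can be shown with a small standalone sketch. The keyword list, the "classify" function, and the "$expect_keyword" flag are assumptions for illustration; they are not part of the HCT API.

```perl
use strict;
use warnings;

# Hypothetical reserved words for illustration only.
my %KEYWORDS = map { $_ => 1 } qw(if then else);

# Classify a word-like lexeme. $expect_keyword is the context flag the
# parser would pass to the lexer; it is an assumption of this sketch.
sub classify {
    my ($lexeme, $expect_keyword) = @_;
    return 'KEYWORD' if $expect_keyword && $KEYWORDS{$lexeme};
    return 'IDENTIFIER';
}

print classify('if',  1), "\n";    # prints KEYWORD: keywords allowed here
print classify('if',  0), "\n";    # prints IDENTIFIER: only identifiers expected
print classify('foo', 1), "\n";    # prints IDENTIFIER: not a reserved word
```

The same lexeme "if" yields two different tokens depending on the flag, so a stale or wrong flag produces exactly the misclassification described above.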