Xml_lexer.3o

Language: en

Version: 274578 (debian - 07/07/09)

Section: 3 (Library Functions)

NAME

Xml_lexer - Simple XML lexer

Module

Module Xml_lexer

Documentation

Module Xml_lexer
 :  sig end

Simple XML lexer

This module provides an ocamllex lexer for XML files. It supports only the most basic features of the XML specification. The lexer ignores the following 'events' altogether: comments, processing instructions, the XML prolog and the doctype declaration. The predefined entities (&amp;, &lt;, etc.) are supported. The replacement text for other entities whose entity value consists of character data can be provided to the lexer (see Xml_lexer.entities). Internal entity declarations are not taken into account (the lexer just skips the doctype declaration). CDATA sections and character references are supported. See Xml_lexer.strip_ws about whitespace handling.

=== Error reporting ===

type error =
 | Illegal_character of char
 | Bad_entity of string
 | Unterminated of string
 | Tag_expected
 | Attribute_expected
 | Other of string
 

val error_string : error -> string

exception Error of error * int

This exception is raised when an error occurs during lexing. The int argument indicates the character position in the buffer. Note that some non-conforming XML documents might not trigger an error.
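The sketch below shows one way a caller might turn the Error exception into an ordinary result value. To stay self-contained, it declares a hypothetical stand-in module, Lexer_stub, that copies the documented error type and exception; the messages returned by its error_string are invented for illustration and will not match the library's wording. In real code, Lexer_stub would be replaced by Xml_lexer.

```ocaml
(* Hypothetical stand-in mirroring the documented declarations of
   Xml_lexer; only the message wording below is invented. *)
module Lexer_stub = struct
  type error =
    | Illegal_character of char
    | Bad_entity of string
    | Unterminated of string
    | Tag_expected
    | Attribute_expected
    | Other of string

  exception Error of error * int

  (* Illustrative messages; the real Xml_lexer.error_string may differ. *)
  let error_string = function
    | Illegal_character c -> Printf.sprintf "illegal character %C" c
    | Bad_entity e -> Printf.sprintf "bad entity %S" e
    | Unterminated what -> "unterminated " ^ what
    | Tag_expected -> "tag expected"
    | Attribute_expected -> "attribute expected"
    | Other msg -> msg
end

(* Run [f]; if it raises the lexer's Error exception, return the
   message prefixed with the reported character position. *)
let protect (f : unit -> 'a) : ('a, string) result =
  try Ok (f ()) with
  | Lexer_stub.Error (e, pos) ->
      Error (Printf.sprintf "character %d: %s" pos (Lexer_stub.error_string e))
```

With the real library, a call such as protect (fun () -> Xml_lexer.token lexbuf) would report the position carried by the exception instead of letting it escape.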

=== API ===

type token =
 | Tag of string * (string * string) list * bool
     (* Tag (name, attributes, empty) denotes an opening tag with the
        specified name and attributes. If empty, then the tag ended
        in "/>", meaning that it has no sub-elements. *)

 | Chars of string (* Some text between the tags *)
 | Endtag of string (* A closing tag *)
 | EOF  (* End of input *)
 

The type of the XML document elements
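Since the lexer produces a flat stream of tokens, a caller that wants a document tree has to rebuild the nesting itself. The following sketch does so over a token list, assuming that an opening Tag with empty = false is later closed by an Endtag with the same name and that no Endtag is produced for a tag ending in "/>". It copies the documented token type so it compiles on its own; the tree type, the Malformed exception and the parse functions are hypothetical helpers, not part of Xml_lexer.

```ocaml
(* Local copy of the documented token type. *)
type token =
  | Tag of string * (string * string) list * bool
  | Chars of string
  | Endtag of string
  | EOF

(* Hypothetical document tree; not part of Xml_lexer. *)
type tree =
  | Element of string * (string * string) list * tree list
  | Text of string

exception Malformed of string

(* Parse a run of sibling nodes, stopping at a closing tag or EOF.
   Returns the nodes and the tokens that remain. *)
let rec parse_nodes tokens =
  match tokens with
  | [] | EOF :: _ | Endtag _ :: _ -> ([], tokens)
  | Chars s :: rest ->
      let siblings, rest = parse_nodes rest in
      (Text s :: siblings, rest)
  | Tag (name, attrs, true) :: rest ->
      (* Ended in "/>": no children, and assumed to have no Endtag. *)
      let siblings, rest = parse_nodes rest in
      (Element (name, attrs, []) :: siblings, rest)
  | Tag (name, attrs, false) :: rest -> (
      let children, rest = parse_nodes rest in
      match rest with
      | Endtag n :: rest when n = name ->
          let siblings, rest = parse_nodes rest in
          (Element (name, attrs, children) :: siblings, rest)
      | _ -> raise (Malformed ("unclosed element " ^ name)))

(* Parse a whole token list; a stray closing tag is an error. *)
let parse tokens =
  match parse_nodes tokens with
  | nodes, ([] | EOF :: _) -> nodes
  | _, _ -> raise (Malformed "unexpected closing tag")
```

Working on a list keeps the example testable without a lexbuf; a streaming version would call the token function on demand instead of pattern matching on list tails.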

val strip_ws : bool Pervasives.ref

Whitespace handling: if strip_ws is true (the default), whitespace next to a tag is ignored. Character data consisting only of whitespace is thus suppressed (i.e. no Chars token is produced for it).
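To illustrate the documented effect of strip_ws = true, the sketch below applies a comparable transformation after the fact to a token list: text next to tags is trimmed and whitespace-only text disappears. It is an approximation, not the lexer's implementation; in particular, String.trim treats space, tab, newline, carriage return and form feed as whitespace, which may differ from the lexer's own definition. The function name strip_like_lexer is hypothetical.

```ocaml
(* Local copy of the documented token type. *)
type token =
  | Tag of string * (string * string) list * bool
  | Chars of string
  | Endtag of string
  | EOF

(* Approximate the strip_ws = true behaviour on an existing token list:
   trim each text run (text always sits between tags) and drop runs
   that become empty. *)
let strip_like_lexer tokens =
  List.filter_map
    (function
      | Chars s ->
          let t = String.trim s in
          if t = "" then None else Some (Chars t)
      | tok -> Some tok)
    tokens
```

With the real library, whitespace can be preserved instead by setting Xml_lexer.strip_ws := false before the first call to Xml_lexer.token.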

val entities : (string * string) list Pervasives.ref

An association list of entity definitions, mapping each entity name to its replacement text. Initially, it contains the predefined entities ([("amp", "&"); ("lt", "<"); ...]).
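Because entities is a mutable reference to an association list, extra entities can be registered by prepending pairs before lexing starts, for example Xml_lexer.entities := ("copy", "\xC2\xA9") :: !Xml_lexer.entities. The self-contained sketch below uses a local stand-in reference seeded with the five XML predefined entities; the library's exact initial contents are only partly documented, so that seed list, the UTF-8 encoding of the replacement texts, and the helper names add_entities and replacement are assumptions.

```ocaml
(* Stand-in for Xml_lexer.entities; the seed list is assumed to be the
   five predefined XML entities. *)
let entities : (string * string) list ref =
  ref [ ("amp", "&"); ("lt", "<"); ("gt", ">"); ("quot", "\""); ("apos", "'") ]

(* Register extra entities. Prepending means a new definition shadows
   an older one with the same name when looked up with List.assoc. *)
let add_entities defs = entities := defs @ !entities

(* Replacement text for "&name;", if the entity is known. *)
let replacement name = List.assoc_opt name !entities
```

Only entities whose value is plain character data fit this scheme, matching the restriction stated in the module description.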

val token : Lexing.lexbuf -> token

The entry point of the lexer.

Raises Error in case of an invalid XML document.

Returns the next token in the buffer.
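Since token returns one token per call, a caller typically loops until EOF. The sketch below abstracts the token source as a function next : unit -> token so it runs without the library; source_of_list is a hypothetical test helper standing in for a lexbuf.

```ocaml
(* Local copy of the documented token type. *)
type token =
  | Tag of string * (string * string) list * bool
  | Chars of string
  | Endtag of string
  | EOF

(* Pull tokens from [next] until EOF and return them in order
   (EOF itself is not included). *)
let collect (next : unit -> token) : token list =
  let rec loop acc =
    match next () with
    | EOF -> List.rev acc
    | tok -> loop (tok :: acc)
  in
  loop []

(* Hypothetical token source backed by a list; returns EOF when empty. *)
let source_of_list toks =
  let remaining = ref toks in
  fun () ->
    match !remaining with
    | [] -> EOF
    | t :: rest ->
        remaining := rest;
        t
```

With the real library, one buffer would be created once, e.g. let lexbuf = Lexing.from_string s in collect (fun () -> Xml_lexer.token lexbuf), and Xml_lexer.Error would be caught around the call. Creating the lexbuf inside the closure instead would restart lexing on every call.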