sgmltoken
Language: en
Version: 113496 (mandriva - 01/05/08)
Section: 1 (User commands)
NAME
sgmltoken - a sample LT NSL program for tokenising the text in an
SYNOPSIS
usage: sgmltoken [-d ddb-file] [-u base-url] [input-file]
DESCRIPTION
The material below may be out of date; please consult the LT XML documentation.
The input file to sgmltoken is an nSGML file which contains <TEXT...> and <P> elements. All text inside such <TEXT...> elements will be tokenised into 'words' and punctuation, represented by <C> elements.
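The tokenising step described above can be sketched roughly as follows. This is only an illustration, not the tool's actual algorithm: the regular expression, the choice to wrap punctuation marks in their own <C> elements, and the helper name tokenise_to_c are all assumptions, and the real sgmltoken may split text differently or add attributes such as id.

```python
import re
from xml.sax.saxutils import escape

# Assumption: a "word" is a run of letters, digits or underscores, and
# every other non-space character is its own punctuation token.
# The real sgmltoken rules may differ.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenise_to_c(text):
    """Wrap each word or punctuation mark found in text in a <C> element."""
    tokens = TOKEN_RE.findall(text)
    # escape() protects markup-significant characters such as & and <.
    return " ".join(f"<C>{escape(tok)}</C>" for tok in tokens)


print(tokenise_to_c("Hello, world!"))
# prints: <C>Hello</C> <C>,</C> <C>world</C> <C>!</C>
```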
DESCRIPTION: Input/Output
Description of the input/output files involved in this program.
- Input ==> An nSGML file: [<filename> or stdin]
- Output ==> An nSGML file containing <C> elements for the 'words',
  in addition to the existing markup: [stdout]
EXPECTED DOCTYPE of INPUT
The <!DOCTYPE> for the input file should contain at least the following:
<!element text - - (#PCDATA|c|w|s|p)*>
<!element p - - (#PCDATA|c|w|s)*>
<!element s - - (w)*>
<!attlist s
id ID #IMPLIED>
<!element w - - (c)*>
<!attlist w
id ID #IMPLIED
type CDATA #IMPLIED
lemma CDATA #IMPLIED>
<!element c - - (#PCDATA)*>
<!attlist c
id ID #IMPLIED
rend CDATA #IMPLIED>
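One way to read the content models above is as a lookup table of which children each element may contain. The sketch below is only an aid to reading the declarations, not part of sgmltoken; the names ALLOWED_CHILDREN and may_contain are hypothetical. Because every model has the form (a|b|...)*, the listed children may appear in any order and any number, so a set-membership check is enough.

```python
# Content models copied from the element declarations above.
# "#PCDATA" stands for plain character data.
ALLOWED_CHILDREN = {
    "text": {"#PCDATA", "c", "w", "s", "p"},
    "p": {"#PCDATA", "c", "w", "s"},
    "s": {"w"},
    "w": {"c"},
    "c": {"#PCDATA"},
}


def may_contain(parent, child):
    """Return True if the declarations allow child directly inside parent."""
    return child in ALLOWED_CHILDREN.get(parent, set())


print(may_contain("text", "p"))  # True: <p> is allowed inside <text>
print(may_contain("s", "c"))     # False: <s> may only contain <w>
```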
OPTIONS
- -d <ddbfile>
  is the name of a file containing a representation of a DTD. It can be used if the DTD is not specified (in a <?NSL DDB ...> statement) in the input document itself.
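As a rough illustration of how the synopsis, the -d option, and the stdin/stdout behaviour fit together, the following sketch assembles a command line and optionally runs it. The helper names build_command and run_sgmltoken and the file names in the comments are hypothetical; the sketch assumes sgmltoken is installed and on the PATH, and it passes -u through unchanged because this page does not describe that option.

```python
import subprocess


def build_command(input_file=None, ddb_file=None, base_url=None):
    """Assemble an argument list following the synopsis:
    sgmltoken [-d ddb-file] [-u base-url] [input-file]
    """
    cmd = ["sgmltoken"]
    if ddb_file is not None:
        cmd += ["-d", ddb_file]    # DTD representation file
    if base_url is not None:
        cmd += ["-u", base_url]    # passed through as given
    if input_file is not None:
        cmd.append(input_file)     # omitted: sgmltoken reads stdin
    return cmd


def run_sgmltoken(nsgml_text, ddb_file=None):
    """Send nSGML on stdin and return what sgmltoken writes to stdout.

    Assumes sgmltoken is installed; raises CalledProcessError on failure.
    """
    result = subprocess.run(
        build_command(ddb_file=ddb_file),
        input=nsgml_text,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Hypothetical file names, for illustration only.
print(build_command("in.sgml", ddb_file="doc.ddb"))
# prints: ['sgmltoken', '-d', 'doc.ddb', 'in.sgml']
```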
SEE ALSO
ltxml(1), mknsg(1), sgmlsb(1)
AUTHOR
Henry Thompson (ht@cogsci.ed.ac.uk)
David McKelvie (dmck@cogsci.ed.ac.uk)
Language Technology Group, Human Communication Research Centre, Edinburgh University,
2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND
Tel:(44) 131 650-4630
Fax:(44) 131 650-4587 email: dmck@cogsci.ed.ac.uk
Comments, suggestions, and bug reports are always welcome.