download-entities.pl

Language: en

Version: 2008-09-30 (fedora - 01/12/10)

Section: 1 (User commands)

NAME

download-entities - download and parse XML Entity definitions

SYNOPSIS

  $ perl download-entities.pl -i # interactive
  $ perl download-entities.pl > output-file.pm
  $ perl download-entities.pl output-file.pm
  
  # instead of http://www.w3.org/2003/entities/iso9573-2003/
  $ perl download-entities.pl http://my.server.com/entities.html
 
 

DESCRIPTION

This script downloads the definitions of XML entities from http://www.w3.org/2003/entities/iso9573-2003/ or from whatever address you give it as an argument. The argument should be a URL (one that LWP::UserAgent::get can access) pointing to a document with absolute or relative references to files ending in the ".ent" suffix. These files are expected to be DTDs with lines like
  <!ENTITY amp "&#38;" >
 
 

The script parses these files and prints the resulting Perl module to standard output. If you give ``file'' as another argument, the script prints the module to ``file'' instead. You can also specify the output file in the environment variable "OUTPUT_FILE".
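
The parsing step can be pictured with a minimal sketch. It is not the script's actual code: the helper name parse_entities and the regular expression are assumptions made for illustration, and only double-quoted values like the one in the example line above are handled.

```perl
use strict;
use warnings;

# Hypothetical helper (not taken from download-entities.pl): collect
# <!ENTITY name "value"> declarations from the text of one .ent file.
sub parse_entities {
    my ($dtd_text) = @_;
    my %entities;
    # Capture the entity name and its double-quoted replacement text.
    # Parameter entities (<!ENTITY % name ...>) do not match and are skipped.
    while ($dtd_text =~ /<!ENTITY\s+([\w.:-]+)\s+"([^"]*)"\s*>/g) {
        $entities{$1} = $2;
    }
    return \%entities;
}

my $sample = <<'END_DTD';
<!ENTITY amp "&#38;" >
<!ENTITY lt  "&#60;" >
END_DTD

my $parsed = parse_entities($sample);
print "$_ => $parsed->{$_}\n" for sort keys %$parsed;
```

A hash keyed by entity name is a natural intermediate form here, since the output is a Perl module; how the real script lays out that module is not shown in this page.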

The index and the output file are distinguished by the presence of a ``://'' substring: an argument that contains it is treated as the index, and any other argument as the output file. If you want to use a locally stored index file (the one with the .ent references), you can access it by saying

  perl download-entities.pl file:///path/to/index.html
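
The ``://'' rule for telling the index from the output file can be sketched as follows. The helper name classify_args is hypothetical, and the sketch assumes that options such as "-i" have already been removed from the argument list and that a file argument takes precedence over "OUTPUT_FILE"; the page does not state either detail.

```perl
use strict;
use warnings;

# Default index URL, as named in the DESCRIPTION above.
my $DEFAULT_INDEX = 'http://www.w3.org/2003/entities/iso9573-2003/';

# Hypothetical helper: split the arguments into (index URL, output file).
# An argument containing "://" is taken as the index; any other
# argument is taken as the output file name.
sub classify_args {
    my (@args) = @_;
    # Assumption: OUTPUT_FILE is only a fallback for a missing argument.
    my ($index, $output) = ($DEFAULT_INDEX, $ENV{OUTPUT_FILE});
    for my $arg (@args) {
        if (index($arg, '://') >= 0) { $index  = $arg }
        else                         { $output = $arg }
    }
    return ($index, $output);
}

my ($index, $output) =
    classify_args('file:///path/to/index.html', 'output-file.pm');
print "index:  $index\n";
print "output: $output\n";
```

With this rule, a local index given as a plain path (without the file:// scheme) would be mistaken for the output file, which is why the file:/// form is needed.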
 
 

Note that the script currently distinguishes between relative and absolute paths by looking at whether the href contains a ``://'' substring. This can lead to crashes when the links look like href=``/path/file.ent''.

Also, the script assumes the links have exactly the format href=``...'' - with double quotes.
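
Both limitations can be illustrated with a short sketch. The helper name ent_links is hypothetical, and joining a relative link to the base by plain concatenation is an assumption about how such a heuristic might behave, not a description of the script's internals.

```perl
use strict;
use warnings;

# Hypothetical helper: find href="....ent" links (double quotes only)
# and turn each into a URL using the "://" heuristic.
sub ent_links {
    my ($html, $base) = @_;
    my @urls;
    while ($html =~ /href="([^"]+\.ent)"/g) {
        my $href = $1;
        # Contains "://": taken as absolute. Otherwise the link is
        # appended to the base (an assumption of this sketch), which
        # gives a wrong URL for root-relative links like "/path/file.ent".
        push @urls, $href =~ m{://} ? $href : $base . $href;
    }
    return @urls;
}

my $base = 'http://my.server.com/dir/';
my $html = <<'END_HTML';
<a href="isoamsa.ent">relative</a>
<a href="http://other.example/isobox.ent">absolute</a>
<a href="/path/file.ent">root-relative, resolved incorrectly</a>
<a href='single.ent'>single quotes, not matched at all</a>
END_HTML

print "$_\n" for ent_links($html, $base);
```

The third link is classified as relative because it lacks ``://'', and the fourth is never seen because the pattern requires double quotes.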

Interactive download

In case you run into problems downloading the documents, you can try to run the script with the "-i" or "--interactive" option. This will let you skip downloads or enter alternative URLs for individual documents.

The interactive mode is also triggered when the "INTERACTIVE" environment variable is set to a true value (in the Perl sense).

Options

Besides the "--interactive" option, the script accepts the "--timeout" option, which sets the LWP::UserAgent timeout in seconds for downloads. The "DOWNLOAD_TIMEOUT" environment variable controls the same setting. When neither is specified, the default timeout of 180 seconds is used.
  # 10 seconds timeout - croak on failure
  perl download-entities.pl --timeout 10 > XML/Entities/Data.pm
  # 5 seconds timeout - croak on failure
  DOWNLOAD_TIMEOUT=5 perl download-entities.pl > XML/Entities/Data.pm
  # 1 second timeout - ask on failure
  perl download-entities.pl --interactive --timeout 1 > XML/Entities/Data.pm
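
How the options and environment variables might combine can be sketched with the core Getopt::Long module. The helper name read_settings is hypothetical, and two details are assumptions not stated above: that a command-line option overrides the corresponding environment variable, and that "--timeout" takes an integer.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical helper: resolve settings from an argument list and an
# environment hash. Assumption: a command-line option overrides the
# corresponding environment variable.
sub read_settings {
    my ($argv, $env) = @_;
    my %opt = (
        # True "in the Perl sense": "" and "0" leave interactive mode off.
        interactive => $env->{INTERACTIVE} ? 1 : 0,
        # Fall back to the documented 180-second default.
        timeout     => $env->{DOWNLOAD_TIMEOUT} // 180,
    );
    GetOptionsFromArray($argv,
        'interactive|i' => \$opt{interactive},
        'timeout=i'     => \$opt{timeout},
    ) or die "Bad command-line options\n";
    return \%opt;
}

# Non-option arguments (here the output file) are left in the array.
my @args     = ('--interactive', '--timeout', '1', 'output-file.pm');
my $settings = read_settings(\@args, {});
printf "interactive=%d timeout=%ds remaining=%s\n",
    $settings->{interactive}, $settings->{timeout}, "@args";
```

The resolved timeout would then be handed to LWP::UserAgent, for example through the timeout argument of its constructor.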