Rechercher une page de manuel

Regexp::Optimizer.3pm

Langue: en

Autres versions - même langue

Version: 2004-12-05 (ubuntu - 07/07/09)

Section: 3 (Bibliothèques de fonctions)

Sommaire

NAME
SYNOPSIS
ABSTRACT
INSTALLATION
DESCRIPTION
- EXPORT
METHODS
CAVEATS
BUGS
PRACTICALITY
SEE ALSO
AUTHOR
COPYRIGHT AND LICENSE

NAME

Regexp::Optimizer - optimizes regular expressions

SYNOPSIS

   use Regexp::Optimizer;
   my $o  = Regexp::Optimizer->new;
   my $re = $o->optimize(qr/foobar|fooxar|foozap/);
   # $re is now qr/foo(?:[bx]ar|zap)/

ABSTRACT

This module does, ahem, attempts to, optimize regular expressions.

INSTALLATION

To install this module type the following:

    perl Makefile.PL
    make
    make test
    make install

DESCRIPTION

Here is a quote from perltodo.

Factoring out common suffices/prefices in regexps (trie optimization) Currently, the user has to optimize ``foo|far'' and ``foo|goo'' into ``f(?:oo|ar)'' and ``[fg]oo'' by hand; this could be done automatically.

This module implements just that.

EXPORT

Since this is an OO module there is no symbol exported.

METHODS

This module is implemented as a subclass of Regexp::List. For methods not listed here, see Regexp::List.

$o = Regexp::Optimizer->new;

$o->set(key => value, ...)

Just the same us Regexp::List except for the attribute below;

unexpand

When set to one, $o->optimize() tries to $o->expand before actually starting the operation.

   # cases you need to set expand => 1
   $o->set(expand => 1)->optimize(qr/
                                    foobar|
                                    fooxar|
                                    foozar
                                    /x);

$re = $o->optimize(regexp);

Does the job. Note that unlike "->list2re()" in Regexp::List, the argument is the regular expression itself. What it basically does is to find groups will alterations and replace it with the result of "$o->list2re".

$re = $o->list2re(list of words ...)

Same as "list2re()" in Regexp::List in terms of functionality but how it tokenize ``atoms'' is different since the arguments can be regular expressions, not just strings. Here is a brief example.

   my @expr = qw/foobar fooba+/;
   Regexp::List->new->list2re(@expr) eq qr/fooba[\+r]/;
   Regexp::Optimizer->new->list2re(@expr) eq qr/foob(?:a+ar)/;

CAVEATS

This module is still experimental. Do not assume that the result is the same as the unoptimized version.

•: When you just want a regular expression which matches normal words with not metacharacters, use <Regexp::List>. It's more robus and much faster.
•: When you have a list of regular expessions which you want to aggregate, use "list2re" of THIS MODULE.
•: Use "->optimize()" when and only when you already have a big regular expression with alterations therein.
"->optimize()" does support nested groups but its parser is not tested very well.

BUGS

•

Regex parser in this module (which itself is implemented by regular expression) is not as thoroughly tested as Regexp::List

•

May still fall into deep recursion when you attempt to optimize deeply nested regexp. See ``PRACTICALITY''.

•

Does not grok (?{expression}) and (?(cond)yes|no) constructs yet

•

You need to escape characters in character classes.

   $o->optimize(qr/[a-z()]|[A-Z]/);              # wrong
   $o->optimize(qr/[a-z\(\)]|[A-Z]/);            # right
   $o->optimize(qr/[0-9A-Za-z]|[\Q-_.!~*"'()\E]/ # right, too.

•

When character(?: class(?:es)?)? are aggregated, duplicate ranges are left as is. Though functionally OK, it is cosmetically ugly.

   $o->optimize(qr/[0-5]|[5-9]|0123456789/);
   # simply turns into [0-5][5-9]0123456789] not [0-9]

I left it that way because marking-rearranging approach can result a humongous result when unicode characters are concerned (and \p{Properties}).

PRACTICALITY

Though this module is still experimental, It is still good enough even for such deeply nested regexes as the followng.

   # See 3.2.2 of  http://www.ietf.org/rfc/rfc2616.txt
   # BNF faithfully turned into a regex
   http://(?:(?:(?:(?:(?:[a-z]|[A-Z])|[0-9])|(?:(?:[a-z]|[A-Z])|[0-9])(?:(?:(?:[a-z]|[A-Z])|[0-9])|-)*(?:(?:[a-z]|[A-Z])|[0-9]))\.)*(?:(?:[a-z]|[A-Z])|(?:[a-z]|[A-Z])(?:(?:(?:[a-z]|[A-Z])|[0-9])|-)*(?:(?:[a-z]|[A-Z])|[0-9]))\.?|[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+)(?::[0-9]*)?(?:/(?:(?:(?:(?:[a-z]|[A-Z])|[0-9])|[\-\_\.\!\~\*\'\(\)])|%(?:[0-9]|[A-Fa-f])(?:[0-9]|[A-Fa-f])|[:@&=+$,])*(?:;(?:(?:(?:(?:[a-z]|[A-Z])|[0-9])|[\-\_\.\!\~\*\'\(\)])|%(?:[0-9]|[A-Fa-f])(?:[0-9]|[A-Fa-f])|[:@&=+$,])*)*(?:/(?:(?:(?:(?:[a-z]|[A-Z])|[0-9])|[\-\_\.\!\~\*\'\(\)])|%(?:[0-9]|[A-Fa-f])(?:[0-9]|[A-Fa-f])|[:@&=+$,])*(?:;(?:(?:(?:(?:[a-z]|[A-Z])|[0-9])|[\-\_\.\!\~\*\'\(\)])|%(?:[0-9]|[A-Fa-f])(?:[0-9]|[A-Fa-f])|[:@&=+$,])*)*)*(?:\\?(?:[;/?:@&=+$,]|(?:(?:(?:[a-z]|[A-Z])|[0-9])|[\-\_\.\!\~\*\'\(\)])|%(?:[0-9]|[A-Fa-f])(?:[0-9]|[A-Fa-f]))*)?)?
 
   # and optimized
   http://(?::?[a-zA-Z0-9](?:[a-zA-Z0-9\-]*[a-zA-Z0-9])?\.[a-zA-Z]*(?:[a-zA-Z0-9\-]*[a-zA-Z0-9])?\.?|[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+)(?::[0-9]*)?(?:/(?:(?:(?:[a-zA-Z0-9\-\_\.\!\~\*\'\x28\x29]|%[0-9A-Fa-f][0-9A-Fa-f])|[:@&=+$,]))*(?:;(?:(?:(?:[a-zA-Z0-9\-\_\.\!\~\*\'\x28\x29]|%[0-9A-Fa-f][0-9A-Fa-f])|[:@&=+$,]))*)*(?:/(?:(?:(?:[a-zA-Z0-9\-\_\.\!\~\*\'\x28\x29]|%[0-9A-Fa-f][0-9A-Fa-f])|[:@&=+$,]))*(?:;(?:(?:(?:[a-zA-Z0-9\-\_\.\!\~\*\'\x28\x29]|%[0-9A-Fa-f][0-9A-Fa-f])|[:@&=+$,]))*)*)*(?:\\?(?:(?:[;/?:@&=+$,a-zA-Z0-9\-\_\.\!\~\*\'\x28\x29]|%[0-9A-Fa-f][0-9A-Fa-f]))*)?)?

By carefully examine both you can find that character classes are properly aggregated.

AUTHOR

Dan Kogai <dankogai@dan.co.jp>

COPYRIGHT AND LICENSE

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

Linux Certif

Toute la documentation sur la certification Linux LPI

Rechercher une page de manuel

Regexp::Optimizer.3pm

Sommaire

NAME

SYNOPSIS

ABSTRACT

INSTALLATION

DESCRIPTION

EXPORT

METHODS

CAVEATS

BUGS

PRACTICALITY

SEE ALSO

AUTHOR

COPYRIGHT AND LICENSE

Découvrir

Apprendre

Linux Certif

Toute la documentation sur la certification Linux LPI

Rechercher une page de manuel

Regexp::Optimizer.3pm

Sommaire

NAME

SYNOPSIS

ABSTRACT

INSTALLATION

DESCRIPTION

EXPORT

METHODS

CAVEATS

BUGS

PRACTICALITY

SEE ALSO

AUTHOR

COPYRIGHT AND LICENSE

Découvrir

Apprendre

Partager