Rechercher une page de manuel
Regexp::List.3pm
Langue: en
Version: 2004-12-05 (ubuntu - 07/07/09)
Section: 3 (Bibliothèques de fonctions)
Sommaire
NAME
Regexp::List - builds regular expressions out of a list of wordsSYNOPSIS
use Regexp::List; my $l = Regexp::List->new; my $re = $l->list2re(qw/foobar fooxar foozap fooza/); # $re is now qr/foo(?:[bx]ar|zap?)/
ABSTRACT
This module offers "list2re" method that turns a list of words into an optimized regular expression which matches all words therein.The optimized regular expression is much more efficient than a simple-minded '|'-concatenation thereof.
DESCRIPTION
This module use Object-Oriented approach so you can use this module as a base and tweak its features. This module is a base class of Regexp::Optimizer.EXPORT
Since this is an OO module there is no symbol exported.METHODS
This module offers methods below;- $l = Regexp::List->new(key=>value, ...)
- Constructor. When arguments are fed in key => value, manner, it sets attributes. See "$l->set" for details
- $re = $l->list2re(list of words ...)
- Does the job. Takes a list of words and turn it into an optimal regular expresson. See ``IMPLEMENTATION'' to find out how it is achieved. If you want to know the underlying black magic even further, see the source.
- $l->set(key => value, ...)
- Sets attributes. There are many attributes supported but let me mention just a few that you may be interested.
-
- lookahead
- Whether you prepend a lookahead assertion or not. Default value is 1. This module is smart enough to omit the assertion when you don't need one.
$re = $l->list2re(qw/1 2 3 infinity/); # qr/(?=[123i])(?:[123]|infinity)/ $re = $l->set(lookahead=>0)->list2re(qw/1 2 3 infinity/); # qr/(?:[123]|infinity)/
- quotemeta
- Whether you quote metacharacters or not. Default is 1. If you really need this feature try Regexp::Optimizer instead.
@list = qw/3 3.14 3.14159265358979/; $re = $l->list2re(@list); # qr/3(?:\.14(?:159265358979)?)?)/ $re = $l->set(lookahead=>0)->list2re(@list); # qr/3(?:.14(?:159265358979)?)?)/ # which does match 3.14 but also "11+3=14"
- modifies
- Currently it accepts 'i', 'm', 's', and 'x', the same as regular expression modifiers.
@list = qw/Perl perl BASIC basic/; $re = $l->list2re(@list); # qr/(?=[BPbp])(?:[Pp]erl|BASIC|basic)/ $re = $l->set(modifiers => 'i')->list2re(@list); # qr/(?=[bp])(?:perl|basic)/i print $l->set(modifiers => 'x')->list2re(@list); # Try for yourself; Isn't itcute ?
-
- $l->expand($re);
- Utility methods to expand a regular expression. Handy when you want to check the complex regexes.
- $l->unexpand($re);
- Utility methods to unexpand a regular expression.
IMPLEMENTATION
This module optimizes the regular expression as follows. Let's see what happens when qw/foobar fooxar foozap fooza/ is fed- trie building (common prefix aggregation)
- first the corresponding trie structure is built
+- bar foo -+- xar +- za -+- p +- ''
- common suffix aggregation
- it checks if any leaf node can be optimized for common suffix. In this case we can do so to ``bar'' and ``xar''.
+- b -+-ar foo -+- x -+ +- za -+- p +- ''
- character class conversion
- If a branch contains more than two single characters, it turns it into a character class.
foo -+- [bx] --- ar +- za -+-p +- ''
- empty leaf to "?"
- Empty leaf is converted to a '?' quantifier
foo -+- [bx] --- ar +- za -+-p?
- join all leafs into a group
- And the final result is reached.
foo(?:[bx]ar|zap?)
BENCHMARKS
This module is faily robust. You can practically use this module to find a regular expression that matches all words in a dictionary. Here is a result by on perl 5.8.0, FreeBSD 4-Stable, Pentium III 800 Mhz with 512 MB RAM.# Sat May 31 09:11:06 2003 ( 0.000000 s) Reading /usr/share/dict/words # Sat May 31 09:11:07 2003 ( 0.847797 s) 235881 lines read. # Sat May 31 09:11:07 2003 ( 0.000000 s) Making regexp. # Sat May 31 09:13:09 2003 ( 121.596928 s) Done. # Sat May 31 09:13:09 2003 ( 0.000000 s) Saving to t/words.rx # Sat May 31 09:13:09 2003 ( 0.000000 s) Reading t/words.rx # Sat May 31 09:13:13 2003 ( 3.679176 s) Done. # Sat May 31 09:13:13 2003 ( 0.000000 s) Opening /usr/share/dict/words for comparison. # Sat May 31 09:13:13 2003 ( 0.255222 s) /usr/share/dict/words:235881 lines found. # Sat May 31 09:13:13 2003 ( 0.000000 s) Showtime! # 235881/235881 # Sat May 31 10:44:17 2003 ( 5464.370409 s) Done. # Sat May 31 10:44:17 2003 ( 5464.370624 s) 43.167 matches/s
The result of optimization is obvious as the number of alteration increases. Here is a result of a benchmark which matches randomly picked words against "/usr/share/dict/words".
==== 2 words Rate naive optim naive 1.79/s -- -28% optim 2.49/s 39% -- ==== 256 words s/iter naive optim naive 31.7 -- -81% optim 5.95 433% --
SEE ALSO
Regexp::Optimizer --- uses this module as its base"eg/" directory in this package contains example scripts.
- Perl standard documents
- perltodo, perlre
- CPAN Modules
- Regexp::Presuf, Text::Trie
- Books
- Mastering Regular Expressions <http://www.oreilly.com/catalog/regex2/>
AUTHOR
Dan Kogai <dankogai@dan.co.jp>COPYRIGHT AND LICENSE
Copyright 2003 by Dan Kogai, All Rights Reserved.This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
Contenus ©2006-2024 Benjamin Poulain
Design ©2006-2024 Maxime Vantorre