Sprog::help::regex_intro.3pm
Language: en
Version: 2005-05-30 (fedora - 01/12/10)
Section: 3 (Library functions)
An Introduction to Perl Regular Expressions
A 'regular expression' is just a fancy name for a pattern. You're probably already familiar with filename patterns like '*.doc'. Perl regular expressions are similar but much more powerful. Here's a brief introduction and some examples to get you started.
Note: 'regular expression' is often abbreviated to 'regex'.
Plain Text
The simplest pattern is just plain text. For example, this pattern ...
cat
... will match any line containing a 'c' followed by an 'a' followed by a 't'. So all these lines would match:
The cat sat on the mat
He picked up the catcher's mitt
She scattered rose petals in the wind
But this line ...
Cat Doors - $35 each
... would not match unless you checked the 'ignore case' box.
Note: you don't need to say '*cat*' and in fact it would be a mistake to do so. We'll get to stars shortly.
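The plain-text match above can be tried out with a short sketch. It uses Python's `re` module as a stand-in, on the assumption that plain-text patterns behave the same way there as in Perl. The `matches` helper is hypothetical, and using `re.IGNORECASE` to mimic the 'ignore case' box is also an assumption.

```python
import re

def matches(pattern, line, ignore_case=False):
    """Return True if the pattern is found anywhere in the line (hypothetical helper)."""
    flags = re.IGNORECASE if ignore_case else 0  # stand-in for the 'ignore case' box
    return re.search(pattern, line, flags) is not None

lines = [
    "The cat sat on the mat",
    "He picked up the catcher's mitt",
    "She scattered rose petals in the wind",
    "Cat Doors - $35 each",
]

for line in lines:
    # First column: case-sensitive result; second column: case-insensitive result
    print(matches("cat", line), matches("cat", line, ignore_case=True), line)
```

Only the last line changes its result when case is ignored, because 'Cat' differs from 'cat' only in the capital letter.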
Character Classes
You can use square brackets to specify a list of characters that should match. For example, this pattern ...
[bcr]at
... will match any line containing either a 'b', a 'c' or an 'r' followed by an 'a' followed by a 't'. So all these lines would match:
The cat sat on the mat
He picked up the bat
They gazed into the crater
But these would not:
The hat sat on the mat
canned flat bacon rashers
You can use '-' to specify a range of characters. For example, this pattern will match any of the first ten letters of the (English) alphabet:
[a-j]
If the very first character in the square brackets is a '^' then it will match any character that is not in the list. For example:
[^bc]at
Will match 'hat' and 'mat' but not 'bat' or 'cat'.
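A small sketch of the three kinds of character class described above: a list, a negated list and a range. Python's `re` module is used as a stand-in, on the assumption that it treats these classes the same way as Perl. The word list and variable names are made up for illustration.

```python
import re

words = ["bat", "cat", "rat", "hat", "mat"]

# [bcr]at: a 'b', 'c' or 'r', followed by 'at'
listed = [w for w in words if re.search(r"[bcr]at", w)]
print(listed)    # ['bat', 'cat', 'rat']

# [^bc]at: any character except 'b' or 'c', followed by 'at'
negated = [w for w in words if re.search(r"[^bc]at", w)]
print(negated)   # ['rat', 'hat', 'mat']

# [a-j]: any single letter from 'a' through 'j'
in_range = re.findall(r"[a-j]", "jukebox")
print(in_range)  # ['j', 'e', 'b']
```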
Predefined Character Classes
There are a number of predefined character classes:
\d    [0-9]           any digit
\s    [ \t\r\n\f]     any 'whitespace' character
\w    [a-zA-Z0-9_]    any letter, digit or underscore
\D    [^0-9]          anything except a digit
\S    [^\s]           anything except whitespace
\W    [^\w]           anything except a letter, digit or underscore
So this pattern ...
\d\dth
... would match any line containing two digits followed by 'th', such as:
the 19th hole
The '.' can be thought of as a class which matches any character. So for example 'p.t' would match 'pet', 'pot', 'p8t' and 'p@t'. It would not match 'pt'.
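The predefined classes and the '.' can be tried out as follows. This is a sketch using Python's `re` module as a stand-in. For the plain ASCII text used here it is assumed to behave the same way as Perl. The sample strings and variable names are chosen only for illustration.

```python
import re

# \d\dth: two digits followed by 'th'
ordinal = re.search(r"\d\dth", "the 19th hole")
print(ordinal.group())    # 19th

text = "Cat Doors - $35 each"

# \d picks out each digit; \s picks out each whitespace character
digits = re.findall(r"\d", text)
spaces = re.findall(r"\s", text)
print(digits, len(spaces))  # ['3', '5'] 4

# \D is the opposite of \d, so deleting every \D match leaves only the digits
only_digits = re.sub(r"\D", "", text)
print(only_digits)        # 35

# p.t: a 'p', then exactly one character of any kind, then a 't'
dot_hits = [s for s in ["pet", "pot", "p8t", "p@t", "pt"] if re.search(r"p.t", s)]
print(dot_hits)           # ['pet', 'pot', 'p8t', 'p@t']
```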
Matching Multiple Times
If you wanted to match any line containing 5 consecutive 'W' characters, you could use this:
WWWWW
or this:
W{5}
If you wanted to match 'bet' and 'beet' but not 'beeet', you could use this:
be{1,2}t
Note: the number or numbers in the curly brackets only apply to the character or character class immediately before the '{'. In the above example it will match at least 1 but no more than 2 consecutive 'e's.
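The curly-bracket counts can be checked with a short sketch. Python's `re` module stands in for Perl here, on the assumption that '{n}' and '{n,m}' work the same way in both. The test strings are invented for the example.

```python
import re

# W{5} is shorthand for WWWWW: five consecutive 'W' characters
five_w = re.search(r"W{5}", "xxWWWWWxx") is not None
four_w = re.search(r"W{5}", "xxWWWWxx") is not None
print(five_w, four_w)  # True False

# be{1,2}t: the {1,2} applies only to the 'e' immediately before it
hits = [w for w in ["bt", "bet", "beet", "beeet"] if re.search(r"be{1,2}t", w)]
print(hits)            # ['bet', 'beet']
```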
Often you want to include an optional character in a match. To do that, you could match on zero or one occurrences:
pots{0,1}
which will match 'pot' or 'pots' (and also 'potscrub'). This is so common it has a shorthand form - just replace '{0,1}' with '?' which means exactly the same thing:
pots?
There are two other important shorthand forms. '*' means 0 or more, so this:
po*t
is the same as:
po{0,}t
which will match 'pt', 'pot', 'poot', etc.
The other important form is '+' meaning one or more matches:
po+t
will match 'pot', 'poot', etc.
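The three shorthand forms '?', '*' and '+' can be compared side by side. This is a sketch using Python's `re` module as a stand-in, on the assumption that these quantifiers behave as they do in Perl. The `hits` helper and the candidate lists are hypothetical.

```python
import re

def hits(pattern, candidates):
    """Return the candidates in which the pattern is found (hypothetical helper)."""
    return [c for c in candidates if re.search(pattern, c)]

candidates = ["pt", "pot", "poot"]

star = hits(r"po*t", candidates)       # zero or more 'o'
braces = hits(r"po{0,}t", candidates)  # long form of the same pattern
plus = hits(r"po+t", candidates)       # one or more 'o'
print(star)    # ['pt', 'pot', 'poot']
print(braces)  # ['pt', 'pot', 'poot']
print(plus)    # ['pot', 'poot']

# pots?: the 's' is optional, so 'pot' alone is enough for a match
optional = hits(r"pots?", ["pot", "pots", "potscrub", "pan"])
print(optional)  # ['pot', 'pots', 'potscrub']

# The text actually matched inside 'potscrub' is just 'pots'
print(re.search(r"pots?", "potscrub").group())  # pots
```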
Anchoring a Match
If the very first character in your pattern is '^' then what follows is 'anchored' to the start of the line. For example ...
^pot
... will match these lines:
potato
potentially
but not these:
a pot-plant
spotted
Similarly, '$' can be used to anchor to the end of the line, so ...
end$
... will match:
Max is my friend
but not:
My friend is Max
You can use both '^' and '$' in the same pattern. Here for example is a pattern that matches lines which start with a capital letter and end with an exclamation point:
^[A-Z].*!$
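A brief sketch of the '^' and '$' anchors, using Python's `re` module as a stand-in on the assumption that the anchors behave the same way as in Perl for single lines like these. The sample sentences used with the last pattern are invented.

```python
import re

# ^pot: 'pot' must appear at the very start of the line
starts = [s for s in ["potato", "potentially", "a pot-plant", "spotted"]
          if re.search(r"^pot", s)]
print(starts)  # ['potato', 'potentially']

# end$: 'end' must appear at the very end of the line
ends = [s for s in ["Max is my friend", "My friend is Max"]
        if re.search(r"end$", s)]
print(ends)    # ['Max is my friend']

# ^[A-Z].*!$: a capital letter first, anything in between, '!' last
shout = re.compile(r"^[A-Z].*!$")
print(shout.search("Stop right there!") is not None)  # True
print(shout.search("stop right there!") is not None)  # False
print(shout.search("Stop! Right there") is not None)  # False
```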
Another useful technique is to use '\b' to anchor a match to the beginning or end of a word (think of '\b' as matching the boundary between a non-word character such as a space and a word character such as a letter). For example ...
\bcat
... will match words that start with 'cat' such as 'cat' or 'catch' but not 'scatter'. And this ...
\bcat\b
... will only match 'cat' not 'catch' or 'scat'.
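The word-boundary examples can be checked with the sketch below. It uses Python's `re` module as a stand-in, on the assumption that '\b' works there as it does in Perl. The patterns are written as raw strings (`r"..."`), which is a Python-specific detail: without the `r` prefix, Python would read '\b' as a backspace character.

```python
import re

words = ["cat", "catch", "scat", "scatter"]

# \bcat: 'cat' must begin at a word boundary
start_of_word = [w for w in words if re.search(r"\bcat", w)]
print(start_of_word)  # ['cat', 'catch']

# \bcat\b: 'cat' must have a word boundary on both sides
whole_word = [w for w in words if re.search(r"\bcat\b", w)]
print(whole_word)     # ['cat']

# Within a sentence, the spaces around 'cat' provide the boundaries
in_sentence = re.search(r"\bcat\b", "The cat sat on the mat") is not None
in_longer_word = re.search(r"\bcat\b", "He picked up the catcher's mitt") is not None
print(in_sentence, in_longer_word)  # True False
```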
More Information
That's far more information than you can be expected to absorb in one sitting, so we'll stop there. The best way to become proficient with regular expressions is to use them, and Sprog is an ideal tool for trying out different patterns and different data.
This quick introduction has only scratched the surface of what's possible in a Perl regular expression. The official Perl regular expression tutorial is at: <http://perldoc.perl.org/perlretut.html>.
A web search will locate many helpful pages: <http://www.google.com/search?q=perl+regular+expression+tutorial>.
There are also a number of books on the subject.