Perl - Regular Expressions
In a Nutshell - CIW Course Section 2, Part B2, Chapter 1
Regular Expressions can appear quite daunting when you first come across them. With a little patience and plenty of practice they will make sense, and the power they unleash will become apparent. Regular Expressions are not peculiar to Perl; they are used in many programming and scripting languages.
Pattern Binding Operators
There are two pattern binding operators: "=~" and "!~". The first returns true if the left operand matches the supplied pattern; the second returns true if the left operand does not match the supplied pattern.
# Test whether the string contains the text "shells"
my $strTest = "she sells seashells on the sea shore";
if ($strTest =~ m/shells/)
{
    print ("Match Found\n");
}
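The example above only exercises "=~". As a minimal sketch of the two operators side by side (the variable names $has_shells and $no_pebbles, and the word "pebbles", are made up for illustration):

```perl
use strict;
use warnings;

my $strTest = "she sells seashells on the sea shore";

# "=~" is true when the pattern is found in the string.
my $has_shells = ($strTest =~ m/shells/) ? 1 : 0;

# "!~" is true when the pattern is NOT found in the string.
my $no_pebbles = ($strTest !~ m/pebbles/) ? 1 : 0;

print "contains 'shells': $has_shells\n";
print "lacks 'pebbles': $no_pebbles\n";
```

Both flags come out as 1 here, because "shells" occurs in the string and "pebbles" does not.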
The "m" indicates matching, and the text delimited by "/ /" is the pattern to match. The more generic syntax is shown below:
m/pattern/[gix]
The characters after the closing "/" are modifiers, which change how the match is performed.
Modifiers

| Modifier | Description |
|---|---|
| g | Matches globally - finds all occurrences |
| i | Performs case-insensitive pattern matching |
| x | Uses extended regular expressions: whitespace and comments inside the pattern are ignored |
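
To make the three modifiers concrete, here is a small sketch using a variation of the earlier test string. The variable names are illustrative only, and the patterns were chosen just to show each modifier in isolation.

```perl
use strict;
use warnings;

my $text = "She sells seashells on the sea shore";

# /i: case is ignored, so the pattern "SHE" matches "She".
my $found_i = ($text =~ /SHE/i) ? 1 : 0;

# /g: in list context the match returns every occurrence of the pattern.
my @all   = ($text =~ /se/g);
my $count = scalar @all;

# /x: whitespace and "#" comments inside the pattern are ignored,
# so this is the same pattern as /seashells/.
my $found_x = ($text =~ / sea      # the literal text "sea"
                          shells   # followed by "shells"
                        /x) ? 1 : 0;

print "i: $found_i, g: $count occurrences, x: $found_x\n";
```

Modifiers can be combined, for example /se/gi to find every occurrence regardless of case.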
The leading "m" is optional when "/" is used as the delimiter. Other characters may also be used as delimiters, but in that case the "m" must be included. "/pattern/", "m/pattern/", and "m#pattern#" are all valid.
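Alternative delimiters are most useful when the pattern itself contains "/" characters. The sketch below matches the same text three ways; the path is an arbitrary example, and the "m{ }" bracket form is an addition not mentioned above.

```perl
use strict;
use warnings;

my $path = "/usr/local/bin/perl";

# With "/" as the delimiter, every "/" in the pattern must be escaped.
my $with_slash = ($path =~ /\/usr\/local\//) ? 1 : 0;

# With "#" as the delimiter, no escaping is needed, but "m" is required.
my $with_hash = ($path =~ m#/usr/local/#) ? 1 : 0;

# Paired brackets can also act as delimiters.
my $with_brace = ($path =~ m{/usr/local/}) ? 1 : 0;

print "$with_slash $with_hash $with_brace\n";
```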
Metacharacters
Metacharacters are characters that have special meaning within the pattern string.
Perl Metacharacters

| Metacharacter | Function |
|---|---|
| \ | Used to escape characters |
| . | Matches any single character except newline (with the /s modifier it matches newline as well) |
| ^ | Anchors to the beginning of string |
| $ | Anchors to the end of the string |
| * | Matches the preceding element zero or more times |
| + | Matches the preceding element one or more times |
| ? | Matches the preceding element zero or one times |
| { } | Denotes a count or range of occurrences for the element preceding it |
| [ ] | Creates a character class and matches any character within the brackets |
| ( ) | Used to group regular expressions |
| \| | Matches the expression either preceding or following it |
The backslash "\" when placed before a metacharacter escapes this character so that it is treated as a literal character.
The period or full stop "." matches any single character, so "/a.d/" matches "and" or "aid" but not "arid", because exactly one character must sit between the "a" and the "d".
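As a short sketch of the backslash and the period together (the "3.14" and "3x14" strings are invented test data):

```perl
use strict;
use warnings;

# "." stands for exactly one character.
my @dot_hits = grep { /a.d/ } ("and", "aid", "arid");

# An unescaped "." also matches the "x" in "3x14" ...
my $unescaped = ("3x14" =~ /3.14/) ? 1 : 0;

# ... whereas "\." matches only a literal period.
my $escaped_x   = ("3x14" =~ /3\.14/) ? 1 : 0;
my $escaped_dot = ("3.14" =~ /3\.14/) ? 1 : 0;

print "dot hits: @dot_hits\n";
print "unescaped: $unescaped, escaped vs x: $escaped_x, escaped vs dot: $escaped_dot\n";
```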
The caret "^" symbol anchors the pattern to the beginning of the search string, so "/^fred/" matches "fred elliot" but not "my name is fred".
Conversely the dollar "$" symbol anchors the pattern to the end of the search string so "/fred$/" does not match "fred elliot" but does match "my name is fred".
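The two anchors can be checked against the same pair of strings. This is only a sketch; the array names are illustrative.

```perl
use strict;
use warnings;

my @names = ("fred elliot", "my name is fred");

# "^" requires the match to start at the beginning of the string.
my @starts_with = grep { /^fred/ } @names;

# "$" requires the match to finish at the end of the string.
my @ends_with = grep { /fred$/ } @names;

# Using both anchors means the whole string must be "fred".
my @exact = grep { /^fred$/ } (@names, "fred");

print "starts: @starts_with\n";
print "ends: @ends_with\n";
print "exact: @exact\n";
```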
The star "*" symbol matches the preceding character zero or more times, so "/ab*c/" matches "abc", "abbc", and "ac".
The plus "+" symbol matches the preceding character one or more times, so "/ab+c/" matches "abc" and "abbc" but not "ac".
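The sketch below runs the quantifiers from the table, including "?" and "{ }", over one list of test words. The word list and the "{2,3}" range are chosen purely for illustration.

```perl
use strict;
use warnings;

my @words = ("ac", "abc", "abbc", "abbbc");

my @star  = grep { /ab*c/ }     @words;   # zero or more "b"
my @plus  = grep { /ab+c/ }     @words;   # one or more "b"
my @opt   = grep { /ab?c/ }     @words;   # zero or one "b"
my @range = grep { /ab{2,3}c/ } @words;   # two or three "b"

print "*     : @star\n";
print "+     : @plus\n";
print "?     : @opt\n";
print "{2,3} : @range\n";
```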
The vertical bar "|" provides for alternative matches. So "/Lesl(ie|ey)/" will match both "Leslie" and "Lesley".
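The parentheses matter in that pattern because they limit how far the alternation reaches. A rough sketch of the difference, using "Lesly" and "monkey" as made-up test strings:

```perl
use strict;
use warnings;

my @names = ("Leslie", "Lesley", "Lesly");

# Grouped: "Lesl" followed by either "ie" or "ey".
my @hits = grep { /Lesl(ie|ey)/ } @names;

# Ungrouped: the alternatives are the whole of "Leslie" or the whole of "ey",
# so any string containing "ey" matches.
my $loose = ("monkey" =~ /Leslie|ey/)    ? 1 : 0;
my $tight = ("monkey" =~ /Lesl(ie|ey)/) ? 1 : 0;

print "hits: @hits\n";
print "loose: $loose, tight: $tight\n";
```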
Square brackets create a character class, or group of characters, any one of which may match. The example "/^[aA]/" will match any string beginning with the letter "a" in either upper or lower case.
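A brief sketch of the character class in use follows. The word list is invented, and the "[0-9]" range form goes slightly beyond the example above.

```perl
use strict;
use warnings;

my @words = ("Apple", "avocado", "Banana");

# A class matches any one of the characters listed inside the brackets.
my @a_words = grep { /^[aA]/ } @words;

# For a single letter, the /i modifier gives the same result.
my @a_words_i = grep { /^a/i } @words;

# A hyphen inside a class denotes a range: one or more digits, nothing else.
my @numbers = grep { /^[0-9]+$/ } ("2024", "20x4");

print "class: @a_words\n";
print "/i   : @a_words_i\n";
print "range: @numbers\n";
```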
This is quite a lengthy chapter so I have split it across two pages in the hope that this will make it more manageable.

