Perl - Regular Expressions (Continued)
In a Nutshell - CIW Course Section 2, Part B2, Chapter 1b
Character Classes
A character class, or character group, is a set of characters, any one of which will produce a match if it is found in the search string.
A character class is defined using square brackets, such as "[aeiou]", which will match any lowercase vowel. Perl has a number of predefined character classes identified by escape sequences. The backslash character "\", when placed before a metacharacter, escapes the metacharacter and causes it to be treated as a literal. When placed before certain literal characters, it forms an escape sequence as follows:
Perl Escape Sequences

| Escape Sequence | Description |
|---|---|
| \d | A digit, equivalent to [0-9] |
| \D | A non-digit, equivalent to [^0-9] |
| \w | A word character (alphanumeric or underscore), equivalent to [a-zA-Z_0-9] |
| \W | A non-word character, equivalent to [^a-zA-Z_0-9] |
| \s | A whitespace character, equivalent to [ \t\n\r\f] |
| \S | A non-whitespace character, equivalent to [^ \t\n\r\f] |
The caret "^" symbol, when it is the first character inside a character class, denotes a negated class: the class matches any single character that is not listed in it.
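As a rough illustration of the classes described above, the following sketch tests a few strings against a custom class, a predefined class, and a negated class. The helper name describe and the sample strings are invented for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: reports which kinds of character a string contains.
sub describe {
    my ($str) = @_;
    my $vowel   = ($str =~ /[aeiou]/) ? "yes" : "no";   # custom class
    my $digit   = ($str =~ /\d/)      ? "yes" : "no";   # predefined class
    my $nonword = ($str =~ /[^\w]/)   ? "yes" : "no";   # negated class, same as \W
    return "vowel=$vowel digit=$digit nonword=$nonword";
}

# Sample strings invented for this illustration.
foreach my $str ("sky", "apple", "route 66") {
    print $str, ": ", describe($str), "\n";
}
```

The space in "route 66" is what triggers the negated class, since a space is not a word character.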
Perl Assertions
We have already seen the "^" and "$" characters used as anchor points for the beginning and end of the search string. Perl also has escape sequences that represent positions within a string rather than characters. These are known as assertions.
Perl Assertions

| Assertion | Description |
|---|---|
| \b | Matches at a word boundary, i.e. between a \w character and a \W character (or the start or end of the string) |
| \B | Matches anywhere except at a word boundary |
| \A | Matches only at the beginning of the string |
| \Z | Matches at the end of the string, or just before a final newline |
| \z | Matches only at the very end of the string |
| \G | Matches at the position where the previous m//g match left off |
How this works may not be readily apparent; it certainly wasn't to me. But a bit of trial and error soon clarified matters:
$strTest = "some word play";
# \b requires "play" to begin at a word boundary; /i ignores case
if ($strTest =~ /\bplay/i)
{
    print ("Match Found\n");
}
The above match is looking for the word "play", but the assertion "\b" requires that it start at a word boundary, which in this case it does (it follows a space). So a match is found.
If we supply this code with a new string: "some wordplay", a match is not found. The word "play" is still there, but it is no longer at a word boundary.
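The behaviour of some of the other assertions can be explored in the same trial-and-error fashion. The sketch below is only an illustration: the helper names check and leading_pairs, and the test strings, are made up for the purpose.

```perl
use strict;
use warnings;

# Hypothetical helper: returns "match" or "no match" for a string and a pattern.
sub check {
    my ($str, $re) = @_;
    return ($str =~ $re) ? "match" : "no match";
}

# Hypothetical helper: collects consecutive two-digit pairs from the start
# of a string. \G forces each match to begin where the previous one ended.
sub leading_pairs {
    my ($str) = @_;
    my @pairs;
    while ($str =~ /\G(\d\d)/g) {
        push @pairs, $1;
    }
    return @pairs;
}

print check("some word play", qr/\bplay/), "\n";   # "play" starts a word
print check("some wordplay",  qr/\bplay/), "\n";   # "play" is inside a word
print check("some wordplay",  qr/\Bplay/), "\n";   # \B requires a non-boundary

# \Z tolerates one trailing newline; \z does not.
print check("play\n", qr/play\Z/), "\n";
print check("play\n", qr/play\z/), "\n";

my @pairs = leading_pairs("112233abc44");
print "@pairs\n";   # the "44" after "abc" is not reached
```

Note that without \G the last loop would skip over "abc" and pick up "44" as well.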
Split function
In its usual form the split function accepts two arguments: a regular expression and a string. The function uses the regular expression to find the points in the string where the split(s) should occur. The string is then divided at those points and the pieces are returned as a list, which is normally assigned to an array.
$strTest = "some word play";
@words = split(/\b/, $strTest);
print ("@words\n");
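This example splits the string at word boundaries with the "\b" assertion. Because "\b" matches a position rather than a character, nothing is removed from the string: the spaces come back as elements of their own, giving five elements ("some", " ", "word", " ", "play"). Splitting on the "\s" whitespace escape sequence instead would discard the spaces and return just the three words.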
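The difference between splitting on a position and splitting on whitespace is easiest to see side by side. The following is a minimal sketch; the array names are chosen for the example, and the brackets are printed only to make the space elements visible.

```perl
use strict;
use warnings;

my $strTest = "some word play";

my @by_boundary = split(/\b/, $strTest);    # zero-width split: spaces are kept
my @by_space    = split(/\s+/, $strTest);   # whitespace is consumed and dropped

# Print each element in brackets so that the space elements show up.
foreach my $piece (@by_boundary) {
    print "[", $piece, "]";
}
print "\n";   # prints [some][ ][word][ ][play]

foreach my $piece (@by_space) {
    print "[", $piece, "]";
}
print "\n";   # prints [some][word][play]
```

Using /\s+/ rather than /\s/ means that a run of several spaces is treated as a single separator.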
Join function
At first glance the join function appears very straightforward. It does the opposite of split: it takes a list of string values and combines them to form a single string. Note, however, that join accepts two arguments: the delimiter string and the array.
@words = ("some", "word", "play");
$strTest = join("-",@words);
print("$strTest\n");
The above example returns the string: "some-word-play". The delimiter argument is not optional - I tried it! In most cases you will use a single space string " " as the delimiter.
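To show how the choice of delimiter affects the result, here is a small sketch using a few different delimiters, followed by a round trip through split. The variable names are invented for the illustration.

```perl
use strict;
use warnings;

my @words = ("some", "word", "play");

my $spaced = join(" ", @words);    # single space delimiter
my $listed = join(", ", @words);   # a delimiter can be longer than one character
my $glued  = join("", @words);     # an empty delimiter simply concatenates

print "$spaced\n";   # prints some word play
print "$listed\n";   # prints some, word, play
print "$glued\n";    # prints somewordplay

# Round trip: joining with "-" and splitting on "-" recovers the original words.
my @again = split(/-/, join("-", @words));
print scalar(@again), "\n";   # prints 3
```

The round trip only works cleanly when the delimiter does not also appear inside one of the words.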

