Other Regex Features

Sometimes you want to call other regexes, but don't want them to capture the matched text. When parsing a programming language, for example, you might want to discard whitespace characters and comments. You can achieve that by prefixing the rule name with a dot, as in <.otherrule>.
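
As a minimal sketch of the difference, the following grammar calls one subrule with capturing and one without. The names TwoNumbers, number and gap are hypothetical and chosen only for this illustration:

    grammar TwoNumbers {
        # <number> captures under its own name; <.gap> matches silently
        token TOP    { <number> <.gap> <number> }
        token number { \d+ }
        token gap    { \s+ }
    }

    my $m = TwoNumbers.parse('12   34');
    say $m<number>.elems;   # 2     (both numbers were captured)
    say so $m<gap>;         # False (the gap matched, but left no capture)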

If you use the :sigspace modifier, every contiguous run of whitespace in the regex becomes a call to the built-in rule <.ws>. Because this is a rule call rather than a fixed character class, you can define your own notion of whitespace (see ).
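
A sketch of how such an override might look, assuming a small hypothetical grammar (Settings, entry, label and amount are invented for this example). The rule declarator turns on :sigspace, so the whitespace inside TOP and entry calls the custom ws token, which here also accepts '#' comments:

    grammar Settings {
        rule  TOP    { <entry>+ }
        rule  entry  { <label> '=' <amount> }
        token label  { \w+ }
        token amount { \d+ }
        # custom whitespace: blanks, newlines, and comments up to end of line
        token ws     { [ \s+ | '#' \N* ]* }
    }

    say so Settings.parse("a = 1 # first\nb = 2\n");
    # output: True

Unlike the built-in version, this ws does not insist on a gap between two adjacent word characters; that is a simplification made for the sketch.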

Sometimes you just want to peek ahead to check whether the next characters fulfill some property without actually consuming them. This is common in substitutions. In normal English text, a space always follows a comma. If somebody forgets to add that space, a regex can clean up after the lazy writer:

    my $str = 'milk,flour,sugar and eggs';
    say $str.subst(/',' <?before \w>/, ', ',  :g);
    # output: milk, flour, sugar and eggs

The word character after the comma is not part of the match, because it is in a look-ahead introduced by <?before ... >. The leading question mark indicates a zero-width assertion: a rule that never consumes characters from the matched string. You can turn any call to a subrule into a zero-width assertion. The built-in token <alpha> matches an alphabetic character, so you can rewrite this example as:

    say $str.subst(/',' <?alpha>/, ', ',  :g);

A leading exclamation mark negates the meaning, so that the look-ahead must not find the regex fragment. Another variant is:

    say $str.subst(/',' <!space>/, ', ',  :g);

You can also look behind to assert that the string only matches after another regex fragment. This assertion is <?after>. You can write the equivalent of many built-in anchors with look-ahead and look-behind assertions, though they won't be as efficient.
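
A small sketch of a look-behind in action, with made-up sample data: it picks out only those numbers that directly follow a dollar sign, without making the dollar sign part of the match.

    my $bill = 'tea $3, cake $12, 2 forks';
    say $bill.comb(/ <?after '$'> \d+ /);
    # output: (3 12)

The 2 before "forks" is skipped because the character before it is a space, not a dollar sign.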

Table 9.3. Emulation of anchors with look-around assertions

    Anchor  Meaning              Equivalent Assertion
    ^       start of string      <!after .>
    ^^      start of line        <?after ^ | \n >
    $       end of string        <!before .>
    >>      right word boundary  <?after \w> <!before \w>
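
To make two of these equivalences concrete, here is a short sketch that runs an anchor and its look-around emulation against the same made-up strings. It is an illustration of the idea, not an exhaustive check:

    my $line = 'cat catalog';

    # right word boundary: only the first "cat" ends a word
    say $line.comb(/ cat >> /).elems;                         # 1
    say $line.comb(/ cat <?after \w> <!before \w> /).elems;   # 1

    # end of string: "bar" must be the last thing in the string
    say so 'foo bar' ~~ / bar $ /;             # True
    say so 'foo bar' ~~ / bar <!before .> /;   # True
    say so 'bar foo' ~~ / bar <!before .> /;   # False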