Named regexes

You can declare regexes just like subroutines--and even name them. Suppose you found the example at the beginning of this chapter useful and want to make it available easily. Suppose also you want to extend it to handle contractions such as doesn't or isn't:

    my regex word { \w+ [ \' \w+]? }
    my regex dup  { « <word=&word> \W+ $<word> » }

    if $s ~~ m/ <dup=&dup> / {
        say "Found '{$<dup><word>}' twice in a row";
    }

This code introduces a regex named word, which matches at least one word character, optionally followed by a single quote and at least one more word character. Another regex called dup (short for duplicate) is anchored at a word boundary on each side.

Within a regex, the syntax <&word> locates the regex word within the current lexical scope and matches against the regex. The <name=&regex> syntax creates a capture named name, which records what &regex matched in the match object.
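
A minimal sketch of the difference between the two call forms, assuming a recent Raku compiler. The input string and the capture name w are illustrative choices, not part of the original example:

    my regex word { \w+ [ \' \w+ ]? }

    # <&word> calls the regex but does not capture anything by name.
    if "isn't" ~~ / <&word> / {
        say ~$/;                  # isn't
        say $<word>.defined;      # False
    }

    # <w=&word> calls the same regex and stores its match under the name w.
    if "isn't" ~~ / <w=&word> / {
        say ~$<w>;                # isn't
    }

The non-capturing form is handy when only the overall match matters; the named form is needed whenever a later part of the regex, or the calling code, has to refer to that piece.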

In this example, dup calls the word regex, then matches at least one non-word character, and then matches the same string as previously matched by the regex word. It ends with another word boundary. The syntax for this backreference is a dollar sign followed by the name of the capture in angle brackets[12].

Within the if block, $<dup> is short for $/{'dup'}. It accesses the match object that the regex dup produced. dup also has a subrule called word. The match object produced from that call is accessible as $<dup><word>.
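
To make the nesting concrete, here is a small self-contained run of the same two regexes. The sample string in $s is a hypothetical input chosen for illustration:

    my regex word { \w+ [ \' \w+ ]? }
    my regex dup  { « <word=&word> \W+ $<word> » }

    my $s = "It doesn't doesn't matter";   # hypothetical sample input

    if $s ~~ m/ <dup=&dup> / {
        say ~$<dup>;          # doesn't doesn't  (everything dup matched)
        say ~$/{'dup'};       # doesn't doesn't  (long form of $<dup>)
        say ~$<dup><word>;    # doesn't          (the nested word capture)
        say $<dup>.from;      # 3                (offset where dup started)
    }

Each level of the match object mirrors one level of regex calls, so $<dup><word> reads as "the word capture inside the dup capture".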

Named regexes make it easy to organize complex regexes by building them up from smaller pieces.
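
As a further sketch of this building-block style, the word regex can be reused inside an unrelated pattern. The regex assignment, its capture names left and right, and the input string below are all invented for illustration:

    my regex word       { \w+ [ \' \w+ ]? }
    # Hypothetical regex: two words separated by an equals sign.
    my regex assignment { <left=&word> \s* '=' \s* <right=&word> }

    if "colour = red" ~~ / <assignment=&assignment> / {
        say ~$<assignment><left>;     # colour
        say ~$<assignment><right>;    # red
    }

Because word is declared once, a later change to what counts as a word (for example, allowing hyphens) would apply to every regex that calls it.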

[12] In grammars, <word> looks up a regex named word in the current grammar and parent grammars, and creates a capture of the same name.