Captures

Regexes are also useful for extracting information. Surrounding part of a regex with round brackets (parentheses) (...) makes Perl capture the string that part matches. The string matched by the first group of parentheses is available in $/[0], the second in $/[1], and so on. $/ acts as an array containing the captures from each parenthesized group:

    my $str = 'Germany was reunited on 1990-10-03, peacefully';

    if $str ~~ m/ (\d**4) \- (\d\d) \- (\d\d) / {
        # prefix ~ stringifies each capture, which is itself a match object
        say 'Year:  ', ~$/[0];
        say 'Month: ', ~$/[1];
        say 'Day:   ', ~$/[2];
        # usage as an array:
        say $/.join('-');       # prints 1990-10-03
    }

If you quantify a capture, the corresponding entry in the match object is a list of match objects, one for each repetition:

    my $ingredients = 'eggs, milk, sugar and flour';

    # (\w+)+ % [\,\s*] matches one or more words separated by commas
    if $ingredients ~~ m/ (\w+)+ % [\,\s*] \s* 'and' \s* (\w+) / {
        say 'list: ', $/[0].join(' | ');
        say 'end:  ', ~$/[1];
    }
    }

This prints:

    list: eggs | milk | sugar
    end:  flour

The first capture, (\w+), was quantified, so $/[0] contains a list of words. The code calls .join to turn it into a string. Regardless of how many times the first capture matches (and how many elements are in $/[0]), the second capture is still available in $/[1].
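
To see that the position of the second capture really does not depend on how often the first one matched, here is a small sketch that runs the same regex against two lists of different length. The variable names and sample strings are made up for illustration, and the separator is written with the % modifier, which assumes a current Rakudo:

    my $short = 'eggs and flour';
    my $long  = 'eggs, milk, sugar, butter and flour';

    for $short, $long -> $text {
        if $text ~~ m/ (\w+)+ % [\,\s*] \s* 'and' \s* (\w+) / {
            # $/[0] is a list with one match object per repetition
            say $/[0].elems, ' before "and": ', $/[0].join(' | ');
            # $/[1] is always the word after "and"
            say 'end: ', ~$/[1];
        }
    }

The first string should report one element in $/[0] and the second four, while $/[1] contains flour in both cases.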

As a shortcut, $/[0] is also available under the name $0, $/[1] as $1, and so on. These aliases are also available inside the regex. This allows you to write a regex that detects the common error of duplicated words, just like the example at the beginning of this chapter:

    my $s = 'the quick brown fox jumped over the the lazy dog';

    if $s ~~ m/ « (\w+) \W+ $0 » / {
        say "Found '$0' twice in a row";
    }

The regex first anchors to a left word boundary with « so that it doesn't match a partial duplication of words. Next, the regex captures a word ((\w+)), followed by at least one non-word character (\W+). This implies a right word boundary, so there is no need to use an explicit boundary there. Then it matches the previous capture ($0) again, followed by a right word boundary (»).

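Without the first word boundary anchor, the regex would for example match 'strand and beach' (the "and" that ends "strand", followed by the word "and") or 'lathe the table leg'. Without the last word boundary anchor it would also match 'the theory', because "theory" begins with "the".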
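
The effect of each anchor can be checked with a short sketch that tries three variants of the regex on those phrases. The variable names $unanchored, $left-only and $both are hypothetical and chosen only for this illustration:

    my $unanchored = rx/   (\w+) \W+ $0   /;
    my $left-only  = rx/ « (\w+) \W+ $0   /;
    my $both       = rx/ « (\w+) \W+ $0 » /;

    my @phrases = 'strand and beach', 'lathe the table leg',
                  'the theory', 'over the the lazy dog';

    for @phrases -> $phrase {
        # one True/False per regex variant, in the order declared above
        my @results = ($unanchored, $left-only, $both).map(
            -> $re { so $phrase ~~ $re }
        );
        say $phrase, ' => ', @results.join(' ');
    }

This should print:

    strand and beach => True False False
    lathe the table leg => True False False
    the theory => True True False
    over the the lazy dog => True True True

Only the regex with both anchors restricts the match to a genuinely repeated word.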