NAME
Regexp::Fields - named capture groups
SYNOPSIS
    use Regexp::Fields qw(my);
    use strict;
    my $rx = qr/Time: (?<hrs>..):(?<min>..):(?<sec>..)/;
    if (/$rx/) {
        print "The time was: $&{hrs}:$&{min}:$&{sec}\n";
        # or: "The time was: $hrs:$min:$sec\n";
        # or: "The time was: $1:$2:$3\n";
    }
DESCRIPTION
Regexp::Fields adds the extended (?<name> ...) pattern to Perl's regular expression language. This works like an ordinary pair of capturing parens, but after a match you can use $&{name} instead of $1 (or whichever $N) to get at the captured substring.
The %{&} hash is global, like all punctuation variables. Like $1 and friends, it's dynamically scoped and bound to the "last match".
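Because %{&} follows the same scoping rules as $1, those rules can be seen with plain Perl and no extra modules. The following is a minimal sketch that uses only $1, not %{&}; the sample strings and the @seen array are made up for illustration.

```perl
use strict;
use warnings;

my @seen;

"outer" =~ /(o\w+)/;      # a successful match sets $1
push @seen, $1;

{
    "inner" =~ /(i\w+)/;  # a match inside a nested block sets its own $1
    push @seen, $1;
}
push @seen, $1;           # block exited: $1 reflects the outer match again

"nothing" =~ /(zzz)/;     # a failed match leaves $1 as it was
push @seen, $1;

print join(", ", @seen), "\n";
```

The value recorded after the block and the value recorded after the failed match both come from the first match, which is what "dynamically scoped and bound to the last match" means in practice.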
This looks familiar
The syntax is borrowed from the .NET regex library.
Differences from .NET include the following:
- Regexp::Fields ignores whitespace between the field name and the subpattern. To match leading whitespace, you'll need to use a backslash or a character class.
    /(?<space> [ ])/; # matches one space
- The digit variables aren't reordered.
    "12" =~ /(?<one>1)(2)/; # $2 is "2"
- Regexp::Fields doesn't support named backreferences (which are on the TODO list) or field names in conditional tests (which aren't).
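The numbering point is easy to check. The .NET library numbers unnamed groups first and named groups after them, whereas here a named group simply takes the next number, counting left to right. The sketch below does not use Regexp::Fields: it assumes Perl 5.10 or later, whose core has its own (?<name>...) syntax and stores named captures in %+ rather than %{&}. It is offered only because the core numbers groups the same way; the variable names are illustrative.

```perl
use strict;
use warnings;

# Core Perl (5.10+) named capture, NOT Regexp::Fields: note %+ instead of %{&}.
my ($first, $second, $named);
if ("12" =~ /(?<one>1)(2)/) {
    ($first, $second, $named) = ($1, $2, $+{one});
}

# The named group is group 1; the unnamed group after it is group 2.
print "\$1=$first \$2=$second one=$named\n";
```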
Lexical variables and the my pragma
When a regex is compiled with use Regexp::Fields 'my' in effect, a lexical variable for each field will be implicitly declared. After a successful match the variables will be set to the captured substrings, just like the corresponding values of %{&}. After a failed match attempt they'll always be undef.
This is not the case with %{&} or the digit variables. After a failed match, those still hold whatever the last successful match left in them, which may have come from a regex in some other part of your program. The lexical match variables work differently because they are bound once and forever to the regex where they were declared.
    use Regexp::Fields qw(my);
    my $f = qr/(?<foo> foo)/; # implicitly: my $foo
    my $b = qr/(?<bar> bar)/; # implicitly: my $bar
    if (/$f/ and /$b/) { # now $1 is "bar"
        print "Matched $foo and $bar"; # but $foo and $bar are both set!
    }
This has some advantages, but it comes with new drawbacks of its own.
First of all, Perl's lexical variables aren't visible until the statement after they're declared. This means you can't use the lexical "field" variables in (?{...}) or (??{...}) blocks, or on the replacement side of s///.
Second, this wouldn't have done the Right Thing:
    # [initialize $f and $b as above]
    if (/$f|$b/) { # WRONG
        print "Matched $f or $b";
    }
When the two qr// variables are interpolated like this, a new regex is compiled at runtime. The lexicals are still bound to $f and $b, and not to this new regex that combines them.
And third, this won't do what you want either:
    while (<>) {
        for my $p (@lists) {
            next unless /(?<pat> $p)/; # WRONG
            print "Matched: $pat\n";
        }
    }
Here the regex is compiled at runtime because of the interpolated $p variable, and by then it's too late to declare the lexicals.
In all these cases you should use the dynamically-scoped %{&} instead.
Functions
DIAGNOSTICS
- Sequence (?<name... not terminated
    (F) You started a (?<name> ...) pattern but forgot the >.
- Illegal character in (?<name> ...)
    (F) Field names must start with a letter, and can contain only letters, numbers and underscores.
- Field '%s' masks earlier declaration in same regex
    (W) You used the same field name twice in a single regex. You can still access the first field with $DIGIT, but not with $&{name}.
- "%s" variable %s masks earlier declaration in same "%s"
    (W) With the my directive in effect, each field implicitly declares a lexical variable. See perldiag for a full description of the warning.
- Identifier too long
    (F) You used a field name longer than Perl allows for a simple identifier. See perldiag.
- Sequence (?<%s...) not recognized
    (F) You tried to compile a regex containing the (?<name> ...) extended pattern, but Regexp::Fields wasn't installed at the time. You can reinstall it at runtime with the install() function.
- corrupted regex program
    (F) You compiled a regex with Regexp::Fields installed, but tried to execute it with the standard regex engine. You can reinstall it at runtime with the install() function.
- Warning: Use of '%s' without parens is ambiguous
    (W) Since '%' is the modulo operator as well as the hash sigil, the parser suggests that keys %& could mean keys-modulo-and rather than keys-HASH. Likewise with each().
    You can hush the warning by adding parentheses (i.e. keys(%&)) or curly braces (keys %{&}). See perldiag for a more complete description of this warning.
AUTHOR
Steve Grazzini (grazz@pobox.com)
BUGS
Mail them to the author.
Known deficiencies include:
The 'my' pragma doesn't work in 5.6.1.
You need to reinstall the modified regex engine every time you create a new thread.
There's a scoping problem when /g is used with /m or /s.
COPYRIGHT AND LICENSE
Copyright (c) 2003, Steve Grazzini. All rights reserved.
This module is free software; you can copy, modify and/or redistribute it under the same terms as Perl itself.