NAME

Regexp::Bind - Bind variables to captured buffers

SYNOPSIS

use Regexp::Bind qw(
                    bind global_bind
                    bind_array global_bind_array
                   );

$record = bind($string, $regexp, @fields);
@record = global_bind($string, $regexp, @fields);

$record = bind_array($string, $regexp);
@record = global_bind_array($string, $regexp);

$record = bind($string, $embedded_regexp);
@record = global_bind($string, $embedded_regexp);

DESCRIPTION

This module extends Perl's native regexp matching. It binds anonymous hashes or named variables to captured buffers. Both normal regexp syntax and embedded regexp syntax are supported. You can view it as a tiny data extraction system.

FUNCTIONS

Two types of functions are exported: those that match the first occurrence and those that match globally. They bind the given fields to the captured contents and return anonymous hashes or arrays of the fields.

Match the first occurrence

use Data::Dumper;

Binding to anonymous hash

$record = bind($string, $regexp, qw(field_1 field_2 field_3));
print Dumper $record;

Binding to array

$record = bind_array($string, $regexp);
print $record->[0];
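
As a rough illustration of what binding fields to captures means, here is a minimal sketch in core Perl. The name sketch_bind and the sample data are hypothetical, and this is not the module's actual implementation; it only assumes that the captures of the first match are stored under the given field names.

```perl
use strict;
use warnings;

# Hypothetical stand-in for bind(): first match only, captures keyed by field name.
sub sketch_bind {
    my ($string, $regexp, @fields) = @_;
    # A list-context match returns the captured buffers; give up if nothing matched.
    my @captures = $string =~ $regexp or return;
    my %record;
    @record{@fields} = @captures;    # hash slice: field name => captured text
    return \%record;
}

my $record = sketch_bind('2004-05-17', qr/(\d+)-(\d+)-(\d+)/, qw(year month day));
print "$record->{year} $record->{month} $record->{day}\n";    # prints: 2004 05 17
```

The array form differs only in returning a reference to the captures themselves instead of a hash keyed by field name.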

Do global matching and store matched parts in @record

Binding to anonymous hash

@record = global_bind($string, $regexp, qw(field_1 field_2 field_3));
print Dumper $_ foreach @record;

Binding to array

@record = global_bind_array($string, $regexp);
print $record[0]->[0];
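
Global matching can be pictured the same way, with one record per match. The sketch below uses core Perl only; sketch_global_bind and the sample string are hypothetical, and the module itself may work differently.

```perl
use strict;
use warnings;

# Hypothetical stand-in for global_bind(): one hash reference per match.
sub sketch_global_bind {
    my ($string, $regexp, @fields) = @_;
    my @records;
    # Scalar-context //g steps through the matches one at a time.
    while ($string =~ /$regexp/g) {
        # Rebuild $1, $2, ... for this match from the @- and @+ offset arrays.
        my @captures = map {
            defined $-[$_] ? substr($string, $-[$_], $+[$_] - $-[$_]) : undef
        } 1 .. $#+;
        my %record;
        @record{@fields} = @captures;
        push @records, \%record;
    }
    return @records;
}

my @record = sketch_global_bind('a=1, b=2, c=3', qr/(\w)=(\d)/, qw(key value));
print "$_->{key} => $_->{value}\n" foreach @record;
```

The loop keeps the match in scalar context on purpose: a list-context //g match would return the captures of every match at once rather than one match per iteration.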

NAMED VARIABLE BINDING

To use named variable binding, set $Regexp::Bind::USE_NAMED_VAR to a defined value; bind() will then bind the matched parts to named variables. Named variable binding is not supported by global_bind(), bind_array(), or global_bind_array().

$Regexp::Bind::USE_NAMED_VAR = 1;
bind($string, $regexp, qw(field_1 field_2 field_3));
print "$field_1 $field_2 $field_3\n";
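
One way such binding could work is by assigning to package variables in the caller's namespace. The sketch below assumes exactly that; sketch_bind_named is a hypothetical name, and whether the module uses this mechanism is an assumption, not something the documentation states.

```perl
use strict;
use warnings;

# Hypothetical sketch: store each capture in a package variable of the caller.
sub sketch_bind_named {
    my ($string, $regexp, @fields) = @_;
    my @captures = $string =~ $regexp or return;
    my $package = caller;             # package name of the calling code
    no strict 'refs';                 # symbolic references are needed here
    ${"${package}::$_"} = shift @captures for @fields;
    return 1;
}

our ($user, $uid);                    # declared so the names pass "use strict"
sketch_bind_named('alice:1001', qr/(\w+):(\d+)/, qw(user uid));
print "$user $uid\n";                 # prints: alice 1001
```

Under this assumption the bound names are package variables, not lexicals, so code running under "use strict" has to declare them with "our" first.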

EMBEDDED REGEXP

With embedded regexp syntax, you can embed the field names in the regexp itself. The embedded syntax exploits the in-line comment feature of regexps.

The module first checks whether embedded syntax is used. If it is, the comments are stripped and the regexp is turned back into a simple one.

For the sake of simplicity and legibility, field names in embedded syntax are restricted to alphanumerics only. bind_array() and global_bind_array() do not support embedded syntax.

Example:

bind($string, qr'# (?#<field_1>\w+) (?#<field_2>\d+)\n'm);

is converted into

bind($string, qr'# (\w+) (\d+)\n'm);

If embedded syntax is detected, any further input arguments are ignored. This means that

bind($string, qr'# (?#<field_1>\w+) (?#<field_2>\d+)\n'm,
     qw(field_1 field_2));

is the same as

bind($string, qr'# (?#<field_1>\w+) (?#<field_2>\d+)\n'm);

and conceptually equal to

bind($string, qr'# (\w+) (\d+)\n'm, qw(field_1 field_2));

Note that the module simply replaces (?#<field name> with ( and binds the field name to the buffer. It does not check for syntax correctness, so any fancier usage may crash.
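
The replacement described above can be sketched with a single substitution. This is a minimal illustration in core Perl that works on the pattern text as a plain string; sketch_strip_embedded is a hypothetical helper, not a function the module provides.

```perl
use strict;
use warnings;

# Hypothetical helper: collect the field names from (?#<name>...) groups
# and rewrite each group opener to a plain "(".
sub sketch_strip_embedded {
    my ($pattern) = @_;
    my @fields;
    (my $plain = $pattern) =~ s/\(\?#<(\w+)>/push @fields, $1; '('/ge;
    return ($plain, @fields);
}

my ($plain, @fields) = sketch_strip_embedded('# (?#<field_1>\w+) (?#<field_2>\d+)\n');
print "$plain\n";      # prints: # (\w+) (\d+)\n
print "@fields\n";     # prints: field_1 field_2

# The plain pattern and the collected names can then be used together.
my %record;
@record{@fields} = "# abc 42\n" =~ /$plain/;
print "$record{field_1} $record{field_2}\n";    # prints: abc 42
```

Like the behavior described above, the sketch does no syntax checking: it only rewrites the group openers and trusts the rest of the pattern.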

INLINE FILTERING

Inline filtering now works with embedded syntax. Each matched part is saved in $_, and you can apply a simple transformation inside the braces before it is exported.

bind($string, qr'# (?#<field_1>{ s/\s+//, $_ }\w+) (?#<field_2>{ $_*= 10, $_ }\d+)\n'm);
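
The idea of filtering through $_ can be illustrated separately from the embedded syntax. The sketch below swaps the inline code text for ordinary code references, which is a different technique from the module's; the names sketch_apply_filters, name, and score are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical filters, written as code references rather than inline text.
# Each one sees the captured value in $_ and returns the transformed value.
my %filter = (
    name  => sub { s/\s+//; $_ },     # drop the first run of whitespace
    score => sub { $_ *= 10; $_ },    # multiply the number by ten
);

sub sketch_apply_filters {
    my ($record, $filters) = @_;
    for my $field (keys %$filters) {
        next unless exists $record->{$field};
        local $_ = $record->{$field};              # expose the value as $_
        $record->{$field} = $filters->{$field}->();
    }
    return $record;
}

my $record = sketch_apply_filters({ name => 'a b', score => 4 }, \%filter);
print "$record->{name} $record->{score}\n";    # prints: ab 40
```

As in the example above, each filter ends with $_ so that the transformed value, not the return value of the substitution or assignment, is what gets stored.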

SEE ALSO

For similar functionality, see Regexp::Fields.

See also Template::Extract and WWW::Extractor. They are similar projects that use prettier templates instead of low-level regexps.

You may also want to check test.pl for an example.

TO DO

Perhaps I'll add a 'FOREACH' directive like the one in Template::Extract.

COPYRIGHT

Copyright (C) 2004 by Yung-chung Lin (a.k.a. xern) <xern@cpan.org>

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.