NAME
Data::Match - Complex data structure pattern matching
SYNOPSIS
use Data::Match qw(:all);
my ($match, $results) = match($structure, $pattern);
use Data::Match;
my $obj = new Data::Match;
my ($match, $results) = $obj->execute($structure, $pattern);
DESCRIPTION
Data::Match provides extensible complex Perl data structure searching and matching.
EXPORT
None are exported by default. :func exports match and matches, :pat exports all the pattern element generators below, and :all exports :func and :pat.
PATTERNS
A data pattern is a complex data structure that possibly matches another complex data structure. For example:
matches([ 1, 2 ], [ 1, 2 ]); # TRUE
matches([ 1, 2, 3 ], [ 1, ANY, 3 ]); # TRUE
matches([ 1, 2, 3 ], [ 1, ANY, 2 ]); # FALSE: 3 != 2
ANY matches anything, including an undefined value.
my $results = matches([ 1, 2, 1 ], [ BIND('x'), ANY, BIND('x') ]); # TRUE
BIND($name) matches anything and remembers each match and its position with every BIND($name) in $result->{'BIND'}{$name}. If the value at a later BIND($name) is not the same as the first value bound to BIND($name), it does not match. For example:
my $results = matches([ 1, 2, 3 ], [ BIND('x'), 2, BIND('x') ]); # FALSE: 3 != 1
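To make the BIND rule concrete, here is a minimal pure-Perl sketch of the comparison logic for flat arrays. It does not use Data::Match and is not how the module is implemented; toy_matches, bind_to and $ANY are hypothetical names invented for this illustration, and the returned hash only loosely mirrors the 'v' and 'p' lists described under MATCH COLLECTIONS.

```perl
use strict;
use warnings;

# Hypothetical sentinels standing in for ANY and BIND($name);
# these are NOT Data::Match's real pattern objects.
my $ANY = \'ANY';
sub bind_to { return { bind => $_[0] } }

# Match a flat array against a flat pattern, tracking bindings.
# Returns a hash ref of bindings on success, nothing on failure.
sub toy_matches {
    my ($data, $pattern) = @_;
    return unless @$data == @$pattern;
    my %bound;
    for my $i (0 .. $#$data) {
        my ($d, $p) = ($data->[$i], $pattern->[$i]);
        if (ref($p) eq 'SCALAR' && $p == $ANY) {
            next;    # ANY accepts every value
        }
        elsif (ref($p) eq 'HASH') {
            my $name = $p->{bind};
            if (exists $bound{$name}) {
                # A repeated binding must equal the first bound value.
                return unless $bound{$name}{v}[0] eq $d;
            }
            push @{ $bound{$name}{v} }, $d;    # value at each match
            push @{ $bound{$name}{p} }, $i;    # position of each match
        }
        else {
            return unless $d eq $p;            # literal comparison
        }
    }
    return \%bound;
}

my $hit  = toy_matches([ 1, 2, 1 ], [ bind_to('x'), $ANY, bind_to('x') ]);
my $miss = toy_matches([ 1, 2, 3 ], [ bind_to('x'), 2,    bind_to('x') ]);
```

In this sketch $hit holds both bound values and their positions, while $miss is undefined because the second 'x' (3) differs from the first (1).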
COLLECT($name) is similar to BIND but does not compare first bound values.
REST matches all remaining elements of an array or hash.
matches([ 1, 2, 3 ], [ 1, REST() ]); # TRUE
matches({ 'a'=>1, 'b'=>1 }, { 'b'=>1, REST() => REST() }); # TRUE
FIND searches at all depths for matching sub-patterns.
matches([ 1, [ 1, 2 ], 3 ], FIND(COLLECT('x', [ 1, REST() ]))); # TRUE
See the test script t/t1.t in the package distribution for more pattern examples.
MATCH COLLECTIONS
When a BIND or COLLECT matches a datum, an entry is collected in $result->{BIND} and $result->{COLLECT}, respectively. (This might change in the future.)
Each entry for the binding name is a hash containing 'v', 'p' and 'ps' lists.
'v' - is a list of the value at each match.
'p' - is a list of match paths describing where the corresponding match was found, relative to the root of the search at each match. See match_path_*. 'p' is not collected if $matchobj->{'no_collect_path'} is true.
'ps' - is a list of code strings (match_path_str) that describe where each match was found. 'ps' is collected only if $matchobj->{'collect_path_str'} is true.
SUB-PATTERNS
All patterns can have sub-patterns. Most patterns match the AND-ed results of their sub-patterns and their own behavior, first trying the sub-patterns before attempting to match the intrinsic behavior. However, OR and ANY match if any of their sub-patterns match.
For example:
match([ ['a', 1 ], ['b', 2], ['a', 3] ], EACH(COLLECT('x', ['a', ANY() ]))) # TRUE
The above pattern means:
For EACH element in the root structure (an array):
COLLECT each element, into the collection named 'x', that is an ARRAY of length 2 that starts with 'a'.
On the other hand,
match( [ ['a', 1 ], ['b', 2], ['a', 3] ], ALL(COLLECT('x', [ 'a', ANY() ])) )
# IS FALSE
Because the second root element (an array) does not start with 'a'. But,
match( [ ['a', 1 ], ['a', 2], ['a', 3] ], ALL(COLLECT('x', [ 'a', ANY() ])) )
# IS TRUE
The pattern below flattens the nested array into atoms:
match(
  [ 1, 'x',
    [ 2, 'x',
      [ 3, 'x' ],
      [ 4,
        [ 5,
          [ 'x' ]
        ],
        6
      ]
    ]
  ],
  FIND(COLLECT('x', EXPR(q{! ref}))),
  { 'no_collect_path' => 1 }
)->{'COLLECT'}{'x'}{'v'};
no_collect_path causes COLLECT and BIND to not collect any paths.
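For comparison, the same flattening can be written by hand. The sketch below is plain Perl with no Data::Match involved; flatten_atoms is a hypothetical helper, and its depth-first, left-to-right order is an assumption of this sketch rather than a documented property of FIND.

```perl
use strict;
use warnings;

# Recursively collect every non-reference value (depth-first, left to right).
sub flatten_atoms {
    my ($node) = @_;
    return $node unless ref $node;                       # an atom: keep it
    return map { flatten_atoms($_) } @$node
        if ref($node) eq 'ARRAY';                        # descend into arrays
    return map { flatten_atoms($_) } values %$node
        if ref($node) eq 'HASH';                         # descend into hash values
    return;                                              # other refs are skipped here
}

my @atoms = flatten_atoms(
    [ 1, 'x', [ 2, 'x', [ 3, 'x' ], [ 4, [ 5, [ 'x' ] ], 6 ] ] ]
);
print "@atoms\n";    # prints "1 x 2 x 3 x 4 5 x 6"
```

The pattern form says the same thing declaratively: FIND supplies the traversal, EXPR(q{! ref}) supplies the atom test, and COLLECT gathers the values.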
MATCH SLICES
Match slices are objects that contain slices of matched portions of a data structure. This is useful for applying changes to substructures matched by patterns like REST.
For example:
do {
  my $a = [ 1, 2, 3, 4 ];
  my $p = [ 1, ANY, REST(BIND('s')) ];
  my $r = matches($a, $p);
  ok($r); # TRUE
  ok(Compare($r->{'BIND'}{'s'}{'v'}[0], [ 3, 4 ])); # TRUE
  $r->{'BIND'}{'s'}{'v'}[0][0] = 'x'; # Change match slice
  matches($a, [ 1, 2, 'x', 4 ]); # TRUE
}
Hash match slices are generated for each key-value pair for a hash matched by EACH and ALL. Each of these match slices can be matched as a hash with a single key-value pair.
Match slices are useful for search-and-replace tasks.
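The write-through behavior of an array match slice can be pictured as a tied array that forwards reads and writes to a window of the original array. The sketch below uses only the core Tie::Array module; ToySlice is a hypothetical class written for this illustration and is not the Data::Match::Slice implementation.

```perl
use strict;
use warnings;

# Hypothetical class for illustration; not Data::Match::Slice itself.
package ToySlice;
require Tie::Array;
our @ISA = ('Tie::Array');

# $base is the underlying array ref; the slice covers indexes $from .. $to-1.
sub TIEARRAY {
    my ($class, $base, $from, $to) = @_;
    return bless { base => $base, from => $from, to => $to }, $class;
}
sub FETCHSIZE { my ($s) = @_;         return $s->{to} - $s->{from} }
sub FETCH     { my ($s, $i) = @_;     return $s->{base}[ $s->{from} + $i ] }
sub STORE     { my ($s, $i, $v) = @_; $s->{base}[ $s->{from} + $i ] = $v }

package main;

my $data = [ 1, 2, 3, 4 ];
tie my @rest, 'ToySlice', $data, 2, 4;    # view of elements 2 and 3
$rest[0] = 'x';                           # writes through to $data->[2]
print "@$data\n";                         # prints "1 2 x 4"
```

Because the slice holds a reference to the original array instead of a copy, assigning through it changes the matched structure in place, which is what makes search and replace possible.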
VISITATION ADAPTERS
By default Data::Match is blind to Perl object interfaces. To instruct Data::Match to not traverse object implementation containers and honor object interfaces you must provide a visitation adapter. A visitation adapter tells Data::Match how to traverse through an object interface and how to keep track of how it got through.
For example:
package Foo;
sub new
{
  my ($cls, %opts) = @_;
  bless \%opts, $cls;
}
sub x { shift->{x}; }
sub parent { shift->{parent}; }
sub children { shift->{children}; }
sub add_child {
  my $self = shift;
  for my $c ( @_ ) {
    $c->{parent} = $self;
  }
  push(@{$self->{children}}, @_);
}
my $foos = [ map(new Foo('x' => $_), 1 .. 10) ];
for my $f ( @$foos ) { $f->add_child($foos->[rand($#$foos)]); }
my $pat = FIND(COLLECT('Foo', ISA('Foo', { 'parent' => $foos->[0], REST() => REST() })));
$match->match($foos, $pat);
The problem with the above example is: FIND will not honor the interface of class Foo by default and will eventually find a Foo where $_->parent eq $foos->[0] through all the parent and child links in the objects' implementation container. To force Data::Match to honor an interface (or a subset of an interface) during FIND traversal we create a 'find' adapter sub that will do the right thing.
my $opts = {
  'find' => {
    'Foo' => sub {
      my ($self, $visitor, $match) = @_;
      # Always do 'x'.
      $visitor->($self->x, 'METHOD', 'x');
      # Optional children traversal.
      if ( $match->{'Foo_find_children'} ) {
        $visitor->($self->children, 'METHOD', 'children');
      }
      # Optional parent traversal.
      if ( $match->{'Foo_find_parent'} ) {
        $visitor->($self->parent, 'METHOD', 'parent');
      }
    }
  }
};
my $match = new Data::Match($opts, 'Foo_find_children' => 1);
$match = $match->execute($foos, $pat);
See t/t4.t for more examples of visitation adapters.
DESIGN
Data::Match employs a mostly-functional external interface since this module was inspired by a Lisp tutorial ("The Little Lisper", maybe) I read too many years ago; besides, pattern matching is largely recursively functional. The optional control hashes and traverse adapter interfaces are better represented by an object interface so I implemented a functional veneer over the core object interface.
Internally, objects are used to represent the pattern primitives because most of the pattern primitives have common behavior. There are a few design patterns that are particularly applicable in Data::Match: Visitor and Adapter. Adapter is used to provide the extensibility for the traversal of blessed structures such that Data::Match can honor the external interfaces of a class and not blindly violate encapsulation. Visitor is the basis for some of the FIND pattern implementation. The Data::Match::Slice classes that provide the match slices are probably a Veneer on the array and hash types through the tie meta-behaviors.
CAVEATS
Does not have regexp-like operators like '?', '*', '+'.
Should probably have more interfaces with Data::DRef and Data::Walker.
The visitor adapters do not use UNIVERSAL::isa to search for the adapter; they use ref. This will be fixed in a future release.
Since hash keys do not retain blessedness (what was Larry thinking?) it is difficult to have patterns match keys without resorting to some bizarre regexp instead of using isa.
match_path_set and match_path_ref do not work through 'METHOD' path boundaries. This will be fixed in a future release.
BIND and COLLECT need scoping operators for deeply collected patterns.
STATUS
If you find this to be useful please contact the author. This is alpha software; all APIs, semantics and behaviors are subject to change.
INTERFACE
This section describes the external interface of this module.
%match_opts
Default options for match.
execute
Matches a structure against a pattern. In a list context, returns both the match success and results; in a scalar context returns the results hash if match succeeded or undef.
use Data::Match;
my $obj = new Data::Match();
my $matched = $obj->execute($thing, $pattern);
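A context-dependent return like the one described above is conventionally written in Perl with the built-in wantarray. The sketch below shows only that return convention; toy_execute is a hypothetical stand-in that takes a precomputed success flag and results hash, and it says nothing about how Data::Match implements execute.

```perl
use strict;
use warnings;

# Hypothetical stand-in for execute(); only the return convention is shown.
sub toy_execute {
    my ($matched, $results) = @_;
    return ($matched, $results) if wantarray;    # list context: flag and results
    return $matched ? $results : undef;          # scalar context: results or undef
}

my ($ok, $res) = toy_execute(1, { 'BIND' => {} });    # list context
my $only       = toy_execute(0, { 'BIND' => {} });    # scalar context, failed match
```

Here $ok is 1 and $res is the results hash, while $only is undef because the match flag was false.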
match
use Data::Match qw(match);
match($thing, $pattern, @opts)
is equivalent to:
use Data::Match;
Data::Match->new(@opts)->execute($thing, $pattern);
matches
Same as match in scalar context.
match_path_str
Returns a string of Perl code that refers to the element at the path.
$matchobj->match_path_str($path, $str);
$str defaults to '$_'.
match_path_DRef_path
Returns a string suitable for Data::DRef.
$matchobj->match_path_DRef_path($path, $str, $sep);
$str is used as a prefix for the Data::DRef path. $str defaults to ''; $sep defaults to $Data::DRef::Separator or '.'.
match_path_get
Returns the value at the location of the match path in the root.
$matchobj->match_path_get($path, $root);
$root defaults to $matchobj->{'root'}.
Example:
my $results = matches($thing, FIND(BIND('x', [ 'x', REST ])));
my $x = $results->match_path_get($thing, $results->{'BIND'}{'x'}{'p'}[0]);
The above example returns the first array that begins with 'x'.
match_path_set
Sets the value at the location of the match path in the root.
$matchobj->match_path_set($path, $value, $root);
$root defaults to $matchobj->{'root'}.
Example:
my $results = matches($thing, FIND(BIND('x', [ 'x', REST ])));
$results->match_path_set($thing, $results->{'BIND'}{'x'}{'p'}[0], 'y');
The above example replaces the first array found that starts with 'x' with 'y'.
match_path_ref
Returns a scalar ref pointing to the location for the match path in the root.
$matchobj->match_path_ref($path, $root);
$root defaults to $matchobj->{'root'}.
Example:
my $results = matches($thing, FIND(BIND('x', [ 'x', REST ])));
my $ref = $results->match_path_ref($thing, $results->{'BIND'}{'x'}{'p'}[0]);
$$ref = 'y';
The above example replaces the first array that starts with 'x' with 'y'.
VERSION
Version 0.05, $Revision: 1.12 $.
AUTHOR
Kurt A. Stephens <ks.perl@kurtstephens.com>
COPYRIGHT
Copyright (c) 2001, 2002 Kurt A. Stephens and ION, INC.
SEE ALSO
perl, Array::PatternMatcher, Data::Compare, Data::Dumper, Data::DRef, Data::Walker.