NAME
Data::Walk::Extracted - An extracted dataref walker
SYNOPSIS
This is a contrived example! For a more functional (complex/useful) example see the roles in this package.
package Data::Walk::MyRole;
use Moose::Role;
requires '_process_the_data';
use MooseX::Types::Moose qw(
Str
ArrayRef
HashRef
);
my $mangle_keys = {
Hello_ref => 'primary_ref',
World_ref => 'secondary_ref',
};
#########1 Public Method 3#########4#########5#########6#########7#########8
sub mangle_data{
my ( $self, $passed_ref ) = @_;
@$passed_ref{ 'before_method', 'after_method' } =
( '_mangle_data_before_method', '_mangle_data_after_method' );
### Start recursive parsing
$passed_ref = $self->_process_the_data( $passed_ref, $mangle_keys );
### End recursive parsing with: $passed_ref
return $passed_ref->{Hello_ref};
}
#########1 Private Methods 3#########4#########5#########6#########7#########8
### If you are at the string level merge the two references
sub _mangle_data_before_method{
my ( $self, $passed_ref ) = @_;
if(
is_Str( $passed_ref->{primary_ref} ) and
is_Str( $passed_ref->{secondary_ref} ) ){
$passed_ref->{primary_ref} .= " " . $passed_ref->{secondary_ref};
}
return $passed_ref;
}
### Strip the reference layers on the way out
sub _mangle_data_after_method{
my ( $self, $passed_ref ) = @_;
if( is_ArrayRef( $passed_ref->{primary_ref} ) ){
$passed_ref->{primary_ref} = $passed_ref->{primary_ref}->[0];
}elsif( is_HashRef( $passed_ref->{primary_ref} ) ){
$passed_ref->{primary_ref} = $passed_ref->{primary_ref}->{level};
}
return $passed_ref;
}
package main;
use MooseX::ShortCut::BuildInstance qw(
build_instance
);
my $AT_ST = build_instance(
package => 'Greeting',
superclasses => [ 'Data::Walk::Extracted' ],
roles => [ 'Data::Walk::MyRole' ],
);
print $AT_ST->mangle_data( {
Hello_ref =>{ level =>[ { level =>[ 'Hello' ] } ] },
World_ref =>{ level =>[ { level =>[ 'World' ] } ] },
} ) . "\n";
#################################################################################
# Output of SYNOPSIS
# 01:Hello World
#################################################################################
DESCRIPTION
This module takes a data reference (or two) and recursively travels through it (or them). Where the two references diverge, the walker follows the primary data reference. At the beginning and end of each branch or node in the data, the code will attempt to call a method on the remaining unparsed data.
Acknowledgement of MJD
This is an implementation of the concept of extracted data walking from Higher-Order Perl, Chapter 1, by Mark Jason Dominus. The book is well worth the money! With that said, I diverged from MJD purity in two ways. First, this is object oriented code, not functional code. Second, when taking action the code will search for class methods provided by (your) role rather than acting on passed closures. There is clearly some overhead associated with both of these differences. I made those choices consciously, and if that upsets you do not hassle MJD!
What is the unique value of this module?
With the recursive part of data walking extracted, the various functionalities desired when walking the data can be modularized without copying this code. The Moose framework also allows diverse and targeted data parsing without dragging along a kitchen sink API for every use of this class.
Extending Data::Walk::Extracted
All action taken during the data walking must be initiated by implementation of action methods that do not exist in this class. It usually also makes sense to build an initial action method. The initial action method can do any data preprocessing that is useful as well as providing the necessary setup for the generic walker. All of these elements can be combined with this class using a Moose role, by extending the class, or by joining them to the class at run time. See MooseX::ShortCut::BuildInstance or Moose::Util for more class building information. See the parsing flow to understand the details of how the methods are used. See methods used to write roles for the available methods to implement the roles.
Then, write some tests for your role!
Recursive Parsing Flow
Initial data input and scrubbing
The primary input method added to this class for external use is referred to as the 'action' method (ex. 'mangle_data'). This action method needs to receive data and organize it for sending to the start method for the generic data walker. Remember, if more than one role is added to Data::Walk::Extracted for a given instance then all methods should be named with consideration for other (future?) method names. The '$conversion_ref' allows for multiple uses of the core data walker's generic functions. The $conversion_ref is not passed deeper into the recursion flow.
Assess and implement the before_method
The class next checks for an available 'before_method' using the test;
exists $passed_ref->{before_method};
If the test passes then the next sequence is run.
$method = $passed_ref->{before_method};
$passed_ref = $self->$method( $passed_ref );
If the $passed_ref is modified by the 'before_method' then the recursive parser will parse the new ref and not the old one. The before_method can also stop deeper parsing of the current node by setting;
$passed_ref->{skip} = 'YES'
Then the flow checks for the need to investigate deeper.
Test for deeper investigation
The code now checks whether deeper investigation is required. It tests whether the 'skip' key in the $passed_ref equals 'YES' and whether the node is a base ref type. If either case is true the process jumps to the after_method. Otherwise it begins to investigate the next level.
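The test described above can be sketched in plain Perl. This is only an illustration and not the module's internal code: the helper name should_go_deeper is hypothetical, and treating a missing 'skip' key as 'NO' is an assumption.

```perl
use 5.010;
use strict;
use warnings;

# Base node types never have deeper branches (see Definitions).
my %base_type = map { $_ => 1 } qw( SCALAR UNDEF );

# Hypothetical helper: true when the walker should descend into the node.
sub should_go_deeper {
    my ($passed_ref) = @_;
    return 0 if ( $passed_ref->{skip} // 'NO' ) eq 'YES';    # skip requested
    return 0 if $base_type{ $passed_ref->{primary_type} // 'UNDEF' };
    return 1;
}

my $deeper_hash   = should_go_deeper( { primary_type => 'HASH',   skip => 'NO' } );
my $deeper_skip   = should_go_deeper( { primary_type => 'HASH',   skip => 'YES' } );
my $deeper_scalar = should_go_deeper( { primary_type => 'SCALAR', skip => 'NO' } );
```

Only the first call returns true. The other two would jump straight to the after_method.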
Identify node elements
If the next level is not skipped then a list is generated for all paths in the node. For example a 'HASH' node would generate a list of hash keys for that node. SCALAR nodes will generate a list with only one element containing the scalar contents. UNDEF nodes will generate an empty list.
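A minimal sketch of the list generation follows. The helper name node_elements is hypothetical. The text above does not say what list an ARRAY node produces, so returning the array positions here is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: list the paths available in one node.
sub node_elements {
    my ($node) = @_;
    return ()               if !defined $node;         # UNDEF: empty list
    return ( keys %$node )  if ref $node eq 'HASH';    # HASH: the hash keys
    return ( 0 .. $#$node ) if ref $node eq 'ARRAY';   # ARRAY: positions (assumption)
    return ($node);                                    # SCALAR: the scalar contents
}

my @hash_list   = sort( node_elements( { b => 1, a => 2 } ) );
my @undef_list  = node_elements(undef);
my @scalar_list = node_elements('Hello');
```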
Sort the node as required
If the list should be sorted then the list is sorted. ARRAYS are hard sorted. This means that the actual items in the (primary) passed data ref are permanently sorted.
Process each element
For each identified element of the node a new $data_ref is generated containing data that represents just that sub element. The secondary_ref is only constructed if it has a matching type and element to the primary ref. Matching for hashrefs is done by key matching only. Matching for arrayrefs is done by position exists testing only. No position content compare is done! Scalars are matched on content. The list of items generated for this element is as follows;
before_method => -->name of before method for this role here<--
after_method => -->name of after method for this role here<--
primary_ref => the piece of the primary data ref below this element
primary_type => the lower primary (walker) ref type
match => YES|NO (This indicates if the secondary ref meets matching criteria)
skip => YES|NO Checks the three skip attributes against the lower primary_ref node. This can also be set in the 'before_method' upon arrival at that node.
secondary_ref => if match eq 'YES' then built like the primary ref
secondary_type => if match eq 'YES' then calculated like the primary type
branch_ref => stack trace
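The matching rules that set the 'match' value can be sketched as below. This is an illustration under assumptions, not the module's code: the helper name element_match is hypothetical, and returning 'NO' when either side is undef is a guess.

```perl
use strict;
use warnings;

# Hypothetical helper: does the secondary node match the primary node
# for one element (a hash key or an array position)?
sub element_match {
    my ( $primary, $secondary, $element ) = @_;
    return 'NO' if !defined $primary or !defined $secondary;    # assumption
    return 'NO' if ref $primary ne ref $secondary;              # types must agree
    if ( ref $primary eq 'HASH' ) {
        return exists $secondary->{$element} ? 'YES' : 'NO';    # key match only
    }
    if ( ref $primary eq 'ARRAY' ) {
        return $element <= $#$secondary ? 'YES' : 'NO';         # position exists only
    }
    return $primary eq $secondary ? 'YES' : 'NO';               # scalars: content
}

my $hash_match  = element_match( { a => 1 },  { a => 99 }, 'a' );
my $array_miss  = element_match( [ 1, 2, 3 ], [9],         2 );
my $array_match = element_match( [ 1, 2 ],    [ 9, 8 ],    1 );
my $type_miss   = element_match( { a => 1 },  [1],         'a' );
my $text_match  = element_match( 'x', 'x' );
```

Note that the hash and array cases return 'YES' even though the values differ, since no content compare is done at those levels.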
A position trace is generated
The current node list position is then documented and pushed onto the array at $passed_ref->{branch_ref}. The array reference stored in branch_ref can be thought of as the stack trace that documents the node elements directly between the current position and the initial (or zeroth) level of the parsed primary data_ref. Past completed branches and future pending branches are not maintained. Each element of the branch_ref contains four positions used to describe the node and selections used to traverse that node level. The values in each sub position are;
[
ref_type, #The node reference type
the list item value or '' for ARRAYs,
#key name for hashes, scalar value for scalars
element sequence position (from 0),
#For hashes this is only relevant if sort_HASH is called
level of the node (from 0),
#The zeroth level is the initial data ref
]
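To make the four positions concrete, here is a small sketch that builds a trace for one path through a sample structure. The helper trace_path and the sample data are hypothetical, and the hash position is taken from the sorted key list purely for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: follow @path into $data and record one
# [ ref_type, item, position, level ] entry per level traversed.
sub trace_path {
    my ( $data, @path ) = @_;
    my @branch_ref;
    my $level = 0;
    for my $step (@path) {
        if ( ref $data eq 'HASH' ) {
            my @keys = sort keys %$data;
            my ($position) = grep { $keys[$_] eq $step } 0 .. $#keys;
            push @branch_ref, [ 'HASH', $step, $position, $level ];
            $data = $data->{$step};
        }
        elsif ( ref $data eq 'ARRAY' ) {
            push @branch_ref, [ 'ARRAY', '', $step, $level ];
            $data = $data->[$step];
        }
        $level++;
    }
    return \@branch_ref;
}

my $branch_ref = trace_path(
    { Helping => 1, Parsing => [ 'a', 'b' ] },
    'Parsing', 1,
);
# $branch_ref is [ [ 'HASH', 'Parsing', 1, 0 ], [ 'ARRAY', '', 1, 1 ] ]
```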
Going deeper in the data
The down level ref is then passed as a new data set to be parsed and it starts at the before_method again.
Actions on return from recursion
When the values are returned from the recursion call the last branch_ref element is popped off and the returned data ref is used to replace the sub elements of the primary_ref and secondary_ref associated with that list element in the current level of the $passed_ref. If there are still pending items in the node element list then the program processes them too.
Assess and implement the after_method
After the node elements have all been processed the class checks for an available 'after_method' using the test;
exists $passed_ref->{after_method};
If the test passes then the following sequence is run.
$method = $passed_ref->{after_method};
$passed_ref = $self->$method( $passed_ref );
If the $passed_ref is modified by the 'after_method' then the recursive parser will parse the new ref and not the old one.
Go up
The updated $passed_ref is passed back up to the next level.
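The whole flow can be condensed into a short stand-alone sketch. It is not the code of this class: the package Sketch::Walker and the method _upper_case_after are hypothetical, only HASH and ARRAY nodes are walked, and the secondary_ref, sorting, and branch_ref handling are left out.

```perl
use 5.010;
use strict;
use warnings;

package Sketch::Walker;    # hypothetical stand-in for the real class

sub new { return bless {}, shift }

sub walk {
    my ( $self, $passed_ref ) = @_;

    # Assess and implement the before_method
    if ( exists $passed_ref->{before_method} ) {
        my $method = $passed_ref->{before_method};
        $passed_ref = $self->$method($passed_ref);
    }

    # Test for deeper investigation
    my $node = $passed_ref->{primary_ref};
    my $type = ref $node;
    my $skip = $passed_ref->{skip} // 'NO';
    if ( $skip ne 'YES' and ( $type eq 'HASH' or $type eq 'ARRAY' ) ) {
        my @elements = $type eq 'HASH' ? ( sort keys %$node ) : ( 0 .. $#$node );
        for my $element (@elements) {

            # Build the lower level ref and go deeper
            my $lower = {%$passed_ref};
            delete $lower->{skip};
            $lower->{primary_ref} =
                $type eq 'HASH' ? $node->{$element} : $node->[$element];
            $lower = $self->walk($lower);

            # On return, replace the sub element with the returned data
            if   ( $type eq 'HASH' ) { $node->{$element} = $lower->{primary_ref} }
            else                     { $node->[$element] = $lower->{primary_ref} }
        }
    }

    # Assess and implement the after_method
    if ( exists $passed_ref->{after_method} ) {
        my $method = $passed_ref->{after_method};
        $passed_ref = $self->$method($passed_ref);
    }
    return $passed_ref;    # Go up
}

# Example action: upper case every string on the way out
sub _upper_case_after {
    my ( $self, $passed_ref ) = @_;
    my $value = $passed_ref->{primary_ref};
    $passed_ref->{primary_ref} = uc $value if defined $value and !ref $value;
    return $passed_ref;
}

package main;

my $result = Sketch::Walker->new->walk( {
    primary_ref  => { greeting => [ 'hello', 'world' ] },
    after_method => '_upper_case_after',
} );
my $walked = $result->{primary_ref};
# $walked->{greeting} is now [ 'HELLO', 'WORLD' ]
```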
Attributes
Data passed to ->new when creating an instance. For modification of these attributes see Public Methods. The ->new function will accept either fat comma lists or a complete hash ref that has the possible attributes as the top keys. Additionally, some attributes that have the prefixed methods get_$name, set_$name, clear_$name, and has_$name can be passed to _process_the_data and will be adjusted for just the run of that method call. These are called one shot attributes. Nested calls to _process_the_data will be tracked and the attribute will remain in force until the parser returns to the calling 'one shot' level. Previous attribute values are restored after the 'one shot' attribute value expires.
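The save and restore behavior of one shot attributes can be pictured with the sketch below. It uses a plain hash in place of the real attribute accessors, and the helper name with_one_shot is hypothetical, so treat it as an illustration of the idea rather than the way this class does it.

```perl
use strict;
use warnings;

# Hypothetical helper: apply temporary settings, run some code, then
# put back whatever was there before (or clear what was not set).
sub with_one_shot {
    my ( $attributes, $one_shot, $code ) = @_;
    my %saved;
    for my $name ( keys %$one_shot ) {
        $saved{$name} = $attributes->{$name} if exists $attributes->{$name};
        $attributes->{$name} = $one_shot->{$name};
    }
    my @result = $code->();
    for my $name ( keys %$one_shot ) {
        if   ( exists $saved{$name} ) { $attributes->{$name} = $saved{$name} }
        else                          { delete $attributes->{$name} }
    }
    return @result;
}

my %attributes = ( skip_level => 3 );
my $seen_during_run;
with_one_shot(
    \%attributes,
    { skip_level => 1, sorted_nodes => { HASH => 1 } },
    sub { $seen_during_run = $attributes{skip_level} },
);
# During the run skip_level was 1. Afterwards it is 3 again and
# sorted_nodes is cleared because it was not set beforehand.
```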
sorted_nodes
Definition: If the primary_type of the $element_ref is a key in this attribute hash ref then the node list is sorted. If the value of that key is a CODEREF then the sort function will be called as follows.
@node_list = sort $coderef @node_list
For the type 'ARRAY' the node is sorted (permanently) by the element values. This means that if the array contains a list of references it will effectively sort against the ASCII of the memory pointers. Additionally the 'secondary_ref' node is not sorted, so prior alignment may break. In general ARRAY sorts are not recommended.
Default {} #Nothing is sorted
Range This accepts a HashRef.
Example:
sorted_nodes =>{
ARRAY => 1,#Will sort the primary_ref only
HASH => sub{ $b cmp $a }, #reverse sort the keys
}
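The two sort behaviors described above can be tried in isolation with core Perl. The variable names are only for illustration.

```perl
use strict;
use warnings;

# A CODEREF value is used as the sort function for the node list.
my $coderef   = sub { $b cmp $a };
my @keys      = qw( beta alpha gamma );
my @node_list = sort $coderef @keys;
# @node_list is ( 'gamma', 'beta', 'alpha' )

# Sorting an array of references compares their stringified form,
# something like ARRAY(0x55d0c8a3f2a0), not their contents.
my @refs        = ( [3], [1], [2] );
my @sorted_refs = sort @refs;
my @as_strings  = map { "$_" } @sorted_refs;
```

The order of @sorted_refs depends on where the arrays live in memory, which is why ARRAY sorts on nested data are rarely useful.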
skipped_nodes
Definition: If the primary_type of the $element_ref is a key in this attribute hash ref then the 'before_method' and 'after_method' are run at that node but no parsing is done.
Default {} #Nothing is skipped
Range This accepts a HashRef.
Example:
skipped_nodes =>{
OBJECT => 1,#skips all object nodes
}
skip_level
Definition: This attribute is set to skip (or not) node parsing at the set level. Because the process doesn't start checking until after it enters the data ref it effectively ignores a skip_level set to 0 (the base node level). The test checks against the value in the last position of the prior trace array ref + 1.
Default undef = Nothing is skipped
Range This accepts an integer
skip_node_tests
Definition: This attribute contains a list of test conditions used to skip certain targeted nodes. The test can target an array position, match a hash key, even restrict the test to only one level. The test is run against the latest branch_ref element so it skips the node below the matching conditions not the node at the matching conditions. Matching is done with '=~' and so will accept a regex or a string. The attribute contains an ArrayRef of ArrayRefs. Each sub_ref contains the following;
$type - This is any of the identified reference node types
$key - This is either a scalar or regex to use for matching a hash key
$position - This is used to match an array position. It can be an integer or 'ANY'
$level - This restricts the skipping test usage to a specific level only or 'ANY'
Example:
[
[ 'HASH', 'KeyWord', 'ANY', 'ANY'],
# Skip the node below the value of any hash key eq 'KeyWord'
[ 'ARRAY', 'ANY', '3', '4'],
# Skip the node stored in arrays at position three on level four
]
Range An infinite number of skip tests added to an array
Default [] = no nodes are skipped
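As a rough sketch, one skip test could be compared against the latest branch_ref element like this. The helper name skip_test_matches is hypothetical, and the handling of 'ANY' and of the level value is an assumption drawn from the field descriptions above, not from the module's code.

```perl
use strict;
use warnings;

# Hypothetical helper: does one [ $type, $key, $position, $level ] test
# match one [ ref_type, item, position, level ] branch_ref element?
sub skip_test_matches {
    my ( $test, $branch_element ) = @_;
    my ( $type, $key, $position, $level ) = @$test;
    my ( $node_type, $item, $node_position, $node_level ) = @$branch_element;
    return 0 if $node_type ne $type;
    return 0 if $key ne 'ANY' and $item !~ $key;    # string or regex via =~
    return 0 if $position ne 'ANY' and $node_position != $position;
    return 0 if $level ne 'ANY' and $node_level != $level;
    return 1;
}

my $key_hit   = skip_test_matches( [ 'HASH', 'KeyWord', 'ANY', 'ANY' ], [ 'HASH', 'KeyWord', 0, 2 ] );
my $array_hit = skip_test_matches( [ 'ARRAY', 'ANY', '3', '4' ], [ 'ARRAY', '', 3, 4 ] );
my $array_no  = skip_test_matches( [ 'ARRAY', 'ANY', '3', '4' ], [ 'ARRAY', '', 2, 4 ] );
my $regex_no  = skip_test_matches( [ 'HASH', qr/^Key/, 'ANY', 'ANY' ], [ 'HASH', 'Other', 0, 0 ] );
```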
change_array_size
Definition: This attribute will not be used by this class directly. However the Data::Walk::Prune role may share it with other roles in the future so it is placed here so there will be no conflicts. This is usually used to define whether an array size shrinks when an element is removed.
Default 1 (This probably means that the array will shrink when a position is removed)
Range Boolean values.
fixed_primary
Definition: This means that no changes made at lower levels will be passed upwards into the final ref.
Default 0 = The primary ref is not fixed (and can be changed). 0 -> effectively deep clones the portions of the primary ref that are traversed.
Range Boolean values.
Methods
Methods used to write roles
These are methods that are not meant to be exposed to the final user of a composed role and class but are used by the role to exercise the class.
_process_the_data( $passed_ref, $conversion_ref )
Definition: This method is the gate keeper to the recursive parsing of Data::Walk::Extracted. This method ensures that the minimum requirements for the recursive data parser are met. If needed it will use a conversion ref (also provided by the caller) to change input hash keys to the generic hash keys used by this class. This function then calls the actual recursive function. For an overview of the recursive steps see the flow outline.
Accepts: ( $passed_ref, $conversion_ref )
$passed_ref this ref contains key value pairs as follows;
primary_ref - a dataref that the walker will walk - required
review the $conversion_ref functionality in this function for renaming of this key.
secondary_ref - a dataref that is used for comparison while walking. - optional
review the $conversion_ref functionality in this function for renaming of this key.
before_method - a method name that will perform some action at the beginning of each node - optional
after_method - a method name that will perform some action at the end of each node - optional
[attribute name] - supported attribute names are accepted with temporary attribute settings here. These settings are temporarily set for a single "_process_the_data" call and then the original attribute values are restored.
$conversion_ref This allows a public method to accept different key names for the various keys listed above and then convert them later to the generic terms used by this class. - optional
Example
$passed_ref ={
print_ref =>{
First_key => [
'first_value',
'second_value'
],
},
match_ref =>{
First_key => 'second_value',
},
before_method => '_print_before_method',
after_method => '_print_after_method',
sorted_nodes =>{ ARRAY => 1 },#One shot attribute setter
}
$conversion_ref ={
primary_ref => 'print_ref',# generic_name => role_name,
secondary_ref => 'match_ref',
}
Returns: the $passed_ref (only) with the key names restored to the ones passed to this method using the $conversion_ref.
_build_branch( $seed_ref, @arg_list )
Definition: There are times when a role will wish to reconstruct the data branch that led from the 'zeroth' node to where the data walker currently is. This private method takes a seed reference and uses data found in the branch ref to recursively append to the front of the seed until a complete branch to the zeroth node is generated. The branch_ref list must be explicitly passed.
Accepts: a list of arguments starting with the $seed_ref to build from. The remaining arguments are just the array elements of the 'branch ref'.
Example:
$ref = $self->_build_branch(
$seed_ref,
@{ $passed_ref->{branch_ref}},
);
Returns: a data reference with the current path back to the start pre-pended to the $seed_ref
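The idea behind this method can be sketched as a plain recursive function. This is not the private method itself: the name build_branch is hypothetical, and passing SCALAR or UNDEF entries through unchanged is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap the seed in one container per branch_ref
# element, working from the deepest element back to the zeroth node.
sub build_branch {
    my ( $seed_ref, @branch ) = @_;
    return $seed_ref if !@branch;
    my ( $type, $item, $position ) = @{ pop @branch };
    my $wrapped;
    if ( $type eq 'HASH' ) {
        $wrapped = { $item => $seed_ref };
    }
    elsif ( $type eq 'ARRAY' ) {
        $wrapped = [];
        $wrapped->[$position] = $seed_ref;
    }
    else {
        $wrapped = $seed_ref;    # SCALAR and UNDEF add no container (assumption)
    }
    return build_branch( $wrapped, @branch );
}

my $branch = build_branch(
    'leaf',
    [ 'HASH',  'Parsing', 0, 0 ],
    [ 'ARRAY', '',        1, 1 ],
);
# $branch is { Parsing => [ undef, 'leaf' ] }
```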
_extracted_ref_type( $test_ref )
Definition: In order to manage data types necessary for this class a data walker compliant 'Type' tester is provided. This is necessary to support a few non perl-standard types not generated in standard perl typing systems. First, 'undef' is the UNDEF type. Second, strings and numbers both return as 'SCALAR' (not '' or undef). Much of the code in this package runs on dispatch tables that are built around these specific type definitions.
Accepts: It receives a $test_ref that can be undef.
Returns: a data walker type or it confesses.
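A type tester along these lines might look like the following sketch. The function name extracted_ref_type (without the leading underscore) is hypothetical, and both the 'OBJECT' result for blessed references and the failure on other reference types are assumptions based on the rest of this document.

```perl
use strict;
use warnings;
use Carp qw( confess );
use Scalar::Util qw( blessed );

# Hypothetical stand-in for the data walker type test.
sub extracted_ref_type {
    my ($test_ref) = @_;
    return 'UNDEF'  if !defined $test_ref;      # undef is its own type
    return 'OBJECT' if blessed $test_ref;       # assumption
    my $type = ref $test_ref;
    return 'SCALAR' if $type eq '';             # strings and numbers
    return $type    if $type eq 'ARRAY' or $type eq 'HASH';
    confess "Unsupported node type: $type";     # anything else fails loudly
}

my @types = map { extracted_ref_type($_) } ( undef, 'text', 42, [], {} );
# @types is ( 'UNDEF', 'SCALAR', 'SCALAR', 'ARRAY', 'HASH' )
```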
_get_had_secondary
Definition: During the initial processing of data in _process_the_data the existence of a passed secondary ref is tested and stored in the attribute '_had_secondary'. On occasion a role might need to know if a secondary ref existed at any level if it is not represented at the current level.
Accepts: nothing
Returns: True|1 if the secondary ref ever existed
_get_current_level
Definition: On occasion one of your methods may need to know which level is currently being parsed. This method provides that information as an integer.
Accepts: nothing
Returns: the integer value for the level
Public Methods
add_sorted_nodes( NODETYPE => 1, )
Definition: This method is used to add nodes to be sorted to the walker by adjusting the attribute sorted_nodes.
Accepts: Node key => value pairs where the key is the Node name and the value is 1. This method can accept multiple key => value pairs.
Returns: nothing
has_sorted_nodes
Definition: This method checks if any sorting is turned on in the attribute sorted_nodes.
Accepts: Nothing
Returns: the count of sorted node types listed
check_sorted_nodes( NODETYPE )
Definition: This method is used to see if a node type is sorted by testing the attribute sorted_nodes.
Accepts: the name of one node type
Returns: true if that node is sorted as determined by sorted_nodes
clear_sorted_nodes
Definition: This method will clear all values in the attribute sorted_nodes and therefore turn off all sorts.
Accepts: nothing
Returns: nothing
remove_sorted_node( NODETYPE1, NODETYPE2, )
Definition: This method will clear the key / value pairs in sorted_nodes for the listed items.
Accepts: a list of NODETYPES to delete
Returns: In list context it returns a list of values in the hash for the deleted keys. In scalar context it returns the value for the last key specified
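The list versus scalar return described here is the same behavior Perl's built-in delete shows on a hash slice. Whether the method is implemented with delete is an assumption. The snippet only illustrates the two contexts.

```perl
use strict;
use warnings;

my %sorted_nodes = ( ARRAY => 1, HASH => 2, SCALAR => 3 );

# List context: the values for all of the deleted keys
my @deleted_list = delete @sorted_nodes{qw( ARRAY HASH )};

my %second_copy = ( ARRAY => 1, HASH => 2, SCALAR => 3 );

# Scalar context: the value for the last key specified
my $deleted_last = delete @second_copy{qw( ARRAY HASH )};
```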
set_sorted_nodes( $hashref )
Definition: This method will completely reset the attribute sorted_nodes to $hashref.
Accepts: a hashref of NODETYPE keys with the value of 1.
Returns: nothing
get_sorted_nodes
Definition: This method will return a hashref of the attribute sorted_nodes
Accepts: nothing
Returns: a hashref
add_skipped_nodes( NODETYPE1 => 1, NODETYPE2 => 1 )
Definition: This method adds additional skip definition(s) to the skipped_nodes attribute.
Accepts: a list of key value pairs as used in 'skipped_nodes'
Returns: nothing
has_skipped_nodes
Definition: This method checks if any nodes are set to be skipped in the attribute skipped_nodes.
Accepts: Nothing
Returns: the count of skipped node types listed
check_skipped_node( $string )
Definition: This method checks if a specific node type is set to be skipped in the skipped_nodes attribute.
Accepts: a string
Returns: Boolean value indicating if the specific $string is set
remove_skipped_nodes( NODETYPE1, NODETYPE2 )
Definition: This method deletes specifically identified node skips from the skipped_nodes attribute.
Accepts: a list of NODETYPES to delete
Returns: In list context it returns a list of values in the hash for the deleted keys. In scalar context it returns the value for the last key specified
clear_skipped_nodes
Definition: This method clears all data in the skipped_nodes attribute.
Accepts: nothing
Returns: nothing
set_skipped_nodes( $hashref )
Definition: This method will completely reset the attribute skipped_nodes to $hashref.
Accepts: a hashref of NODETYPE keys with the value of 1.
Returns: nothing
get_skipped_nodes
Definition: This method will return a hashref of the attribute skipped_nodes
Accepts: nothing
Returns: a hashref
set_skip_level( $int )
Definition: This method is used to reset the skip_level attribute after the instance is created.
Accepts: an integer (negative numbers and 0 will be ignored)
Returns: nothing
get_skip_level()
Definition: This method returns the current skip_level attribute.
Accepts: nothing
Returns: an integer
has_skip_level()
Definition: This method is used to test if the skip_level attribute is set.
Accepts: nothing
Returns: $Bool value indicating if the 'skip_level' attribute has been set
clear_skip_level()
Definition: This method clears the skip_level attribute.
Accepts: nothing
Returns: nothing (always successful)
set_skip_node_tests( ArrayRef[ArrayRef] )
Definition: This method is used to change (completely) the 'skip_node_tests' attribute after the instance is created. See skip_node_tests for an example.
Accepts: an array ref of array refs
Returns: nothing
get_skip_node_tests()
Definition: This method returns the current master list from the skip_node_tests attribute.
Accepts: nothing
Returns: an array ref of array refs
has_skip_node_tests()
Definition: This method is used to test if the skip_node_tests attribute is set.
Accepts: nothing
Returns: The number of sub array refs there are in the list
clear_skip_node_tests()
Definition: This method clears the skip_node_tests attribute.
Accepts: nothing
Returns: nothing (always successful)
add_skip_node_tests( ArrayRef1, ArrayRef2 )
Definition: This method adds additional skip_node_test definition(s) to the skip_node_tests attribute list.
Accepts: a list of array refs as used in 'skip_node_tests'. These are pushed onto the existing list.
Returns: nothing
set_change_array_size( $bool )
Definition: This method is used to (re)set the change_array_size attribute after the instance is created.
Accepts: a Boolean value
Returns: nothing
get_change_array_size()
Definition: This method returns the current state of the change_array_size attribute.
Accepts: nothing
Returns: $Bool value representing the state of the 'change_array_size' attribute
has_change_array_size()
Definition: This method is used to test if the change_array_size attribute is set.
Accepts: nothing
Returns: $Bool value indicating if the 'change_array_size' attribute has been set
clear_change_array_size()
Definition: This method clears the change_array_size attribute.
Accepts: nothing
Returns: nothing
set_fixed_primary( $bool )
Definition: This method is used to change the fixed_primary attribute after the instance is created.
Accepts: a Boolean value
Returns: nothing
get_fixed_primary()
Definition: This method returns the current state of the fixed_primary attribute.
Accepts: nothing
Returns: $Bool value representing the state of the 'fixed_primary' attribute
has_fixed_primary()
Definition: This method is used to test if the fixed_primary attribute is set.
Accepts: nothing
Returns: $Bool value indicating if the 'fixed_primary' attribute has been set
clear_fixed_primary()
Definition: This method clears the fixed_primary attribute.
Accepts: nothing
Returns: nothing
Definitions
node
Each branch point of a data reference is considered a node. The possible paths deeper into the data structure from the node are followed 'vertically first' in recursive parsing. The original top level reference is considered the 'zeroth' node.
base node type
Recursion 'base' node types are considered to not have any possible deeper branches. Currently that list is SCALAR and UNDEF.
Supported node walking types
- ARRAY
- HASH
- SCALAR
- UNDEF
Other node support
Support for Objects is partially implemented and as a consequence '_process_the_data' won't immediately die when asked to parse an object. It will still die but on a dispatch table call that indicates where there is missing object support, not at the top of the node. This allows for some of the skip attributes to use 'OBJECT' in their definitions.
Supported one shot attributes
- sorted_nodes
- skipped_nodes
- skip_level
- skip_node_tests
- change_array_size
- fixed_primary
Dispatch Tables
This class uses the role Data::Walk::Extracted::Dispatch to implement dispatch tables. When there is a decision point, that role is used to make the class extensible.
Caveat utilitor
This is not an extension of Data::Walk.
The core class has no external effect. All output comes from additions to the class.
This module uses the 'defined or' operators ( // and //= ) and so requires perl 5.010 or higher.
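For readers who have not met 'defined or' before, this short core Perl example shows how it differs from the older '||=' idiom. The hash names are only for illustration.

```perl
use 5.010;
use strict;
use warnings;

my %first;
$first{skip} //= 'NO';     # undefined, so the default is assigned

my %second = ( skip => 0 );
$second{skip} //= 'NO';    # 0 is defined, so it is kept

my %third = ( skip => 0 );
$third{skip} ||= 'NO';     # 0 is false, so ||= replaces it
```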
This is a Moose based data handling class. Many coders will tell you Moose and data manipulation don't belong together. They are most certainly right in speed intensive circumstances.
Recursive parsing is not a good fit for all data since very deep data structures will fill up a fair amount of memory! As the module recursively parses through the levels it leaves behind snapshots of the previous level that allow it to keep track of its location.
The passed data references are effectively deep cloned during this process. To leave the primary_ref pointer intact see fixed_primary.
GLOBAL VARIABLES
$ENV{Smart_Comments}
The module uses Smart::Comments if the '-ENV' option is set. The 'use' is encapsulated in an if block triggered by an environmental variable to comfort non-believers. Setting the variable $ENV{Smart_Comments} in a BEGIN block will load and turn on smart comment reporting. There are three levels of 'Smartness' available in this module '###', '####', and '#####'.
Build/Install from Source
1. Download a compressed file with the code
2. Extract the code from the compressed file. If you are using tar this should work:
tar -zxvf Data-Walk-Extracted-v0.xx.xx.tar.gz
3. Change (cd) into the extracted directory
4. Run the following commands
(For Windows find what version of make was used to compile your perl)
perl -V:make
(then for Windows substitute the correct make function (ex. s/make/dmake/g))
>perl Makefile.PL
>make
>make test
>make install # As sudo/root
>make clean
SUPPORT
TODO
1. provide full recursion through Objects
2. Support recursion through CodeRefs (Closures)
3. Add a Data::Walk::Diff Role to the package
4. Add a Data::Walk::Top Role to the package
5. Add a Data::Walk::Thin Role to the package
6. Add a Data::Walk::Substitute Role to the package
7. Add Log::Shiras debugging in exchange for Smart::Comments
AUTHOR
COPYRIGHT
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
The full text of the license can be found in the LICENSE file included with this module.
This software is copyrighted (c) 2013 by Jed Lund.
Dependencies
5.010 (for use of defined or //)
SEE ALSO
Smart::Comments - is used if the -ENV option is set
Data::Dumper - Dumper
YAML - Dump
Data::Walk::Print - available Data::Walk::Extracted Role
Data::Walk::Prune - available Data::Walk::Extracted Role
Data::Walk::Graft - available Data::Walk::Extracted Role
Data::Walk::Clone - available Data::Walk::Extracted Role