NAME

Text::Parser::RuleSpec - Syntax sugar for rule specification while subclassing Text::Parser or derivatives

VERSION

version 1.000

SYNOPSIS

package MyFavorite::Parser;

use Text::Parser::RuleSpec;
extends 'Text::Parser';

has '+multiline_type'  => (default => 'join_next');

unwraps_lines_using (
    is_wrapped     => sub {
        my $self = shift;
        $_ = shift;
        chomp;
        m/\s+[~]\s*$/;
    }, 
    unwrap_routine => sub {
        my ($self, $last, $current) = @_;
        chomp $last;
        $last =~ s/\s+[~]\s*$//g;
        "$last $current";
    }, 
);

applies_rule get_emails => (
    if => '$1 eq "EMAIL:"', 
    do => '$2;'
);

package main;

my $parser = MyFavorite::Parser->new();
$parser->read('/path/to/email_lists.txt');
my (@emails) = $parser->get_records();
print "Here are all the emails from the file: @emails\n";

DESCRIPTION

Primary usage

This class enables users to create their own parser classes for a known text file format, and facilitates code-sharing across multiple variants of the same basic text format. The basic steps are as follows:

package MyFavorite::Parser;
use Text::Parser::RuleSpec;
extends 'Text::Parser';

That's it! This is the bare minimum required to make your own text parser, but without any rules of its own it is not yet particularly useful. Rules are added with applies_rule:

applies_rule comment_char => (
    if          => '$1 =~ /^#/;', 
    dont_record => 1, 
);

The above rule ignores all comment lines and is added to the MyFavorite::Parser class. Now, when you create an instance of MyFavorite::Parser, it automatically runs this rule when you call read.

We can preset any attributes for this parser class using the familiar Moose functions. Here is an example:

has '+line_wrap_style' => (
    default => 'trailing_backslash', 
    is      => 'ro', 
);

has '+auto_trim' => (
    default => 'b', 
    is      => 'ro', 
);

Using attributes for storage

Sometimes, you may want to store the parsed information in attributes instead of records. For example:

has current_section => (
    is      => 'rw', 
    isa     => 'Str|Undef', 
    default => undef, 
    lazy    => 1, 
);

has _num_lines_by_section => (
    is      => 'rw', 
    isa     => 'HashRef[Int]', 
    default => sub { {}; }, 
    lazy    => 1, 
    handles => {
        num_lines      => 'get', 
        _set_num_lines => 'set', 
    }
);

applies_rule inc_section_num_lines => (
    if          => '$1 ne "SECTION"', 
    do          => 'my $sec = $this->current_section;
                    my $n = $this->num_lines($sec); 
                    $this->_set_num_lines($sec => $n+1);', 
    dont_record => 1, 
);

applies_rule get_section_name => (
    if          => '$1 eq "SECTION"', 
    do          => '$this->current_section($2); $this->_set_num_lines($2 => 0);', 
    dont_record => 1, 
);

In the above example, you can see how the section name we get from one rule is used in a different rule.

Inheriting rules in subclasses

We can further subclass a class that extends Text::Parser. Inheriting the rules of the superclass is automatic:

package MyParser1;
use Text::Parser::RuleSpec;

extends 'Text::Parser';

applies_rule rule1 => (
    do => '# something', 
);

package MyParser2;
use Text::Parser::RuleSpec;

extends 'MyParser1';

applies_rule rule1 => (
    do => '# something else', 
);

Now, MyParser2 contains two rules: MyParser1/rule1 and MyParser2/rule1. Note that the rules in both classes are named rule1, and both will be executed. By default, rules of superclasses are run before rules of the subclass. The subclass can change this order by explicitly stating that its own rule1 is to run before the rule1 of MyParser1:

package MyParser2;
use Text::Parser::RuleSpec;

extends 'MyParser1';

applies_rule rule1 => (
    do     => '# something else', 
    before => 'MyParser1/rule1', 
);

A subclass may choose to disable any superclass rules:

package MyParser3;
use Text::Parser::RuleSpec;

extends 'MyParser2';

disables_superclass_rules qr/^MyParser1/;  # disables all rules from MyParser1 class

A subclass may also clone a rule from the same class, from a superclass, or even from an unrelated class:

package ClonerParser;
use Text::Parser::RuleSpec;

use Some::Parser;  # contains rules: "heading", "section"
extends 'MyParser2';

applies_rule my_own_rule => (
    if    => '# check something', 
    do    => '# collect some data', 
    after => 'MyParser2/rule1', 
);

applies_cloned_rule 'MyParser2/rule1' => (
    add_precondition => '# Additional condition', 
    do               => '# Optionally change the action', 
    # prepend_action => '# Or just prepend something', 
    # append_action  => '# Or append something', 
    after            => 'MyParser1/rule1', 
);

Imagine this situation: Programmer A writes a text parser for a text format syntax SYNT1, and programmer B notices that the text format he wishes to parse (SYNT2) is similar, except for a few differences. Instead of having to rewrite the code from scratch, he can reuse the code from programmer A and modify it exactly as needed. This is especially useful when the syntaxes of many different text formats are very similar.

METHODS

There is no constructor for this module, so you cannot create an instance of Text::Parser::RuleSpec. All the methods here are class methods, to be called on Text::Parser::RuleSpec directly.

class_has_rules

Takes a parser class name and returns a boolean: true if that class has any rules, and false otherwise.

print "There are no class rules for MyFavorite::Parser.\n"
    if not Text::Parser::RuleSpec->class_has_rules('MyFavorite::Parser');

class_rule_order

Takes a single string argument, the parser class name, and returns the ordered list of rule names for that class.

my (@order) = Text::Parser::RuleSpec->class_rule_order('MyFavorite::Parser');

class_rule_object

This takes a single string argument with the fully qualified rule name, and returns the actual rule object identified by that name.

my $rule = Text::Parser::RuleSpec->class_rule_object('MyFavorite::Parser/rule1');

class_rules

Takes a single string argument and returns the actual rule objects of the given class name. This is a shortcut to first running class_rule_order and then running class_rule_object on each one of them.

my (@rules) = Text::Parser::RuleSpec->class_rules('MyFavorite::Parser');

is_known_rule

Takes a string argument expected to be the fully-qualified name of a rule. Returns a boolean that indicates whether such a rule was ever compiled. The fully-qualified name of a rule is of the form Some::Class/rule_name. To check for the existence of a cloned rule, include any suffix such as @2 or @3 in the name.

print "Some::Parser::Class/some_rule is a rule\n"
    if Text::Parser::RuleSpec->is_known_rule('Some::Parser::Class/some_rule');

populate_class_rules

Takes a parser class name as string argument. It populates the class rules according to the latest order of rules.

Text::Parser::RuleSpec->populate_class_rules('MyFavorite::Parser');

FUNCTIONS

The following functions are exported into the namespace of your class by default, and may only be called outside the main namespace.

applies_rule

Takes one mandatory string argument - a rule name - followed by the options to create a rule. These are the same as the arguments to the add_rule method of Text::Parser class. Returns nothing. Exceptions will be thrown if any of the required arguments are not provided.

applies_rule print_emails => (
    if               => '$1 eq "EMAIL:"', 
    do               => 'print $2;', 
    dont_record      => 1, 
    continue_to_next => 1, 
);

The above call creates a rule print_emails in your class MyFavorite::Parser and saves it as MyFavorite::Parser/print_emails. This fully-qualified name is how you reference the rule if you want to clone it in a subclass, or to insert another rule before or after it in a subclass.

Optionally, one may provide either a before or an after clause to specify when this rule is to be executed.

applies_rule check_line_syntax => (
    if     => '$1 ne "SECTION"', 
    do     => '$this->check_syntax($this->current_section, $_);', 
    before => 'Parent::Parser/add_line_to_data_struct', 
);

The above rule will apply before the inherited rule Parent::Parser/add_line_to_data_struct.

Exceptions will be thrown if the before or after rule does not have a class name in it, or if it is the same as the current class, or if the rule is not among the inherited rules so far. Only one of before or after clauses may be provided.
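The ordering behaviour described above can be pictured with a small stand-alone sketch. It is not the module's real implementation: insert_rule_sketch is a hypothetical helper that works on a plain array of rule names, and it only covers the default order, the before and after clauses, and some of the error cases (it does not check whether the anchor belongs to the current class).

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of Text::Parser::RuleSpec: return a new
# ordered list of rule names with $new placed relative to an anchor rule.
sub insert_rule_sketch {
    my ( $order, $new, %opt ) = @_;
    die "Only one of before or after may be given\n"
        if exists $opt{before} and exists $opt{after};
    my $anchor = $opt{before} // $opt{after};

    # Default: the new rule runs after everything inherited so far.
    return [ @$order, $new ] if not defined $anchor;

    die "$anchor must be of the form Class/rule\n"
        if $anchor !~ m{\A[^/]+/[^/]+\z};
    my ($idx) = grep { $order->[$_] eq $anchor } 0 .. $#$order;
    die "$anchor is not among the inherited rules\n" if not defined $idx;

    $idx++ if exists $opt{after};    # insert just past the anchor
    my @result = @$order;
    splice @result, $idx, 0, $new;
    return \@result;
}

my $inherited = ['MyParser1/rule1'];
my $default   = insert_rule_sketch( $inherited, 'MyParser2/rule1' );
my $reordered = insert_rule_sketch( $inherited, 'MyParser2/rule1',
    before => 'MyParser1/rule1' );
print "default:   @$default\n";
print "reordered: @$reordered\n";
```

Returning a new array reference rather than modifying the inherited list in place mirrors the idea that a superclass's own rule order is not disturbed by its subclasses.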

applies_cloned_rule

Clones an existing rule to make a replica, but you can add options to change any parameters of the rule.

applies_cloned_rule 'Some::SuperClass::Parser/some_rule' => (
    add_precondition => '1; # add some tests returning boolean', 
    before           => 'MayBe::Another::Superclass::Parser/some_other_rule',
        ## Or even 'Some::SuperClass::Parser/another_rule'
    do               => '## Change the do clause of original rule', 
);

The first argument must be a string containing the name of the rule to be cloned. You may clone a superclass rule, or even a rule from another class that you have merely loaded with use but do not inherit from with extends. You may even clone a rule from the present class, provided the rule has already been defined. If the rule name includes a class name, then that exact rule is cloned, modified according to the other clauses, and inserted into the rule order. If the rule name does not include a class name, the function looks for a rule with that name in the current class and clones that one.

You may use one of the before or after clauses just like in applies_rule. You may use any of the other rule creation options like if, do, continue_to_next, or dont_record. And you may optionally also use the add_precondition clause. In many cases, you may not need any of the rule-creation options at all and may use only add_precondition or any one of before or after clauses. If you do use any of the rule-creating options like do or if, then it will change those fields of the cloned copy of the original rule.

Note that when you clone a rule, you do not change the original rule itself. You actually make a second copy and modify that. So you retain the original rule along with the clone.

The newly cloned rule is automatically renamed by applies_cloned_rule. If a rule Some::Other::Class/my_rule_1 is cloned into your parser class MyFavorite::Parser, then the clone is named MyFavorite::Parser/my_rule_1. This way, the original rule is left unaffected. If such a name already exists, then the clone gets an @2 suffix, viz., MyFavorite::Parser/my_rule_1@2. If that also exists, it will be called MyFavorite::Parser/my_rule_1@3, and so on, with the number incrementing each time.
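The naming scheme for clones can be illustrated with a short stand-alone sketch. clone_name_sketch is a hypothetical helper written only for this illustration, and the %taken hash stands in for whatever registry of rule names the module keeps internally. It covers only the case described above and makes no claim about how the module treats an original name that already carries an @N suffix.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of Text::Parser::RuleSpec: compute the name
# a clone would receive, given the names that are already in use.
sub clone_name_sketch {
    my ( $orig_rule, $into_class, $taken ) = @_;
    my ($short) = $orig_rule =~ m{([^/]+)\z};    # part after "Some::Class/"
    my $name = "$into_class/$short";
    return $name if not exists $taken->{$name};

    # Name is in use: try @2, @3, ... until a free one is found.
    my $n = 2;
    $n++ while exists $taken->{"$name\@$n"};
    return "$name\@$n";
}

my %taken;
for ( 1 .. 3 ) {
    my $name = clone_name_sketch( 'Some::Other::Class/my_rule_1',
        'MyFavorite::Parser', \%taken );
    $taken{$name} = 1;
    print "$name\n";
}
```

Names produced this way are the ones to pass to is_known_rule when checking whether a particular clone exists.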

disables_superclass_rules

Takes a list of rule names, or regular expression patterns, or subroutine references to identify rules that are to be disabled. You cannot disable rules of the same class.

A string argument is expected to contain the full rule-name (including class name) in the format My::Parser::Class/my_rule. The / (slash) separating the class name and rule name is mandatory.

A regexp argument is tested against the full rule-name.

If a subroutine reference is provided, the subroutine is called for each rule in the class, and the rule is disabled if the subroutine returns a true value.

disables_superclass_rules qw(Parent::Parser::Class/parent_rule Another::Class/another_rule);
disables_superclass_rules qr/Parent::Parser::Class\/comm.*/;
disables_superclass_rules sub {
    my $rulename = shift;
    $rulename =~ /[@]/;
};
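To make the three kinds of argument concrete, here is a stand-alone sketch of how one filter might be tested against a fully-qualified rule name. filter_selects_sketch is a hypothetical helper, not the module's real code, and the rule names in @inherited are made up for the illustration.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of Text::Parser::RuleSpec: does one filter
# (string, qr// pattern, or code ref) select the given full rule name?
sub filter_selects_sketch {
    my ( $filter, $rulename ) = @_;
    return $filter->($rulename) ? 1 : 0 if ref($filter) eq 'CODE';
    return $rulename =~ $filter ? 1 : 0 if ref($filter) eq 'Regexp';
    return $rulename eq $filter ? 1 : 0;    # plain string: exact full name
}

# Made-up rule names standing in for rules inherited from a superclass.
my @inherited = (
    'Parent::Parser::Class/parent_rule',
    'Parent::Parser::Class/comment_char',
    'Parent::Parser::Class/section',
    'Parent::Parser::Class/section@2',
);

my @filters = (
    'Parent::Parser::Class/parent_rule',    # exact name
    qr/Parent::Parser::Class\/comm.*/,      # pattern on the full name
    sub { $_[0] =~ /\@\d+\z/ },             # any cloned rule (has @N suffix)
);

for my $rule (@inherited) {
    my $disabled = grep { filter_selects_sketch( $_, $rule ) } @filters;
    printf "%-40s %s\n", $rule, $disabled ? 'disabled' : 'kept';
}
```

In this sketch a rule is treated as disabled as soon as any one filter selects it, which is why only Parent::Parser::Class/section survives.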

unwraps_lines_using

This function may be used if one wants to specify a custom line-unwrapping routine. Takes a hash argument with mandatory keys as follows:

unwraps_lines_using(
    is_wrapped     => sub { # Should return a boolean for each $line
        1;
    }, 
    unwrap_routine => sub { # Should return a string for each $last and $line
        my ($self, $last, $line) = @_;
        $last.$line;
    }, 
);

To avoid unexpected undef results, both routines should always return defined values. To unwrap lines effectively, the is_wrapped routine should return a true value (such as 1) when it encounters the continuation character, and unwrap_routine should return a string that appropriately joins the last and current lines together.
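How the two routines cooperate can be tried out without a parser at all. The sketch below reuses the callbacks from the SYNOPSIS as plain code references and drives them with join_next_sketch, a hypothetical loop that assumes a wrapped line is held back and joined with the line that follows it. The real parser's loop, and what it does with a continuation on the very last line, may well differ.

```perl
use strict;
use warnings;

# The SYNOPSIS callbacks as stand-alone code refs ($self is unused here).
my $is_wrapped = sub {
    my ( $self, $line ) = @_;
    chomp $line;
    return $line =~ m/\s+[~]\s*$/ ? 1 : 0;    # trailing " ~" marks a wrap
};
my $unwrap_routine = sub {
    my ( $self, $last, $current ) = @_;
    chomp $last;
    $last =~ s/\s+[~]\s*$//;                  # drop the continuation mark
    return "$last $current";
};

# Hypothetical driver, NOT the module's real loop: hold back a wrapped line
# and join it with the next one before deciding whether it is complete.
sub join_next_sketch {
    my @lines = @_;
    my ( @out, $pending );
    for my $line (@lines) {
        $line = $unwrap_routine->( undef, $pending, $line )
            if defined $pending;
        if ( $is_wrapped->( undef, $line ) ) { $pending = $line; }
        else { push @out, $line; undef $pending; }
    }
    # Sketch's own choice: keep a dangling continuation line as it is.
    push @out, $pending if defined $pending;
    return @out;
}

my @raw = map { "$_\n" } (
    'EMAIL: alice@example.com ~',
    'bob@example.com',
    'NAME: Carol',
);
print join_next_sketch(@raw);
```

Because both callbacks always return defined values, the driver never has to guard against undef when it concatenates lines.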

SEE ALSO

BUGS

Please report any bugs or feature requests on the bugtracker website http://github.com/balajirama/Text-Parser/issues

When submitting a bug or request, please include a test-file or a patch to an existing test-file that illustrates the bug or desired feature.

AUTHOR

Balaji Ramasubramanian <balajiram@cpan.org>

COPYRIGHT AND LICENSE

This software is copyright (c) 2018-2019 by Balaji Ramasubramanian.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.