NAME

ControlBreak - Compare values during iteration to detect changes

SYNOPSIS

use v5.18;

use ControlBreak;

# set up two levels, in minor to major order
my $cb = ControlBreak->new( qw( District Country ) );

my $country_total = 0;
my $district_total = 0;

while (my $line = <DATA>) {
    chomp $line;

    my ($country, $district, $city, $population) = split ',', $line;

    # test the values (minor to major order)
    $cb->test($district, $country);

    # break on District (or Country) detected
    if ($cb->break('District')) {
        printf "%s,%s,%d%s\n", $cb->last('Country'), $cb->last('District'), $district_total, '*';
        $district_total = 0;
    }

    # break on Country detected
    if ($cb->break('Country')) {
        printf "%s total,%s,%d%s\n", $cb->last('Country'), '', $country_total, '**';
        $country_total = 0;
    }

    $country_total  += $population;
    $district_total += $population;
}
continue {
    # save the current values (as received by ->test) as the new
    # 'last' values on the next iteration.
    $cb->continue();
}

# simulate break at end of data, if we iterated at least once
if ($cb->iteration > 0) {
    printf "%s,%s,%d%s\n", $cb->last('Country'), $cb->last('District'), $district_total, '*';
    printf "%s total,%s,%d%s\n", $cb->last('Country'), '', $country_total, '**';
}

__DATA__
Canada,Alberta,Calgary,1019942
Canada,Ontario,Ottawa,812129
Canada,Ontario,Toronto,2600000
Canada,Quebec,Montreal,1704694
Canada,Quebec,Quebec City,531902
Canada,Quebec,Sherbrooke,161323
USA,Arizona,Phoenix,1640641
USA,California,Los Angeles,3919973
USA,California,San Jose,1026700
USA,Illinois,Chicago,2756546
USA,New York,New York City,8930002
USA,New York,Buffalo,281757
USA,Pennsylvania,Philadelphia,1619355
USA,Texas,Houston,2345606

DESCRIPTION

The ControlBreak module provides a class that is used to detect control breaks; i.e. when a value changes.

Typically, the data being retrieved or iterated over is ordered and there may be more than one value that is of interest. For example consider a table of population data with columns for country, district and city, sorted by country and district. With this module you can create an object that will detect changes in the district or country, considered level 1 and level 2 respectively. The calling program can take action, such as printing subtotals, whenever level changes are detected.

Ordered data is not a requirement. An example using unordered data would be counting consecutive numbers within a data stream; e.g. 0 0 1 1 1 1 0 1 1. Using ControlBreak you can detect each change and count the consecutive values, yielding two zeros, four 1's, one zero, and two 1's.

Note that ControlBreak cannot detect the end of your data stream. The test() method is normally called within a loop to detect changes in control variables, but once the last iteration is processed there are no further calls to test() as the loop ends. It may be necessary, therefore, to do additional processing after the loop in order to handle the very last data group; e.g. to print a final set of subtotals.

To simplify this situation, method test_and_do() can be used in place of test() and continue().

FIELDS

iteration

A readonly field that provides the current iteration number.

This can be useful if you are doing an final processing after an iteration loop has ended. In the event that the data stream is empty and there were no iterations, then you can condition your final processing on iteration > 0.

Note that the interation field is incremented by test() (or test_and_do()). Therefore, when called within a loop it is effectively zero-based if referenced within the iteration block before test() is invoked, and then one-based after test().

level_names

A readonly field that provides a list of the level names that were provided as arguments to new().

METHODS

new ( $level_name> [, $level_name> ]... )

Create a new ControlBreak object.

Arguments are user-defined names for each level, in minor to major order. The set of names must be unique, and they must each start with a letter or underscore, followed by any number of letters, numbers or underscores.

A level name can also begin with a '+', which denotes that a numeric comparison will be used for the values processed at this level.

The number of arguments to new() determines the number of control levels that will be monitored. The variables provided to method test() must match in number and datatype to these operators.

The order of the arguments corresponds to a hierarchical level of control, from lowest to highest; i.e. the first argument corresponds to level 1, the second to level 2, etc. This also corresponds to sort order, from minor to major, when iterating through a data stream.

break ( [ $level_name ] )

The break() method provides a convenient way to check whether the last invocation of the test method resulted in a control break, or a control break greater than or equal to the <level_name> optionally provided as an argument.

For example, if you have levels 'City', 'State' and 'Country', and there's a control break on level 1 (City), then invoking break() will return 1 and therefore be treated as true within a condition. If there was no control break, then 0 (false) is returned.

When invoked with a level name argument, break() will map the level name to a level number and compare it to the control break level determined by the last invocation of test(). If the tested control break level number is equal or higher than the argument level, then that level number is returned and, since it will be non-zero, treated as a true value within a condition. Otherwise, zero (false) is returned.

Ultimately the point of this is that you can use it to write a series of actions, like printing subtotals and clearing subtotal variables, such that a higher level control break will trigger actions associated with lower level control breaks. For example:

my $cb = ControlBreak( qw/City State Country/ );

if ( $cb->break() ) {
    say '=== control break detected at level: ' . $cb->levelname;
}
if ( $cb->break('City') ) {
    say "City total: $city";
    $city = 0;
}
if ( $cb->break('State') ) {
    say "State total: $state";
    $state = 0;
}
if ( $cb->break('Country') ) {
    say "Country total: $country";
    $country = 0;
}

In this example, when a Country control break is detected all three subtotals will be printed. When a State control break is detected, only State and City will print.

comparison ( level_name => [ 'eq' | '==' | sub ] ... )

The comparison() method accepts a hash which sets the comparison operations for the designated levels. Keywords must match the level names provide in new(). Values can be '==' for numeric comparison, 'eq' for alpha comparison, or anonymous subroutines.

Anonymous subroutines must take two arguments, compare them in some fashion, and return a boolean. The first argument to the comparison routine will be the value passed to the test() method. The second argument will be the corresponding value from the last iteration.

All levels are provided with default comparison functions as determined by new(). This method is provided so you can change one or more of those defaults. Any level name not referenced by keys in the argument list will be left unchanged.

Some handy comparison functions are:

# case-insensitive match
sub { lc $_[0] eq lc $_[1] }

# strings coerced to numbers (so 07 and 7 are equal)
sub { ($_[0] + 0) == ($_[1] + 0) }

# blank values treated as matched
sub { $_[0] eq '' ? 1 : $_[0] eq $_[1] }

continue

Saves the values most recently provided to the test() method so they can be compared to new values on the next iteration.

On the next iteration these values will be accessible via the last() method.

continue() is best invoked within the continue block of a loop, to make sure it isn't missed.

continue() cannot be used in conjunction with test_and_do(), which internally calls test and continue() for you.

last ( $level_name_or_number> )

For the corresponding level, returns the value that was given to the test() method called prior to the most recent one.

The argument can be a level name or a level number.

Normally this is used while iterating through a data stream. When a level change (i.e. control break) is detected, the current data value has changed relative to the preceding iteration. At this point it may be necessary to take some action, such a printing a subtotal. But, the subtotal will be for the preceding group of data and the current value belongs to the next group. The last() method allows you to access the value for the group that was just processed so, for example, the group name can be included on the subtotal line.

For example, if control levels were named 'X' and 'Y' and you are iterating through data and invoking test($x, $y) at each iteration, then invoking $cb->last('Y') on iteration 9 will returns the value of $y on iteration 8.

Note that continue() should not be invoked before last() within the scope of an iteration loop; i.e. continue() should be the last thing done before the next turn of the loop.

levelname

Return the level name for the most recent invocation of the test() method.

levelnum

Return the level number for the most recent invocation of the test() method.

level_numbers

Return a list of level numbers corresponding to the levels defined in new(). This can be useful, for example, when you want to set up some lexical variables for use as indexes into a list you might use to accumulate subtotals.

my $cb = ControlBreak->new( qw( L1 L2 EOD ) );
my @totals;
my ($L1, $L2, $EOD) = $cb->level_numbers;

foreach my $sublist (@list_of_lists) {
    my ($control1, $control2, $number) = $sublist->@*;
    ...
    my $sub_totals = sub {
        if ($cb->break('L1')) {
            # report the L1 subtotal here
            $totals[$L1] = 0; # clear the subtotal
        }
        ...
        # accumulate subtotals
        map { $totals[$_] += $number } $cb->level_numbers;
    };

    $cb->test_and_do(
        $control1,
        $control2,
        $cb->iteration == $list_of_lists - 1,
        $sub_totals
    );
}

reset

Resets the state of the object so it can be used again for another set of iterations using the same number and type of controls establish when the object was instantiated with new(). Any comparisons that were subsequently modified are retained.

test ( $var1 [, $var2 ]... )

Submits the control variables for testing against the values from the previous iteration.

Testing is done in reverse order, from highest to lowest (major to minor) and stops once a change is detected. Where it stops determines the control break level. For example, if $var2 changed, method levelnum will return 2. If $var2 did not change, but $var1 did, then method levelnum() will return 1. If nothing changes, then levelnum() will return 0.

Note that the level numbers set by test(...) are true if there was a level change, and false if there wasn't. So, they can be used as a simple boolean test of whether there was a change. Or you can use the break() method to determine whether any control break has occurred.

Because level numbers correspond to the hierarchical data order, they can be use to trigger multiple actions; e.g. levelnum() >= 1 could be used to print subtotals for levels 1 whenever a control break occurred for level 1, 2 or 3. It is usually the case that higher control breaks are meant to cascade to lower control levels and this can be achieved in this fashion. The break() method simplifies this.

Note that method continue() must be called at the end of each iteration in order to save the values of the iteration for the next iteration. If not, the next test(...) invocation will croak.

test_and_do ( $var1 [, $var2 ]... $var_end, $coderef )

The test_and_do() method is similar to test(). It takes the same arguments as test(), plus one additional argument that is an anonymous code reference. Internally, it calls test() and then, if there is a control break, calls the anonymous subroutine provided in the last argument. Typically, that code will perform work related to subtotals or other actions necessary when a control break occurs.

But test_and_do() does one other thing. It expects the last control variable ($var_end) to be an end of data indicator, such as the perl builtin operator eof. This indicator should return false on each iteration over the data until the very last iteration -- when it should change to true, thereby triggering a major control break.

What test_and_do does then is to add an extra loop. This simulates a final record and will trigger test() to signal control breaks at all levels. Thus, the code provided will be executed between every change of data AND after all data has been iterated over.

This avoids the necessity of repeating the control break actions you've put inside the data loop immediately after the loop's closing bracket. When you just use test and continue(), an end-of-data control break won't occur and the simplest workaround is to just duplicate your control break code after the loops closing bracket.

Here's a typical use case involving end of file processing. Note the extra control level, named 'EOF', and the use of the eof builtin function as the second last argument of test_and_do():

my $cb = ControlBreak->new( qw( L1 L2 EOF ) );

my $lev1_subtotal = 0;
my $lev2_subtotal = 0;
my $grand_total = 0;

while (my $line = <>) {
    chomp $line;

    my ($lev1, $lev2, $data) = split "\t", $line;

    my $subtotal_coderef = sub {
        if ($cb->break('L1')) {
            say $cb->last('L1'), $cb->last('L2'), $lev1_subtotal . '*';
            $lev1_subtotal = 0;
        }
        ...
        if ($cb->break('EOF')) {
            say 'Grand total,,', $grand_total, '***';
        }

        $lev1_subtotal  += $data;
        $lev2_subtotal  += $data;
        $gran_total     += $data;
    }

    $cb->test_and_do($lev1, $lev2, eof, $subtotal_coderef);
}

Also note that if your subroutine needs to reference variables defined outside the scope of the loop (as in this case with the totalling variables) then it needs to be defined within the loop so it can be a closure over the variables in the enclosing scope.

Another typical use case involves iterating over a list of values. Here, we have no built in function to tell us when we've reached the final value, but if we have a fixed list of values we can use the length of the list and test it against the value returned by the ControlBreak iterator method. For example:

my $cb = ControlBreak->new( qw( L1 L2 EOD ) );

my $lev1_subtotal = 0;
my $lev2_subtotal = 0;
my $grand_total = 0;

my $last_iter = @data - 1;

foreach my $line (@data {
    chomp $line;
    my ($lev1, $lev2, $data) = split "\t", $line;

    my $subtotal_coderef = sub {
        if ($cb->break('L1')) {
            say $cb->last('L1'), $cb->last('L2'), $lev1_subtotal . '*';
            $lev1_subtotal = 0;
        }
        ...
        if ($cb->break('EOD')) {
            say 'Grand total,,', $grand_total, '***';
        }

        $lev1_subtotal  += $data;
        $lev2_subtotal  += $data;
        $gran_total     += $data;
    }

    $cb->test_and_do($lev1, $lev2, $cb->iteration == $last_iter, $subtotal_coderef);
}

AUTHOR

Gary Puckering <jgpuckering@rogers.com>

LICENSE AND COPYRIGHT

Copyright 2022, Gary Puckering

This utility is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

See https://dev.perl.org/licenses/artistic.html