NAME

Set::IntSpan - Manages sets of integers

SYNOPSIS

use Set::IntSpan qw(grep_set map_set grep_spans map_spans);

$Set::IntSpan::Empty_String = $string;

$set    = new   Set::IntSpan $set_spec;
$set    = new   Set::IntSpan @set_specs;
$valid  = valid Set::IntSpan $run_list;
$set    = copy  $set $set_spec;

$run_list = run_list $set;
@elements = elements $set;
@sets     = sets     $set;
@spans    = spans    $set;

$u_set = union      $set $set_spec;
$i_set = intersect  $set $set_spec;
$x_set = xor        $set $set_spec;
$d_set = diff       $set $set_spec;
$c_set = complement $set;

equal      $set $set_spec;
equivalent $set $set_spec;
superset   $set $set_spec;
subset     $set $set_spec;

$n = cardinality $set;
$n = size        $set;

empty      $set;
finite     $set;
neg_inf    $set;
pos_inf    $set;
infinite   $set;
universal  $set;

member     $set $n;
insert     $set $n;
remove     $set $n;

$min = min $set;
$max = max $set;

$holes   = holes $set;
$cover   = cover $set;
$inset   = inset $set $n;
$smaller = trim  $set $n;
$bigger  = pad   $set $n;

$subset  = grep_set 	{ ... } $set;
$mapset  = map_set  	{ ... } $set;

$subset  = grep_spans { ... } $set;
$mapset  = map_spans  { ... } $set;

for ($element=$set->first; defined $element; $element=$set->next) { ... }
for ($element=$set->last ; defined $element; $element=$set->prev) { ... }

$element = $set->start($n);
$element = $set->current;

$n       = $set->at($i);
$slice   = $set->slice($from, $to);

EXPORTS

`@EXPORT`

Nothing

`@EXPORT_OK`

grep_set, map_set, grep_spans, map_spans

DESCRIPTION

Set::IntSpan manages sets of integers. It is optimized for sets that have long runs of consecutive integers. These arise, for example, in .newsrc files, which maintain lists of articles:

alt.foo: 1-21,28,31
alt.bar: 1-14192,14194,14196-14221

Sets are stored internally in a run-length coded form. This provides for both compact storage and efficient computation. In particular, set operations can be performed directly on the encoded representation.

Set::IntSpan is designed to manage finite sets. However, it can also represent some simple infinite sets, such as {x | x>n}. This allows operations involving complements to be carried out consistently, without having to worry about the actual value of INT_MAX on your machine.

SET SPECIFICATIONS

Many of the methods take a set specification. There are four kinds of set specifications.

Empty

If a set specification is omitted, then the empty set is assumed. Thus,

$set = new Set::IntSpan;

creates a new, empty set. Similarly,

copy $set;

removes all elements from $set.

Object reference

If an object reference is given, it is taken to be a Set::IntSpan object.

Array reference

If an array reference is given, then the elements of the array are taken to be the elements of the set. The array may contain duplicate elements. The elements of the array may be in any order.

Run list

If a string is given, it is taken to be a run list. A run list specifies a set using a syntax similar to that in newsrc files.

A run list is a comma-separated list of runs. Each run specifies a set of consecutive integers. The set is the union of all the runs.

Runs may be written in any of several forms.

Finite forms

n: { n }
a-b: {x | a<=x && x<=b}

Infinite forms

(-n: {x | x<=n}
n-): {x | x>=n}
(-): The set of all integers

Empty forms

The empty set is consistently written as '' (the null string). It is also denoted by the special form '-' (a single dash).

Restrictions

The runs in a run list must be disjoint, and must be listed in increasing order.

Valid characters in a run list are 0-9, '(', ')', '-' and ','. White space and underscore (_) are ignored. Other characters are not allowed.

Examples

-: { }
1: { 1 }
1-2: { 1, 2 }
-5--1: { -5, -4, -3, -2, -1 }
(-): the integers
(--1: the negative integers
1-3, 4, 18-21: { 1, 2, 3, 4, 18, 19, 20, 21 }

ITERATORS

Each set has a single iterator, which is shared by all calls to first, last, start, next, prev, and current. At all times, the iterator is either an element of the set, or undef.

first, last, and start set the iterator; next, and prev move it; and current returns it. Calls to these methods may be freely intermixed.

Using next and prev, a single loop can move both forwards and backwards through a set. Using start, a loop can iterate over portions of an infinite set.

METHODS

Creation

$set = new Set::IntSpan $set_spec

$set = new Set::IntSpan @set_specs

Creates and returns a Set::IntSpan object.

The initial contents of the set are given by $set_spec, or by the union of all the @set_specs.

$ok = valid Set::IntSpan $run_list

Returns true if $run_list is a valid run list. Otherwise, returns false and leaves an error message in $@.

$set = copy $set $set_spec

Copies $set_spec into $set. The previous contents of $set are lost. For convenience, copy returns $set.

$run_list = run_list $set

Returns a run list that represents $set. The run list will not contain white space. $set is not affected.

By default, the empty set is formatted as '-'; a different string may be specified in $Set::IntSpan::Empty_String.

@elements = elements $set

Returns an array containing the elements of $set. The elements will be sorted in numerical order. In scalar context, returns an array reference. $set is not affected.

@sets = sets $set

Returns the runs in $set, as a list of Set::IntSpan objects. The sets in the list are in order.

@spans = spans $set

Returns the runs in $set, as a list of the form

([$a1, $b1], 
 [$a2, $b2],
 ...
 [$aN, $bN])

If a run contains only a single integer, then the upper and lower bounds of the corresponding span will be equal.

If the set has no lower bound, then $a1 will be undef. Similarly, if the set has no upper bound, then $bN will be undef.

The runs in the list are in order.

Set operations

$u_set = union $set $set_spec: Returns the set of integers in either $set or $set_spec.
$i_set = intersect $set $set_spec: Returns the set of integers in both $set and $set_spec.
$x_set = xor $set $set_spec: Returns the set of integers in $set or $set_spec, but not both.
$d_set = diff $set $set_spec: Returns the set of integers in $set but not in $set_spec.
$c_set = complement $set: Returns the set of integers that are not in $set.

For all set operations, a new Set::IntSpan object is created and returned. The operands are not affected.

Comparison

equal $set $set_spec: Returns true iff $set and $set_spec contain the same elements.
equivalent $set $set_spec: Returns true iff $set and $set_spec contain the same number of elements. All infinite sets are equivalent.
superset $set $set_spec: Returns true iff $set is a superset of $set_spec.
subset $set $set_spec: Returns true iff $set is a subset of $set_spec.

Cardinality

$n = cardinality $set
$n = size $set: Returns the number of elements in $set. Returns -1 for infinite sets. size is provided as an alias for cardinality.
empty $set: Returns true iff $set is empty.
finite $set: Returns true iff $set is finite.
neg_inf $set: Returns true iff $set contains {x | x<n} for some n.
pos_inf $set: Returns true iff $set contains {x | x>n} for some n.
infinite $set: Returns true iff $set is infinite.
universal $set: Returns true iff $set contains all integers.

Membership

member $set $n: Returns true iff the integer $n is a member of $set.
insert $set $n: Inserts the integer $n into $set. Does nothing if $n is already a member of $set.
remove $set $n: Removes the integer $n from $set. Does nothing if $n is not a member of $set.

Extrema

min $set: Returns the smallest element of $set, or undef if there is none.
max $set: Returns the largest element of $set, or undef if there is none.

Spans

$holes = holes $set

Returns a set containing all the holes in $set, that is, all the integers that are in-between spans of $set.

holes is always a finite set.

$cover = cover $set

Returns a set consisting of a single span from $set->min to $set->max. This is the same as

union $set $set->holes

$inset = inset $set $n

$smaller = trim $set $n

$bigger = pad $set $n

inset returns a set constructed by removing $n integers from each end of each span of $set. If $n is negative, then -$n integers are added to each end of each span.

In the first case, spans may vanish from the set; in the second case, holes may vanish.

trim is provided as a synonym for inset.

pad $set $n is the same as inset $set -$n.

Iterators

$set->first

Sets the iterator for $set to the smallest element of $set. If there is no smallest element, sets the iterator to undef. Returns the iterator.

$set->last

Sets the iterator for $set to the largest element of $set. If there is no largest element, sets the iterator to undef. Returns the iterator.

$set->start($n)

Sets the iterator for $set to $n. If $n is not an element of $set, sets the iterator to undef. Returns the iterator.

$set->next

Sets the iterator for $set to the next element of $set. If there is no next element, sets the iterator to undef. Returns the iterator.

next will return undef only once; the next call to next will reset the iterator to the smallest element of $set.

$set->prev

Sets the iterator for $set to the previous element of $set. If there is no previous element, sets the iterator to undef. Returns the iterator.

prev will return undef only once; the next call to prev will reset the iterator to the largest element of $set.

$set->current

Returns the iterator for $set.

Indexing

The elements of a set are kept in numerical order. These methods index into the set based on this ordering.

$n = $set->at($i)

Returns the $ith element of $set, or undef if there is no $ith element. Negative indices count backwards from the end of the set.

Dies if

$i is non-negative and $set is neg_inf
$i is negative and $set is pos_inf

$slice = $set->slice($from, $to)

Returns a Set::IntSpan object containing the elements of $set at indices $from..$to. Negative indices count backwards from the end of the set.

Dies if

$from is non-negative and $set is neg_inf
$from is negative and $set is pos_inf

FUNCTIONS

$sub_set = grep_set { ... } $set

Evaluates the BLOCK for each integer in $set (locally setting $_ to each integer) and returns a Set::IntSpan object containing those integers for which the BLOCK returns TRUE.

Returns undef if $set is infinite.

$map_set = map_set { ... } $set

Evaluates the BLOCK for each integer in $set (locally setting $_ to each integer) and returns a Set::IntSpan object containg all the integers returned as results of all those evaluations.

Evaluates the BLOCK in list context, so each element of $set may produce zero, one, or more elements in the returned set.

Returns undef if $set is infinite.

$sub_set = grep_spans { ... } $set

Evaluates the BLOCK for each span in $set and returns a Set::IntSpan object containing those spans for which the BLOCK returns TRUE.

Within BLOCK, $_ is locally set to an array ref of the form

[ $lower, $upper ]

where $lower and $upper are the bounds of the span. If the span contains only one integer, then $lower and $upper will be equal. If the span is unbounded, then the corresponding element(s) of the array will be undef.

$map_set = map_spans { ... } $set

Evaluates the BLOCK for each span in $set, and returns a Set::IntSpan object consisting of the union of all the spans returned as results of all those evaluations.

Within BLOCK, $_ is locally set to an array ref of the form

[ $lower, $upper ]

as described above for grep_spans. Each evaulation of BLOCK must return a list of array refs of the same form. Each returned list may contain zero, one, or more spans. Spans may be returned in any order, and need not be disjoint. However, for each bounded span, the constraint

$lower <= $upper

must hold.

CLASS VARIABLES

$Set::IntSpan::Empty_String

$Set::IntSpan::Empty_String contains the string that is returned when run_list is called on the empty set. $Empty_String is initially '-'; alternatively, it may be set to ''. Other values should be avoided, to ensure that run_list always returns a valid run list.

run_list accesses $Empty_String through a reference stored in $set->{empty_string}. Subclasses that wish to override the value of $Empty_String can reassign this reference.

DIAGNOSTICS

Any method (except valid) will die if it is passed an invalid run list.

Set::IntSpan::_copy_run_list: Bad syntax: $runList: (F) $run_list has bad syntax
Set::IntSpan::_copy_run_list: Bad order: $runList: (F) $run_list has overlapping runs or runs that are out of order.
Set::IntSpan::elements: infinite set: (F) An infinte set was passed to elements.
Set::IntSpan::at: negative infinite set: (F) at was called with a non-negative index on a negative infinite set.
Set::IntSpan::at: positive infinite set: (F) at was called with a negative index on a positive infinite set.
Set::IntSpan::slice: negative infinite set: (F) slice was called with $from non-negative on a negative infinite set.
Set::IntSpan::slice: positive infinite set: (F) slice was called with $from negative on a positive infinite set.
Out of memory!: (X) elements $set can generate an "Out of memory!" message on sufficiently large finite sets.

NOTES

Traps

Beware of forms like

union $set [1..5];

This passes an element of @set to union, which is probably not what you want. To force interpretation of $set and [1..5] as separate arguments, use forms like

union $set +[1..5];

$set->union([1..5]);

Cardinality

You cannot use the obvious comparison routine

{ $a->cardinality <=> $b->cardinality }

to sort sets by size, because cardinality returns -1 for infinte sets. (All the non-negative integers were taken. Sorry.)

Instead, you have to write something like

{  my $a_card = $a->cardinality;
   my $b_card = $b->cardinality;
   
   $a_card == $b_card and return  0;
   $a_card <  0       and return  1;
   $b_card <  0       and return -1;
   $a_card <=> $b_card                }

grep_set and map_set

grep_set and map_set make it easy to construct sets for which the internal representation used by Set::IntSpan is not small. Consider:

$billion = new Set::IntSpan '0-1_000_000_000';   # OK
$odd     = grep_set { $_ & 1 } $billion;         # trouble
$even    = map_set  { $_ * 2 } $billion;         # double trouble

Error handling

There are two common approaches to error handling: exceptions and return codes. There seems to be some religion on the topic, so Set::IntSpan provides support for both.

To catch exceptions, protect method calls with an eval:

$run_list = <STDIN>;
eval { $set = new Set::IntSpan $run_list };
$@ and print "$@: try again\n";

To check return codes, use an appropriate method call to validate arguments:

$run_list = <STDIN>;
if (valid Set::IntSpan $run_list) 
   { $set = new Set::IntSpan $run_list }
else
   { print "$@ try again\n" }

Similarly, use finite to protect calls to elements:

finite $set and @elements = elements $set;

Calling elements on a large, finite set can generate an "Out of memory!" message, which cannot (easily) be trapped. Applications that must retain control after an error can use intersect to protect calls to elements:

@elements = elements { intersect $set "-1_000_000 - 1_000_000" };

or check the size of $set first:

finite $set and cardinality $set < 2_000_000 and @elements = elements $set;

Limitations

Although Set::IntSpan can represent some infinite sets, it does not perform infinite-precision arithmetic. Therefore, finite elements are restricted to the range of integers on your machine.

Roots

The sets implemented here are based on a Macintosh data structure called a region. See Inside Macintosh for more information.

AUTHOR

Steven McDougall <swmcd@world.std.com>

ACKNOWLEDGMENTS

Malcolm Cook <mec@stowers-institute.org>
Martin Krzywinski <martink@bcgsc.ca>

COPYRIGHT

To install Set::IntSpan, copy and paste the appropriate command in to your terminal.

cpanm

cpanm Set::IntSpan

CPAN shell

perl -MCPAN -e shell
install Set::IntSpan

For more information on module installation, please visit the detailed CPAN module installation guide.

	Global
`s`	Focus search bar
`?`	Bring up this help dialog

	GitHub
`g` `p`	Go to pull requests
`g` `i`	go to github issues (only if github is preferred repository)

	POD
`g` `a`	Go to author
`g` `c`	Go to changes
`g` `i`	Go to issues
`g` `d`	Go to dist
`g` `r`	Go to repository/SCM
`g` `s`	Go to source
`g` `b`	Go to file browse

	Search terms
module: (e.g. module:Plugin)
distribution: (e.g. distribution:Dancer auth)
author: (e.g. author:SONGMU Redis)
version: (e.g. version:1.00)

NAME

SYNOPSIS

EXPORTS

@EXPORT

@EXPORT_OK

DESCRIPTION

SET SPECIFICATIONS

Empty

Object reference

Array reference

Run list

Finite forms

Infinite forms

Empty forms

Restrictions

Examples

ITERATORS

METHODS

Creation

Set operations

Comparison

Cardinality

Membership

Extrema

Spans

Iterators

Indexing

FUNCTIONS

CLASS VARIABLES

DIAGNOSTICS

NOTES

Traps

Cardinality

grep_set and map_set

Error handling

Limitations

Roots

AUTHOR

ACKNOWLEDGMENTS

COPYRIGHT

Module Install Instructions

`@EXPORT`

`@EXPORT_OK`