NAME

Regexp::Compare - partial ordering for regular expressions

SYNOPSIS

use Regexp::Compare qw(is_less_or_equal);

# @rx holds regexps as strings; $i and $j index two of them
if (is_less_or_equal($rx[$i], $rx[$j])) {
    print "duplicate: $rx[$i]\n";
}

DESCRIPTION

This module implements a function, is_less_or_equal, that compares regular expressions: it returns true if all strings matched by the first regexp are also matched by the second. It is meant for optimizing blacklists implemented as regular expressions (for example, http://www.communitywiki.org/cw/BannedContent).

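Both arguments of is_less_or_equal are strings. In other words, the call

$rv = is_less_or_equal($rx, /hardcoded/i);

does not do what you want: /hardcoded/i is a match against $_, evaluated before the call, so the function receives the result of that match rather than a regexp. Use

$rv = is_less_or_equal($rx, '(?i:hardcoded)');

instead.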
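To see why the bare /hardcoded/i form fails, the sketch below passes it to a stand-in function and reports what that function actually receives. describe_args is a hypothetical helper written only for this illustration; it is not part of Regexp::Compare. In list context (which function arguments are), a match without capture groups yields (1) on success and an empty list on failure, so the callee gets either a stray 1 or nothing at all.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Regexp::Compare): describes the argument
# list a function actually receives.
sub describe_args {
    return scalar(@_) . ': ' . join(', ', map { "'$_'" } @_);
}

# A bare /.../ is a match operation against $_, evaluated before the call.
$_ = 'no match here';
my $missed = describe_args('foo', /hardcoded/i);   # failed match: empty list
$_ = 'some HardCoded text';
my $hit    = describe_args('foo', /hardcoded/i);   # successful match, no groups: (1)

# Passing the pattern as a string hands the regexp text itself to the callee.
my $string = describe_args('foo', '(?i:hardcoded)');

print "$missed\n";   # 1: 'foo'
print "$hit\n";      # 2: 'foo', '1'
print "$string\n";   # 2: 'foo', '(?i:hardcoded)'
```

In neither bare-match case does the second argument carry the pattern, which is why the quoted '(?i:hardcoded)' form is needed.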

A false return value does not imply that some string matched by the first regexp is not matched by the second: many regular expressions (e.g. those containing Perl code) cannot be compared, and this module does not implement every comparison that would be possible.
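Because a true result means inclusion while a false result only means "not known", the function can be used to prune a blacklist safely: a pattern is dropped only when another kept pattern is known to cover it. The sketch below assumes that contract. prune_blacklist is a hypothetical helper, not part of this module; it takes the comparison as a code reference so it can be exercised here with a toy comparator for plain literal patterns (no metacharacters), where every string containing $x also contains $y exactly when $y is a substring of $x.

```perl
use strict;
use warnings;

# Hypothetical helper: removes patterns whose matches are already covered by
# another pattern in the list. $le must follow the is_less_or_equal contract:
# true means "every string matched by the first is matched by the second",
# false means "not known", so a false result never causes a pattern to be dropped.
sub prune_blacklist {
    my ($le, @patterns) = @_;
    my @kept;
    PATTERN: for my $rx (@patterns) {
        # Skip $rx if an already kept pattern matches everything $rx matches.
        for my $k (@kept) {
            next PATTERN if $le->($rx, $k);
        }
        # $rx stays; drop previously kept patterns that $rx now covers.
        @kept = grep { !$le->($_, $rx) } @kept;
        push @kept, $rx;
    }
    return @kept;
}

# Toy comparator, valid only for unanchored literal patterns.
my $literal_le = sub { index($_[0], $_[1]) >= 0 };

my @pruned = prune_blacklist($literal_le,
    'casino', 'online casino', 'pills', 'cheap pills', 'casino');
print "@pruned\n";   # casino pills
```

With this module loaded via use Regexp::Compare qw(is_less_or_equal), the real comparison could be passed as \&is_less_or_equal. Equivalent patterns are handled by keeping the first one seen; note that the number of comparisons grows quadratically with the length of the blacklist.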

BUGS

  • EBCDIC-based platforms not supported

  • comparison of character classes is simplified and probably has some incorrect corner cases

  • comparison fails for locale-specific constructs

  • comparison fails for regexps with backreferences

  • global variables affecting regexp matching are ignored

  • function may die for unusual (legal but unexpected) regexp constructs

AUTHOR

Vaclav Barta, <vbarta@mangrove.cz>

COPYRIGHT AND LICENSE

Copyright (C) 2006 - 2021 by Vaclav Barta

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.30.0 or, at your option, any later version of Perl 5 you may have available.

SEE ALSO

Rx