NAME

Regexp::Chinese::TradSimp - Take a string containing Chinese text, and turn it into a traditional-simplified-insensitive regexp.

SYNOPSIS

#!/usr/bin/perl -w
use strict;
use utf8;

my $regexp = Regexp::Chinese::TradSimp->make_regexp( "鳳爪" );

my $text = "豉汁蒸凤爪";
if ( $text =~ $regexp ) {
  print "Chicken feet detected!\n";
}

# Alternatively:
my $tradsimp = Regexp::Chinese::TradSimp->new;
my $regexp = $tradsimp->make_regexp( "鳳爪" );

DESCRIPTION

Given a string containing Chinese text, transforms it into a regexp that can be used to match both the simplified and the traditional version of the text. The distribution also includes a commandline tool, dets (desensitise traditional-simplified).

METHODS

make_regexp
# This returns /[凤鳳]爪/.
my $regexp = Regexp::Chinese::TradSimp->make_regexp( "鳳爪" );

# This returns /[水虾蝦][饺餃]/
my $regexp = Regexp::Chinese::TradSimp->make_regexp( "[水蝦]餃" );

# This returns /([虾蝦]|[带帶]子)[饺餃]/
my $regexp = Regexp::Chinese::TradSimp->make_regexp( "(虾|带子)饺" );

make_regexp attempts to create a regular expression that will match its argument in a traditional-simplified-insensitive way. The argument should be a string of Chinese characters, but you can include certain other aspects of regular expressions such as character classes and bracketed groupings. Arguments of forms other than those shown above are not guaranteed to work.

desensitise

Does exactly the same as make_regexp but returns a string instead of a regexp, e.g. "[凤鳳]爪" rather than /[凤鳳]爪/.

We are also -ise/-ize agnostic:

# These do the same thing.
my $regexp = $tradsimp->desensitise( qr/叉燒包/ );
my $regexp = $tradsimp->desensitize( qr/叉燒包/ );

AUTHOR

Kake L Pugh <kake@earth.li>

COPYRIGHT

Copyright (C) 2010 Kake L Pugh. All Rights Reserved.

This is free software; you can redistribute it and/or modify it under the same terms as Perl itself.