NAME
WordLists::Common
SYNOPSIS
use WordLists::Common qw(pretty_doubles pretty_singles);
print pretty_doubles( pretty_singles(
    qq{"That's right," she said, "I was told to 'get lost!'".}
) );
DESCRIPTION
This module provides common functions and values relevant to wordlists, such as normalising parts of speech and converting plain hyphens and quotes into typographic dashes and quotes. Exportable functions and values include:
* @sPosWords, a list of things which look like parts of speech (to help parse entries like "head verb", "head up", "head noun").
* A function pretty_endash, replacing space + hyphen + space with space + en-dash + space.
* A function pretty_doubles, replacing straight double quotes with 'smart' double quotes.
* A function pretty_singles, replacing apostrophes and straight single quotes with 'smart' single quotes.
* A function norm_spacing.
* A function custom_norm, which takes several options:
    lc - if true, lowercases the string.
    uc - if true, uppercases the string. Overrides lc.
    trim_space - if true, removes initial and final space, and also condenses repeating white space to a single space (\x20).
    alnum_only - if true, removes characters other than alphabetic ones or digits.
    brackets - if this is 'kill', removes the contents of any () brackets; if 'ignore', removes the brackets themselves.
    squares - if this is 'kill', removes the contents of any [] brackets; if 'ignore', removes the brackets themselves.
    accents - if true, removes accents and modifier characters from letters.
    sb - if true, replaces 'sb' with 'someone'.
    sth - if true, replaces 'sth' with 'something'.
* A function generic_norm_hw, which returns a word without accents or characters other than [a-z0-9].
* A function generic_norm_pos, for normalising parts of speech so that 'v' and 'verb' match.
* A function generic_minimal_pos, which normalises parts of speech and reduces them to 'minimal' ones.
* A function uniques, which reduces a list to its unique members.
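The point of @sPosWords is to tell an entry such as "head verb" (a headword plus a part of speech) from "head up" (a multi-word headword). A minimal sketch of that check follows. The word list is a small hypothetical stand-in, not the module's actual @sPosWords, and split_headword_sketch is an illustrative name, not an exported function.

```perl
use strict;
use warnings;

# Hypothetical stand-in for @sPosWords; the module's actual contents are not reproduced here.
my @pos_words = qw(noun verb adjective adverb n v adj adv);
my %is_pos    = map { $_ => 1 } @pos_words;

# Split "headword pos" when the last token looks like a part of speech;
# otherwise treat the whole entry as the headword.
sub split_headword_sketch {
    my ($entry) = @_;
    if ($entry =~ /^(.+?)\s+(\S+)$/ && $is_pos{ lc $2 }) {
        return ($1, $2);
    }
    return ($entry, undef);
}

for my $entry ('head verb', 'head up', 'head noun') {
    my ($headword, $pos) = split_headword_sketch($entry);
    printf "%-9s => headword '%s', pos %s\n", $entry, $headword, $pos // '(none)';
}
```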
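The substitution that pretty_endash is described as performing can be sketched in a few lines. pretty_endash_sketch is a hypothetical name; it handles only the literal space + hyphen + space pattern described above and uses U+2013 EN DASH as the replacement, so hyphenated words are left alone.

```perl
use strict;
use warnings;

# Hypothetical sketch of the documented substitution, not the module's code.
sub pretty_endash_sketch {
    my ($text) = @_;
    # space + hyphen-minus + space -> space + EN DASH (U+2013) + space
    $text =~ s/ - / \x{2013} /g;
    return $text;
}

binmode STDOUT, ':encoding(UTF-8)';
print pretty_endash_sketch('London - Paris train'), "\n";   # London – Paris train
```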
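A rough sketch of the kind of quote smartening pretty_singles and pretty_doubles perform, assuming one simple rule: a quote at the start of the text, or after white space or an opening bracket, opens; every other quote closes, so an apostrophe inside a word becomes U+2019. The _sketch functions are hypothetical and the module's own rules may differ. Applied to the SYNOPSIS string in the same order (singles first):

```perl
use strict;
use warnings;

# Hypothetical sketches, not the module's code. Assumed rule: a quote at the
# start, or after white space or an opening bracket, opens; any other closes.
sub pretty_singles_sketch {
    my ($text) = @_;
    # A straight or curly opening double quote also counts as an opening context.
    $text =~ s/(^|[\s(\[{"\x{201C}])'/$1\x{2018}/g;   # LEFT SINGLE QUOTATION MARK
    $text =~ s/'/\x{2019}/g;                           # RIGHT SINGLE QUOTATION MARK (and apostrophe)
    return $text;
}

sub pretty_doubles_sketch {
    my ($text) = @_;
    $text =~ s/(^|[\s(\[{])"/$1\x{201C}/g;             # LEFT DOUBLE QUOTATION MARK
    $text =~ s/"/\x{201D}/g;                           # RIGHT DOUBLE QUOTATION MARK
    return $text;
}

binmode STDOUT, ':encoding(UTF-8)';
print pretty_doubles_sketch(pretty_singles_sketch(
    qq{"That's right," she said, "I was told to 'get lost!'".}
)), "\n";
# “That’s right,” she said, “I was told to ‘get lost!’”.
```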
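The custom_norm options can be illustrated with a hypothetical custom_norm_sketch that accepts the same option names as a hash. The order of the steps, whole-word matching for sb and sth, treating 'kill' as removing the brackets together with their contents, and counting modifier letters (\p{Lm}) as "modifier characters" are all assumptions of this sketch, not documented behaviour.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD NFC);

# Hypothetical re-creation of the documented options; step order and
# patterns are assumptions.
my %bracket_re = (
    #             'kill' pattern       'ignore' pattern
    brackets => [ qr/\([^()]*\)/,      qr/[()]/   ],
    squares  => [ qr/\[[^\[\]]*\]/,    qr/[\[\]]/ ],
);

sub custom_norm_sketch {
    my ($text, %opt) = @_;

    for my $key (qw(brackets squares)) {
        my $mode = $opt{$key} // '';
        my ($kill_re, $ignore_re) = @{ $bracket_re{$key} };
        # 'kill' drops each bracketed span, brackets included (assumed);
        # 'ignore' keeps the contents and drops only the bracket characters.
        if    ($mode eq 'kill')   { $text =~ s/$kill_re//g }
        elsif ($mode eq 'ignore') { $text =~ s/$ignore_re//g }
    }

    # Expand dictionary abbreviations, matched here as whole words.
    $text =~ s/\bsb\b/someone/g    if $opt{sb};
    $text =~ s/\bsth\b/something/g if $opt{sth};

    if ($opt{accents}) {
        # Decompose, drop combining marks and modifier letters, recompose.
        $text = NFD($text);
        $text =~ s/[\p{M}\p{Lm}]//g;
        $text = NFC($text);
    }

    # Keep only letters and digits (this also removes spaces).
    $text =~ s/\P{Alnum}//g if $opt{alnum_only};

    if ($opt{trim_space}) {
        $text =~ s/\s+/ /g;     # each run of white space becomes one \x20
        $text =~ s/^ | $//g;    # strip initial and final space
    }

    # uc takes precedence over lc.
    if    ($opt{uc}) { $text = uc $text }
    elsif ($opt{lc}) { $text = lc $text }

    return $text;
}

print custom_norm_sketch('  Give sb   (a) hand [with sth] ',
    brackets => 'ignore', squares => 'kill', sb => 1, sth => 1,
    trim_space => 1, lc => 1), "\n";    # give someone a hand
```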
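A sketch of the headword normalisation that generic_norm_hw is described as doing, using the core Unicode::Normalize module. Decomposing to NFD splits an accented letter into its base letter plus a combining mark, and the [a-z0-9] filter then discards the mark. Lowercasing first is an assumption, and generic_norm_hw_sketch is a hypothetical name.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

# Hypothetical sketch of the documented behaviour; lower-casing is assumed.
sub generic_norm_hw_sketch {
    my ($word) = @_;
    my $norm = NFD(lc $word);   # "é" becomes "e" followed by U+0301 COMBINING ACUTE ACCENT
    $norm =~ s/[^a-z0-9]//g;    # drops the combining marks and anything else outside [a-z0-9]
    return $norm;
}

print generic_norm_hw_sketch("Caf\x{e9}-au-lait"), "\n";   # cafeaulait
```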
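The idea behind generic_norm_pos and generic_minimal_pos can be sketched with two lookup tables. Both tables and both _sketch functions are hypothetical; this documentation does not list the module's own abbreviations or its 'minimal' categories.

```perl
use strict;
use warnings;

# Hypothetical abbreviation table; the module's real mappings are not documented here.
my %pos_alias = (
    n    => 'noun',      v    => 'verb',        vb   => 'verb',
    adj  => 'adjective', adv  => 'adverb',      prep => 'preposition',
    pron => 'pronoun',   conj => 'conjunction',
);

# Hypothetical reductions to a "minimal" part of speech.
my %minimal_pos = (
    'phrasal verb' => 'verb',
    'modal verb'   => 'verb',
    'proper noun'  => 'noun',
);

sub generic_norm_pos_sketch {
    my ($pos) = @_;
    my $key = lc $pos;
    $key =~ s/^\s+|\s+$//g;    # trim surrounding white space
    $key =~ s/\.$//;           # "adj." -> "adj"
    return $pos_alias{$key} // $key;
}

sub generic_minimal_pos_sketch {
    my $pos = generic_norm_pos_sketch(shift);
    return $minimal_pos{$pos} // $pos;
}

print generic_norm_pos_sketch('v'), ' ', generic_norm_pos_sketch('Verb'), "\n";   # verb verb
print generic_minimal_pos_sketch('Phrasal Verb'), "\n";                          # verb
```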
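A common Perl way to reduce a list to its unique members is the %seen idiom, sketched here as a hypothetical uniques_sketch that keeps the first occurrence of each item in its original position. Whether uniques itself preserves order is not stated above.

```perl
use strict;
use warnings;

# Hypothetical sketch: keep the first occurrence of each item, in order.
sub uniques_sketch {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

print join(', ', uniques_sketch(qw(head up head noun up))), "\n";   # head, up, noun
```

List::Util (version 1.45 and later) also provides a uniq function with the same first-occurrence behaviour.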
BUGS
Please report bugs using the GitHub issue tracker.
LICENSE
Copyright 2011-2012 © Cambridge University Press. This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.