NAME

Lingua::Word::Parser - Parse a word into scored known and unknown parts

VERSION

version 0.0809

SYNOPSIS

use Data::Dumper;  # Provides Dumper(), used below
use Lingua::Word::Parser;

# With a database source:
my $p = Lingua::Word::Parser->new(
  word   => 'abioticaly',
  dbname => 'fragments',
  dbuser => 'akbar',
  dbpass => 's3kr1+',
);

# With a file source:
$p = Lingua::Word::Parser->new(
  word => 'abioticaly',
  file => 'eg/lexicon.dat',
);

my $known  = $p->knowns;
my $combos = $p->power;
my $score  = $p->score;       # Stringified output
$score     = $p->score_parts; # "Raw" output

# The best guess is the last sorted scored set:
print Dumper $score->{ [ sort keys %$score ]->[-1] };

DESCRIPTION

A Lingua::Word::Parser breaks a word into known affixes.

Each line of a word-part lexicon file must hold a regular expression followed by whitespace and a definition:

a(?=\w)        opposite
ab(?=\w)       away
(?<=\w)o(?=\w) combining
(?<=\w)tic     possessing

Please see the included eg/lexicon.dat example file.
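
The lexicon format above can be illustrated with a minimal sketch that does not use the module at all. The helper names parse_lexicon and find_parts are hypothetical, and the split on the first run of whitespace is an assumption about how such lines could be read, not a description of the module's internals:

```perl
use strict;
use warnings;

# Hypothetical helper: turn "<regex> <definition>" lines into a hashref.
sub parse_lexicon {
    my @lines = @_;
    my %lexicon;
    for my $line (@lines) {
        next if $line =~ /^\s*$/;    # Skip blank lines
        my ($re, $definition) = split /\s+/, $line, 2;
        $lexicon{$re} = $definition;
    }
    return \%lexicon;
}

# Hypothetical helper: list every place a lexicon entry matches the word.
sub find_parts {
    my ($word, $lexicon) = @_;
    my @found;
    for my $re (keys %$lexicon) {
        while ($word =~ /$re/g) {
            push @found, {
                part       => substr($word, $-[0], $+[0] - $-[0]),
                offset     => $-[0],
                definition => $lexicon->{$re},
            };
        }
    }
    # Order by position in the word, then by the matched text.
    my @sorted = sort {
        $a->{offset} <=> $b->{offset} or $a->{part} cmp $b->{part}
    } @found;
    return @sorted;
}

my $lexicon = parse_lexicon(
    'a(?=\w)        opposite',
    'ab(?=\w)       away',
    '(?<=\w)o(?=\w) combining',
    '(?<=\w)tic     possessing',
);

for my $hit (find_parts('abioticaly', $lexicon)) {
    printf "%-4s at offset %d: %s\n", @$hit{qw(part offset definition)};
}
```

The lookahead and lookbehind assertions in the entries constrain where a part may occur (for example, "tic" must follow a word character) without consuming the neighboring characters.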

A database lexicon must have records as above, stored in the columns id, affix and definition. Please see the included eg/word_part.sql example file.

METHODS

new

$p = Lingua::Word::Parser->new(%arguments);

Create a new Lingua::Word::Parser object.

Arguments and defaults:

word:   undef
dbuser: undef
dbpass: undef
dbname: undef
dbtype: mysql
dbhost: localhost

The file argument shown in the SYNOPSIS names a lexicon file to use instead of a database.

knowns

$known = $p->knowns;

Find the known word parts and their bitstring masks.
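
To make the idea of a bitstring mask concrete, here is a minimal standalone sketch. It assumes a mask is a string as long as the word, with 1 over the characters a part covers and 0 elsewhere; the helper mask_for is hypothetical and the module's actual internal representation may differ:

```perl
use strict;
use warnings;

# Hypothetical helper: build a mask with 1s over the matched span.
sub mask_for {
    my ($word, $offset, $length) = @_;
    my $mask = '0' x length $word;
    substr($mask, $offset, $length) = '1' x $length;
    return $mask;
}

my $word = 'abioticaly';
print mask_for($word, 0, 2), "\n";    # 'ab'  covers positions 0-1: 1100000000
print mask_for($word, 4, 3), "\n";    # 'tic' covers positions 4-6: 0000111000
```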

power

$combos = $p->power;

Find the set of non-overlapping known word parts by considering the power set of all masks.
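
The power-set idea can be sketched as follows, again without the module. The function non_overlapping_sets is hypothetical: it enumerates every non-empty subset of the given masks and keeps those in which no two masks share a 1 bit. Converting masks to integers with oct is a simplification that assumes the word is no longer than the platform's integer width:

```perl
use strict;
use warnings;

# Hypothetical helper: return every non-empty subset of masks
# whose members do not overlap.
sub non_overlapping_sets {
    my @masks = @_;
    my @sets;
    for my $bits (1 .. 2 ** scalar(@masks) - 1) {
        # Bit $_ of $bits selects whether mask $_ is in this subset.
        my @subset = grep { $bits & (1 << $_) } 0 .. $#masks;
        my ($union, $ok) = (0, 1);
        for my $i (@subset) {
            my $m = oct "0b$masks[$i]";    # Bitstring to integer
            if ($union & $m) { $ok = 0; last }
            $union |= $m;
        }
        push @sets, [ @masks[@subset] ] if $ok;
    }
    return @sets;
}

# Masks for 'a', 'ab', 'o' and 'tic' in 'abioticaly'.
my @sets = non_overlapping_sets(
    '1000000000', '1100000000', '0001000000', '0000111000',
);
print join(' + ', @$_), "\n" for @sets;
```

Because the power set grows as 2 to the number of masks, this brute-force approach is only practical for the handful of parts a single word usually yields.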

score

$score = $p->score;
$score = $p->score( $open_separator, $close_separator );

Score the known vs unknown word part combinations into ratios of characters and chunks, word familiarity, partitions and definitions.

This method sets the score member to a list of hashrefs with keys:

partition
definition
score
familiarity

If not given, the $open_separator and $close_separator are '<' and '>' by default.

score_parts

$score_parts = $p->score_parts;
$score_parts = $p->score_parts( $open_separator, $close_separator );
$score_parts = $p->score_parts( $open_separator, $close_separator, $line_terminator );

Score the known vs unknown word part combinations into ratios of characters and chunks, word familiarity, partitions and definitions.

If not given, the $open_separator and $close_separator are '<' and '>' by default.

The $line_terminator can be any string, such as a newline (\n) or an HTML line break (<br>), but is the empty string ('') by default.

AUTHOR

Gene Boggs <gene@cpan.org>

COPYRIGHT AND LICENSE

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.
