NAME
Business::CompanyDesignator - module for matching and stripping/manipulating the company designators appended to company names
VERSION
Version: 0.08.
This module is considered an ALPHA release. Interfaces may change and/or break without notice until the module reaches version 1.0.
SYNOPSIS
Business::CompanyDesignator is a Perl module for matching and stripping/manipulating the typical company designators appended (or sometimes prepended) to company names. It supports both long forms (e.g. Corporation, Incorporated, Limited etc.) and abbreviations (e.g. Corp., Inc., Ltd., GmbH etc.).
use Business::CompanyDesignator;
# Constructor
$bcd = Business::CompanyDesignator->new;
# Optionally, you can provide your own company_designator.yml file, instead of the bundled one
$bcd = Business::CompanyDesignator->new(datafile => '/path/to/company_designator.yml');
# Get lists of designators, which may be long (e.g. Limited) or abbreviations (e.g. Ltd.)
@des = $bcd->designators;
@long = $bcd->long_designators;
@abbrev = $bcd->abbreviations;
# Look up individual designator records (returns B::CD::Record objects)
# Look up a record by long designator (unique)
$record = $bcd->record($long_designator);
# Look up records by abbreviation or long designator (may not be unique)
@records = $bcd->records($designator);
# Get a regex for matching designators by type ('end'/'begin') and lang
# By default, returns 'end' regexes for all languages
$re = $bcd->regex;
$company_name =~ $re and say 'designator found!';
$company_name =~ /$re\s*$/ and say 'final designator found!';
my $re_begin_en = $bcd->regex('begin', 'en');
# Split $company_name on designator, returning a ($before, $designator, $after) triplet,
# plus the normalised form of the designator matched (can pass to records(), for example)
($before, $des, $after, $normalised_des) = $bcd->split_designator($company_name);
# Or in scalar context, return a Business::CompanyDesignator::SplitResult object
$res = $bcd->split_designator($company_name, lang => 'en');
print join ' / ', $res->designator_std, $res->short_name, $res->extra;
DATASET
Business::CompanyDesignator uses the company designator dataset from here:
L<https://github.com/ProfoundNetworks/company_designator>
which is bundled with the module. You can use your own (updated or custom) version, if you prefer, by passing a 'datafile' parameter to the constructor.
The dataset defines multiple long form designators (like "Company", "Limited", or "Incorporée"), each of which has zero or more abbreviations (e.g. 'Co.', 'Ltd.', 'Inc.' etc.), and one or more language codes. The 'Company' entry, for instance, looks like this:
Company:
abbr:
- Co.
- '& Co.'
- and Co.
lang: en
Long designators are unique across the dataset, but abbreviations are not: 'Inc.', for example, is used for both "Incorporated" and "Incorporée".
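The uniqueness rule above can be illustrated without loading the dataset at all. The following is a minimal sketch in plain Perl: the %dataset hash is a hand-written stand-in that mirrors the shape of the YAML entries, and the 'Incorporated' and 'Incorporée' entries (including their language codes) are assumptions made up for illustration, not copied from the bundled file.

```perl
use strict;
use warnings;
use utf8;

# Hand-written stand-in for three dataset entries (NOT loaded from the YAML).
# Only the 'Company' entry is taken from the example above; the other two
# are illustrative assumptions.
my %dataset = (
    'Company'      => { abbr => [ 'Co.', '& Co.', 'and Co.' ], lang => 'en' },
    'Incorporated' => { abbr => [ 'Inc.' ],                    lang => 'en' },
    'Incorporée'   => { abbr => [ 'Inc.' ],                    lang => 'fr' },
);

# Invert the mapping: abbreviation => list of long designators that use it.
my %long_for;
for my $long (sort keys %dataset) {
    push @{ $long_for{$_} }, $long for @{ $dataset{$long}{abbr} };
}

# Long designators are unique hash keys, but an abbreviation may have
# more than one owner.
my @owners = @{ $long_for{'Inc.'} };
printf "%d long designators share 'Inc.'\n", scalar @owners;
```

This is why a lookup by long designator can return a single record, while a lookup by abbreviation has to return a list.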
METHODS
new()
Creates a Business::CompanyDesignator object.
$bcd = Business::CompanyDesignator->new;
By default this uses the bundled company_designator dataset. You may provide your own (updated or custom) version by passing a 'datafile' parameter to the constructor.
$bcd = Business::CompanyDesignator->new(datafile => '/path/to/company_designator.yml');
designators()
Returns the full list of company designator strings from the dataset (both long form and abbreviations).
@designators = $bcd->designators;
long_designators()
Returns the full list of long form designators from the dataset.
@long = $bcd->long_designators;
abbreviations()
Returns the full list of abbreviation designators from the dataset.
@abbrev = $bcd->abbreviations;
record($long_designator)
Returns the Business::CompanyDesignator::Record object for the given long designator (and dies if not found).
records($designator)
Returns a list of Business::CompanyDesignator::Record objects for the given abbreviation or long designator (for long designators there will only be a single record returned, but abbreviations may map to multiple records).
Use this method for abbreviations, or if you aren't sure of a designator's type.
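A short sketch of the difference between the two lookups, assuming the module and its bundled dataset are installed. It uses 'Limited' and 'Inc.' because both are mentioned in the DATASET section. Since record() dies when the designator is not found, the call is wrapped in eval here; that wrapping is a suggestion for untrusted input, not something the module requires.

```perl
use strict;
use warnings;
use Business::CompanyDesignator;

my $bcd = Business::CompanyDesignator->new;

# record() dies on an unknown long designator, so trap the exception.
my $record = eval { $bcd->record('Limited') };
if ($record) {
    print "found a record for 'Limited'\n";
}
else {
    print "no record for 'Limited': $@";
}

# records() returns a list, because an abbreviation may map to several records.
my @records = $bcd->records('Inc.');
printf "'Inc.' maps to %d record(s)\n", scalar @records;
```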
regex([$type], [$lang])
Returns a regex matching all designators for $type ('begin'/'end') and $lang (ISO 639-1 language code e.g. 'en', 'es', 'de', etc.) from the dataset. $lang may be either a single language code scalar, or an arrayref of language codes, for multiple alternative languages. The returned regex is case-insensitive and non-anchored.
$type defaults to 'end', so without parameters regex() returns a regex matching all 'end' designators for all languages.
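The following sketch shows the three call forms described above, assuming the module and its bundled dataset are installed. Because the returned regex is non-anchored, the example adds its own trailing anchor to require that the designator ends the name; the sample name 'ABC Pty Ltd' is the one used elsewhere in this document.

```perl
use strict;
use warnings;
use Business::CompanyDesignator;

my $bcd = Business::CompanyDesignator->new;

my $re_end   = $bcd->regex;                          # 'end' designators, all languages
my $re_en_fr = $bcd->regex('end', [ 'en', 'fr' ]);   # 'end' designators, two languages
my $re_begin = $bcd->regex('begin', 'en');           # leading designators, English only

my $name = 'ABC Pty Ltd';

# Non-anchored: matches a designator anywhere the regex allows.
print "designator found\n" if $name =~ $re_end;

# Anchored by the caller: the designator must be the last thing in the name.
print "trailing designator found\n" if $name =~ /$re_en_fr\s*$/;
```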
split_designator($company_name)
Attempts to split $company_name on (the first) company designator found.
In array context split_designator returns a list of four items - a triplet of strings from $company_name ( $before, $designator, $after ), plus the standardised version of the designator as a fourth element.
($short_name, $des, $after_text, $des_std) = $bcd->split_designator($company_name);
In scalar context split_designator returns a Business::CompanyDesignator::SplitResult object.
$res = $bcd->split_designator($company_name, lang => $lang);
The initial $des designator (or $res->designator) is the designator text as matched in $company_name, while the final $des_std in array context (or $res->designator_std) is the standardised version as found in the dataset.
For instance, "ABC Pty Ltd" would return "Pty Ltd" as the $designator, but "Pty. Ltd." as the standardised form, and the latter would be what you would find in designators() or would look up with records(). Similarly, "Accessoires XYZ Ltee" (without the French acute accent) would match, returning "Ltee" (as found) for the $designator, but "Ltée" as the standardised form.
split_designator also accepts an optional 'lang' parameter, which defines one or more ISO 639-1 language codes for $company_name (can be either a single scalar language code, or an arrayref of alternate language codes). If $lang is defined, split_designator will only match designators for those languages, which can improve the accuracy of the split.
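As a sketch of both calling conventions, the example below splits the "ABC Pty Ltd" name discussed above, assuming the module and its bundled dataset are installed. It passes 'lang' once as a scalar and once as an arrayref; per the description above, the matched designator is expected to be "Pty Ltd" and the standardised form "Pty. Ltd.".

```perl
use strict;
use warnings;
use Business::CompanyDesignator;

my $bcd = Business::CompanyDesignator->new;

# Array context: text before, designator as matched, text after, standardised form.
my ($short_name, $des, $after, $des_std)
    = $bcd->split_designator('ABC Pty Ltd', lang => 'en');

# Scalar context: a SplitResult object exposing the same information via accessors.
my $res = $bcd->split_designator('ABC Pty Ltd', lang => ['en']);

printf "list:   matched '%s', standardised '%s'\n", $des, $des_std;
printf "object: matched '%s', standardised '%s'\n",
    $res->designator, $res->designator_std;
```

The standardised form is the one to pass on to records() if you need the underlying dataset entries.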
Note that split_designator won't always get the split right. It checks for final designators first, then leading ones, and then finally looks for embedded designators. This allows names like these to be picked up:
Amerihealth Insurance Company of NJ
Trenkwalder Personal AG Schweiz
Vicente Campano S L (COMERCIAL VICAM)
Gvozdika, gostinitsa OOO ""Eko-Treyd""
but it can also detect designators that are false positives e.g.
Dr S L Ledingham - Beaumont Street Practice
One way to mitigate this is to specify the optional 'lang' parameter if you know the language(s) your company names are in. This should reduce the number of false positives by making fewer designators available to match on, but it doesn't eliminate the issue altogether.
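If false positives matter more than coverage, one further option is to accept a split only when the designator ends the name. The sketch below assumes the module and its bundled dataset are installed; strip_trailing_designator is a hypothetical helper written for this example, not part of the module. The policy is deliberately conservative: it also rejects legitimate leading and embedded designators, such as the one in "Amerihealth Insurance Company of NJ".

```perl
use strict;
use warnings;
use Business::CompanyDesignator;

my $bcd = Business::CompanyDesignator->new;

# Hypothetical helper: return the name without its designator, but only when
# the designator is the last thing in the name; otherwise return nothing.
sub strip_trailing_designator {
    my ($name, @lang) = @_;
    my ($before, $des, $after)
        = $bcd->split_designator($name, @lang ? (lang => \@lang) : ());
    return unless defined $des && length $des;       # nothing matched
    return if defined $after && $after =~ /\S/;      # leading or embedded match
    return $before;
}

for my $name ('ABC Pty Ltd', 'Dr S L Ledingham - Beaumont Street Practice') {
    my $short = strip_trailing_designator($name, 'en');
    printf "%-45s => %s\n", $name, defined $short ? $short : '(left unchanged)';
}
```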
SEE ALSO
Finance::CompanyNames
AUTHOR
Gavin Carr <gavin@profound.net>
COPYRIGHT AND LICENCE
Copyright (C) 2013-2015 Gavin Carr
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.