Name

Genealogy::Gedcom::Reader::Lexer - An OS-independent lexer for GEDCOM data

Synopsis

Run scripts/lex.pl -help.

A typical run would be:

perl -Ilib scripts/lex.pl -i data/royal.ged -r 1 -s 1

Turn on debugging prints with:

perl -Ilib scripts/lex.pl -i data/royal.ged -r 1 -s 1 -max debug

royal.ged was downloaded from http://www.vjet.f2s.com/ftree/download.html. It is more up-to-date than the copy shipped with the Perl Gedcom module.

Various sample GEDCOM files may be found in the data/ directory in the distro.

Description

Genealogy::Gedcom::Reader::Lexer provides a lexer for GEDCOM data.

See the GEDCOM Specification Ged551-5.pdf.

Installation

Install Genealogy::Gedcom as you would for any Perl module:

Run:

cpanm Genealogy::Gedcom

or run:

sudo cpan Genealogy::Gedcom

or unpack the distro, and then either:

perl Build.PL
./Build
./Build test
sudo ./Build install

or:

perl Makefile.PL
make (or dmake or nmake)
make test
make install

Constructor and Initialization

new() is called as my($lexer) = Genealogy::Gedcom::Reader::Lexer -> new(k1 => v1, k2 => v2, ...).

It returns a new object of type Genealogy::Gedcom::Reader::Lexer.

Key-value pairs accepted in the parameter list (see corresponding methods for details [e.g. input_file()]):

o input_file => $gedcom_file_name

Read the GEDCOM data from this file.

Default: ''.

o logger => $logger_object

Specify a logger object.

To disable logging, just set logger to the empty string.

Default: An object of type Log::Handler.

o maxlevel => $level

This option is only used if the lexer creates an object of type Log::Handler. See Log::Handler::Levels.

Default: 'info'.

Log levels are, from highest (i.e. most output) to lowest: 'debug', 'info', 'warning', 'error'. No lower levels are used.

o minlevel => $level

This option is only used if the lexer creates an object of type Log::Handler. See Log::Handler::Levels.

Default: 'error'.

o report_items => $Boolean
o 0 => Report nothing
o 1 => Call "report()" to report, via the log, the items recognized by the lexer

This output is at log level 'info'.

Default: 0.

o strict => $Boolean

Specifies lax or strict string length checking during validation.

o 0 => String lengths can be 0, allowing blank NOTE etc records
o 1 => String lengths must be > 0, as per the GEDCOM Specification

Note: A string of length 1 - e.g. '0' - might still be an error.

Default: 0.

The upper lengths on strings are always as per the GEDCOM Specification. See "get_max_length($id, $line)" for details.

String lengths out of range (as with all validation failures) are reported as log messages at level 'warning'.
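The options above can be combined as in the following sketch. It is a minimal example, not a definitive recipe: the file name is borrowed from the Synopsis and is assumed to exist relative to the current directory, and the module is loaded at run time so that the script exits cleanly when the distro is not installed.

```perl
use strict;
use warnings;

# Assumption: this path (taken from the Synopsis) exists where the script runs.
my $file = 'data/royal.ged';

# Load the lexer at run time so a missing distro is not a compile-time error.
my $available = eval { require Genealogy::Gedcom::Reader::Lexer; 1 };

my $result;

if ($available && -e $file) {
    my $lexer = Genealogy::Gedcom::Reader::Lexer -> new(
        input_file   => $file,
        maxlevel     => 'info',
        report_items => 0,
        strict       => 1,
    );

    # run() returns 0 for success and 1 for failure.
    $result = $lexer -> run;

    print $result == 0 ? "Lexed $file\n" : "Failed to lex $file\n";
}
else {
    print "Skipped: the module or $file is not available\n";
}
```

Setting strict to 1 here asks for the minimum string length required by the GEDCOM Specification; leave it at 0 to tolerate blank NOTE etc records.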

Methods

check_date($id, $line)

Checks the date field, $$line[4], of the input arrayref $line.

$id identifies what type of record the $line is expected to be.

check_length($id, $line)

Checks the length of the data component (after the tag), $$line[4], of the input arrayref $line.

$id identifies what type of record the $line is expected to be.

cross_check_xrefs()

Ensures that all xrefs point to existing records.

See "What validation is performed?" in FAQ for details.
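The idea of an xref cross-check can be illustrated with a short sketch. This is not the module's implementation: dangling_xrefs() is a hypothetical helper which works on raw GEDCOM lines, and it assumes that an xref placed before the tag defines a record while an xref placed after the tag points to one.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the module. It expects lines which have
# already been stripped of line endings and surrounding blanks, and returns
# the sorted list of xrefs which are pointed to but never defined.
sub dangling_xrefs {
    my(@lines) = @_;
    my(%defined, %used);

    for my $line (@lines) {
        if ($line =~ /^\d+\s+(\@[^\@\s]+\@)\s+\w+/) {
            # E.g. '0 @I1@ INDI': the xref precedes the tag, so it defines a record.
            $defined{$1} = 1;
        }
        elsif ($line =~ /^\d+\s+\w+\s+(\@[^\@\s]+\@)$/) {
            # E.g. '1 HUSB @I1@': the xref follows the tag, so it is a pointer.
            $used{$1} = 1;
        }
    }

    my @missing = grep { ! $defined{$_} } sort keys %used;

    return @missing;
}

my @missing = dangling_xrefs(
    '0 @I1@ INDI',
    '1 NAME John /Smith/',
    '0 @F1@ FAM',
    '1 HUSB @I1@',
    '1 WIFE @I2@',
);

print "Missing: @missing\n";
```

In this sample the family record points to an individual with the xref @I2@, which no record defines, so that xref is the only one reported.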

get_gedcom_from_file()

If the caller has requested GEDCOM data be read from a file, with the input_file option to new(), this method reads that file.

Called as appropriate by "run()", if you do not supply data with "gedcom_data([$gedcom_data])".

gedcom_data([$gedcom_data])

The [] indicate an optional parameter.

Get or set the arrayref of GEDCOM records to be processed.

This is normally only used internally, but can be used to bypass reading from a file.

Note: If supplying data this way rather than via the file, you must strip line endings, as well as leading and trailing blanks, from every line.
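The clean-up described in the note might look like the sketch below. clean_gedcom_lines() is a hypothetical helper, not part of the module, and dropping lines which end up empty is an extra assumption made here.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the module. Returns an arrayref of
# lines suitable for passing to gedcom_data().
sub clean_gedcom_lines {
    my(@raw) = @_;
    my @clean;

    for my $line (@raw) {
        $line =~ s/[\r\n]+\z//; # Strip any mix of CR and LF at the end.
        $line =~ s/^\s+//;      # Strip leading blanks.
        $line =~ s/\s+$//;      # Strip trailing blanks.

        # Assumption: lines which are empty after stripping are not wanted.
        push @clean, $line if length $line;
    }

    return \@clean;
}

my $lines = clean_gedcom_lines("0 HEAD\r\n", "  1 SOUR Demo \n", "\n", "0 TRLR");

print "[$_]\n" for @$lines;
```

The resulting arrayref can then be handed to the lexer with $lexer -> gedcom_data($lines) before calling run().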

get_max_length($id, $line)

Get the maximum string length of the data component (after the tag) on the given $line.

$id identifies what type of record the $line is expected to be.

get_min_length($id, $line)

Get the minimum string length of the data component (after the tag) on the given $line.

Currently, this value is actually the value of strict(), i.e. 0 or 1.

$id identifies what type of record the $line is expected to be.
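Since the minimum length is just the value of strict(), a range check reduces to two comparisons. The sketch below assumes a hypothetical helper, length_in_range(), and an arbitrary maximum of 5; the real maximums come from the GEDCOM Specification via get_max_length().

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the module. $strict plays the role of
# strict(), so it doubles as the minimum length (0 or 1).
sub length_in_range {
    my($data, $strict, $max_length) = @_;
    my $length = length $data;

    return ($length >= $strict && $length <= $max_length) ? 1 : 0;
}

# 5 is an arbitrary maximum chosen for illustration only.
print length_in_range('',       0, 5), "\n"; # Lax: blank data is in range.
print length_in_range('',       1, 5), "\n"; # Strict: blank data is out of range.
print length_in_range('abcdef', 1, 5), "\n"; # Longer than the maximum in either mode.
```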

input_file([$gedcom_file_name])

Here, the [] indicate an optional parameter.

Get or set the name of the file to read the GEDCOM data from.

items()

Returns an object of type Set::Array, which is an arrayref of the items output by the lexer.

See the "FAQ" for details.

log($level, $s)

Calls $self -> logger -> $level($s).
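The call above uses the log level as a method name. The following sketch shows that kind of dispatch with a hypothetical stand-in logger, Demo::Logger, which only records messages. Skipping the call when the logger is the empty string is an assumption about how disabled logging is handled.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a Log::Handler object: it only records messages.
package Demo::Logger;

sub new   { my($class) = @_; return bless {messages => []}, $class }
sub debug { my($self, $s) = @_; push @{$$self{messages} }, "debug: $s"; return }
sub info  { my($self, $s) = @_; push @{$$self{messages} }, "info: $s"; return }

package main;

# Hypothetical helper, mirroring log($level, $s) outside of any object.
sub log_message {
    my($logger, $level, $s) = @_;

    # Assumption: an empty-string logger means logging is disabled.
    return if ! $logger;

    # $level holds a method name, so this calls e.g. $logger -> info($s).
    $logger -> $level($s);

    return;
}

my $logger = Demo::Logger -> new;

log_message($logger, 'info', 'Lexing started');
log_message('', 'info', 'Never recorded');

print "$_\n" for @{$$logger{messages} };
```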

logger([$logger_object])

Here, the [] indicate an optional parameter.

Get or set the logger object.

To disable logging, just set logger to the empty string.

maxlevel([$string])

Here, the [] indicate an optional parameter.

Get or set the value used by the logger object.

This option is only used if the lexer creates an object of type Log::Handler. See Log::Handler::Levels.

minlevel([$string])

Here, the [] indicate an optional parameter.

Get or set the value used by the logger object.

This option is only used if the lexer creates an object of type Log::Handler. See Log::Handler::Levels.
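How minlevel and maxlevel interact can be sketched as a simple window over the four levels used by the lexer. The numeric ranks and the helper is_emitted() are illustrative assumptions, as is the premise that a message is emitted when its level lies between minlevel and maxlevel inclusive; see Log::Handler::Levels for the authoritative rules.

```perl
use strict;
use warnings;

# Ranks are for illustration only. A higher rank means more output.
my %rank = (error => 1, warning => 2, info => 3, debug => 4);

# Hypothetical helper: is a message at $level inside the window?
sub is_emitted {
    my($level, $minlevel, $maxlevel) = @_;

    return ($rank{$level} >= $rank{$minlevel} && $rank{$level} <= $rank{$maxlevel}) ? 1 : 0;
}

# The defaults are minlevel 'error' and maxlevel 'info'.
for my $level (qw/debug info warning error/) {
    printf "%-7s %s\n", $level, is_emitted($level, 'error', 'info') ? 'emitted' : 'suppressed';
}
```

Under these assumptions the defaults suppress only 'debug' messages, which is why the Synopsis raises maxlevel to 'debug' to turn on debugging prints.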

push_item($line, $type)

Pushes a hashref of components of the $line, with type $type, onto the arrayref of items returned by "items()".

See the "FAQ" for details.

renumber_items()

Scans the arrayref of hashrefs returned by items() and ensures the 'count' field of each hashref is correct.

This is done in case array elements have been combined, e.g. when processing CONCs and CONTs for NOTEs.
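A renumbering pass of this kind can be sketched as follows. renumber() is a hypothetical helper working on a plain arrayref, and it assumes that 'count' is a sequence number starting from 1; the 'tag' key is used here only to label the sample data.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the module.
# Assumption: 'count' is a sequence number starting from 1.
sub renumber {
    my($items) = @_;
    my $count  = 0;

    for my $item (@$items) {
        $$item{count} = ++$count;
    }

    return $items;
}

# Sample data: a line has been folded into its NOTE, leaving a gap at 3.
my $items = [
    {count => 1, tag => 'HEAD'},
    {count => 2, tag => 'NOTE'},
    {count => 4, tag => 'TRLR'},
];

renumber($items);

for my $item (@$items) {
    print "$$item{tag}: $$item{count}\n";
}
```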

report()

Report, via the log, the list of items recognized by the lexer.

report_items([0 or 1])

The [] indicate an optional parameter.

Get or set the value which determines whether to report the items recognized by the lexer.

run()

This is the only method the caller needs to call. All parameters are supplied to new(), or via previous calls to various methods.

Returns 0 for success and 1 for failure.

strict([0 or 1])

The [] indicate an optional parameter.

Get or set the value which determines whether 0 or 1 is used as the minimum string length.

FAQ

See "FAQ" in Genealogy::Gedcom.

Repository

https://github.com/ronsavage/Genealogy-Gedcom

See Also

Genealogy::Gedcom::Date.

Gedcom::Date.

Machine-Readable Change Log

The file Changes was converted into Changelog.ini by Module::Metadata::Changes.

Version Numbers

Version numbers < 1.00 represent development versions. From 1.00 up, they are production versions.

References

o The original Perl Gedcom
o GEDCOM
o GEDCOM Specification
o GEDCOM Validation
o GEDCOM Tags
o Usage of non-standard tags
o http://www.tamurajones.net/FTWTEXT.xhtml

This is apparently the worst offender Tamura Jones has seen. Search that page for 'tags'.

o http://www.tamurajones.net/GenoPro2011.xhtml
o http://www.tamurajones.net/GenoPro2007.xhtml
o http://www.tamurajones.net/TheFTWTEXTProblem.xhtml
o http://www.tamurajones.net/FiveFreakyFeaturesYourGenealogySoftwareShouldNotHave.xhtml
o http://www.tamurajones.net/TwelveOrdinaryMustHaveGenealogySoftwareFeatures.xhtml
o Other projects

Many of these are discussed on Tamura's site.

o http://bettergedcom.wikispaces.com/
o http://www.ngsgenealogy.org/cs/GenTech_Projects
o http://gdmxml.fugal.net/
o http://www.cosoft.org/genxml/
o http://www.sunflower.com/~billk/GEDC/
o http://ancestorsnow.blogspot.com/2011/07/vged.html
o http://www.tamurajones.net/GEDCOMValidation.xhtml
o http://webtrees.net/
o http://swoodbridge.com/Genealogy/lifelines/
o http://deadendssoftware.blogspot.com/
o http://www.legacyfamilytree.com/
o https://devnet.familysearch.org/docs/api-overview

The Gedcom Mailing List

Contact perl-gedcom-help@perl.org.

Support

Email the author, or log a bug on RT:

https://rt.cpan.org/Public/Dist/Display.html?Name=Genealogy::Gedcom

Author

Genealogy::Gedcom::Reader::Lexer was written by Ron Savage <ron@savage.net.au> in 2011.

Home page: http://savage.net.au/index.html.

Copyright

Australian copyright (c) 2011, Ron Savage.

All Programs of mine are 'OSI Certified Open Source Software';
you can redistribute them and/or modify them under the terms of
The Perl License, a copy of which is available at:
http://dev.perl.org/licenses/