NAME
Genealogy::Gedcom::Reader::Lexer - An OS-independent lexer for GEDCOM data
Synopsis
Run scripts/lex.pl -help.
A typical run would be:
perl -Ilib scripts/lex.pl -i data/royal.ged -r 1 -s 1
Turn on debugging prints with:
perl -Ilib scripts/lex.pl -i data/royal.ged -r 1 -s 1 -max debug
royal.ged was downloaded from http://www.vjet.f2s.com/ftree/download.html. It's more up-to-date than the one shipped with Gedcom.
Various sample GEDCOM files may be found in the data/ directory in the distro.
Description
Genealogy::Gedcom::Reader::Lexer provides a lexer for GEDCOM data.
See the GEDCOM Specification Ged551-5.pdf.
Installation
Install Genealogy::Gedcom as you would any other Perl module.
Run:
cpanm Genealogy::Gedcom
or run:
sudo cpan Genealogy::Gedcom
or unpack the distro, and then either:
perl Build.PL
./Build
./Build test
sudo ./Build install
or:
perl Makefile.PL
make (or dmake or nmake)
make test
make install
Constructor and Initialization
new()
is called as my($lexer) = Genealogy::Gedcom::Reader::Lexer -> new(k1 => v1, k2 => v2, ...).
It returns a new object of type Genealogy::Gedcom::Reader::Lexer.
Key-value pairs accepted in the parameter list (see corresponding methods for details [e.g. input_file()]):
- input_file => $gedcom_file_name
Read the GEDCOM data from this file.
Default: ''.
- logger => $logger_object
Specify a logger object.
To disable logging, set logger to the empty string.
Default: An object of type Log::Handler.
- maxlevel => $level
This option is only used if the lexer creates an object of type Log::Handler. See Log::Handler::Levels.
Default: 'info'.
Log levels are, from highest (i.e. most output) to lowest: 'debug', 'info', 'warning', 'error'. No lower levels are used.
- minlevel => $level
This option is only used if the lexer creates an object of type Log::Handler. See Log::Handler::Levels.
Default: 'error'.
- report_items => $Boolean
  - 0 => Report nothing.
  - 1 => Call "report()" to report, via the log, the items recognized by the lexer.
This output is at log level 'info'.
Default: 0.
- strict => $Boolean
Specifies lax or strict string length checking during validation.
  - 0 => String lengths can be 0, allowing blank NOTE records, etc.
  - 1 => String lengths must be > 0, as per the GEDCOM Specification.
Note: A string of length 1, e.g. '0', might still be an error.
Default: 0.
The upper limits on string lengths always follow the GEDCOM Specification. See "get_max_length($id, $line)" for details.
String lengths out of range (as with all validation failures) are reported as log messages at level 'warning'.
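A minimal sketch of constructing a lexer with the options above. The relative path data/royal.ged is taken from the Synopsis and assumes you run from the distro directory; the require is wrapped in eval only so the snippet still runs where the module is not installed.

```perl
use strict;
use warnings;

# Options documented for new(); the file path is an assumed example.
my %options = (
    input_file   => 'data/royal.ged',
    maxlevel     => 'debug', # most verbose level, used by Log::Handler
    minlevel     => 'error',
    report_items => 1,       # report() the recognized items at level 'info'
    strict       => 1,       # data components must have length > 0
    # logger     => '',      # uncomment to disable logging entirely
);

# $lexer is undef if the module cannot be loaded or new() dies.
my $lexer = eval {
    require Genealogy::Gedcom::Reader::Lexer;
    Genealogy::Gedcom::Reader::Lexer->new(%options);
};

if ($lexer) { print 'Lexer ready, strict = ', $lexer->strict, "\n" }
else        { print "Lexer unavailable\n" }
```

Nothing is read at this point; per the method descriptions, the file is read when run() is called.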
Methods
check_date($id, $line)
Checks the date field in the input arrayref $line, $$line[4].
$id identifies what type of record the $line is expected to be.
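The exact rules applied by check_date() are not spelled out here. Purely to illustrate the kind of check involved, the sketch below accepts a much-simplified subset of GEDCOM date values: an optional ABT, CAL, EST, BEF or AFT qualifier, BET ... AND ..., and FROM ... TO .... The helper looks_like_date() is hypothetical, not part of the module, and it does not validate day ranges, calendar escapes or date phrases.

```perl
use strict;
use warnings;

# Hypothetical helper, far simpler than the GEDCOM date grammar.
my $month = qr/JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC/;
my $date  = qr/(?:(?:\d{1,2}\s+)?$month\s+)?\d{3,4}/; # [[day] month] year

sub looks_like_date {
    my ($value) = @_;
    return $value =~ /^(?:(?:ABT|CAL|EST|BEF|AFT)\s+)?$date$/
        || $value =~ /^BET\s+$date\s+AND\s+$date$/
        || $value =~ /^FROM\s+$date(?:\s+TO\s+$date)?$/
        || $value =~ /^TO\s+$date$/;
}

for my $value ('12 JAN 1900', 'ABT 1850', 'BET 1850 AND 1860', '31 FOO 1900') {
    printf "%-18s %s\n", $value, looks_like_date($value) ? 'ok' : 'rejected';
}
```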
check_length($id, $line)
Checks the length of the data component (after the tag) in the input arrayref $line, i.e. $$line[4].
$id identifies what type of record the $line is expected to be.
cross_check_xrefs()
Ensures that all xrefs point to existing records.
See "What validation is performed?" in FAQ for details.
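To show the idea behind cross-checking, here is a hypothetical sketch (not the module's implementation) that collects the xrefs defined by level-0 records, such as '0 @I1@ INDI', and the pointers used as line values, such as '1 FAMS @F1@', then reports pointers with no matching record. The helper name dangling_xrefs() and the sample records are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: return pointers that never appear as the xref
# of a level-0 record.
sub dangling_xrefs {
    my ($lines) = @_;
    my (%defined, @pointers);

    for my $text (@$lines) {
        if ($text =~ /^0\s+(\@[^\@]+\@)\s/) {
            $defined{$1} = 1;                  # e.g. '0 @I1@ INDI'
        }
        elsif ($text =~ /^\d+\s+\S+\s+(\@[^\@]+\@)$/) {
            push @pointers, $1;                # e.g. '1 FAMS @F1@'
        }
    }

    # Forward references are fine: all definitions are collected first.
    return grep { !$defined{$_} } @pointers;
}

my @lines = ('0 @I1@ INDI', '1 FAMS @F1@', '1 FAMC @F9@', '0 @F1@ FAM', '1 HUSB @I1@');
print "Dangling: $_\n" for dangling_xrefs(\@lines);
```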
get_gedcom_from_file()
If the caller has requested GEDCOM data be read from a file, with the input_file option to new(), this method reads that file.
Called as appropriate by "run()" if you do not supply data with "gedcom_data([$gedcom_data])".
gedcom_data([$gedcom_data])
The [] indicate an optional parameter.
Get or set the arrayref of GEDCOM records to be processed.
This is normally only used internally, but can be used to bypass reading from a file.
Note: If supplying data this way rather than via the file, you must strip the line terminators (newlines etc.) from every line, as well as leading and trailing blanks.
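A minimal sketch of preparing data for gedcom_data(), assuming the GEDCOM text is already held in a string (the sample records are made up). It splits on CRLF, LF or CR and trims each line, as the note above requires; the calls on a lexer object are shown only as comments.

```perl
use strict;
use warnings;

# Sample text with mixed line endings and stray blanks (made-up data).
my $raw = "0 HEAD \r\n1 CHAR ANSEL\r\n  0 TRLR  \n";

# Split on any common line terminator, then trim leading and trailing
# whitespace (s///r needs Perl 5.14 or later).
my @records = map { s/^\s+|\s+$//gr } split /\r\n|\n|\r/, $raw;

# With a lexer object in hand, one would then call:
#   $lexer->gedcom_data(\@records);
#   $lexer->run;
print "[$_]\n" for @records;
```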
get_max_length($id, $line)
Get the maximum string length of the data component (after the tag) in the given $line.
$id identifies what type of record the $line is expected to be.
get_min_length($id, $line)
Get the minimum string length of the data component (after the tag) in the given $line.
Currently, this value is actually the value of strict(), i.e. 0 or 1.
$id identifies what type of record the $line is expected to be.
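To make the length rules concrete, here is a hypothetical sketch that splits a line so its data component sits at index 4 (as $$line[4] implies) and tests it against a minimum of strict() and a caller-supplied maximum. The layout of indexes 0 to 3, the helper names split_line() and length_ok(), and the maximum of 90 in the usage lines are all assumptions; the module takes its maxima from the GEDCOM Specification via get_max_length().

```perl
use strict;
use warnings;

# Hypothetical layout: [whole line, level, xref, tag, data]. Only index 4
# (the data after the tag) is taken from the documentation.
sub split_line {
    my ($text) = @_;
    my ($level, $xref, $tag, $data) =
        $text =~ /^(\d+)\s+(?:(\@[^\@]+\@)\s+)?(\S+)(?:\s(.*))?$/;
    return [$text, $level, $xref // '', $tag, $data // ''];
}

# Minimum is strict() (0 or 1); the maximum would come from get_max_length().
sub length_ok {
    my ($line, $strict, $max) = @_;
    my $length = length $$line[4];
    return $length >= $strict && $length <= $max;
}

my $note = split_line('1 NOTE');
print 'lax:    ', (length_ok($note, 0, 90) ? 'ok' : 'warning'), "\n";
print 'strict: ', (length_ok($note, 1, 90) ? 'ok' : 'warning'), "\n";
```

With strict set to 0 the blank NOTE passes, while with strict set to 1 it would be reported at level 'warning'.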
input_file([$gedcom_file_name])
Here, the [] indicate an optional parameter.
Get or set the name of the file to read the GEDCOM data from.
items()
Returns an object of type Set::Array, which is an arrayref of items output by the lexer.
See the "FAQ" for details.
log($level, $s)
Calls $self -> logger -> $level($s).
logger([$logger_object])
Here, the [] indicate an optional parameter.
Get or set the logger object.
To disable logging, just set logger to the empty string.
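The dispatch in log() relies on Perl's dynamic method names: the level string is used as the name of the method called on the logger. The sketch below mirrors that with two hypothetical classes, StubLogger standing in for Log::Handler and Demo standing in for the lexer. It assumes, from the note on disabling logging, that log() skips the call when the logger is the empty string.

```perl
use strict;
use warnings;

package StubLogger;    # hypothetical stand-in for a Log::Handler object
sub new   { my ($class) = @_; return bless { lines => [] }, $class }
sub info  { my ($self, $s) = @_; push @{ $self->{lines} }, "info: $s" }
sub debug { my ($self, $s) = @_; push @{ $self->{lines} }, "debug: $s" }

package Demo;          # hypothetical class with a lexer-style log()
sub new {
    my ($class, %arg) = @_;
    my $self = { %arg };
    return bless $self, $class;
}
sub logger { my ($self) = @_; return $self->{logger} }
sub log {
    my ($self, $level, $s) = @_;
    # '' is false, so an empty-string logger silently disables logging.
    $self->logger->$level($s) if $self->logger;
    return;
}

package main;

my $stub = StubLogger->new;
Demo->new(logger => $stub)->log(info => 'Lexed 3 records');
Demo->new(logger => '')->log(info => 'Not logged');
print "$_\n" for @{ $stub->{lines} };
```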
maxlevel([$string])
Here, the [] indicate an optional parameter.
Get or set the value used by the logger object.
This option is only used if the lexer creates an object of type Log::Handler. See Log::Handler::Levels.
minlevel([$string])
Here, the [] indicate an optional parameter.
Get or set the value used by the logger object.
This option is only used if the lexer creates an object of type Log::Handler. See Log::Handler::Levels.
push_item($line, $type)
Pushes a hashref of components of the $line, with type $type, onto the arrayref of items returned by "items()".
See the "FAQ" for details.
renumber_items()
Scan the arrayref of hashrefs returned by items() and ensure the 'count' field of each item is correct.
This is done in case array elements have been combined, e.g. when processing CONCs and CONTs for NOTEs.
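A hypothetical sketch of the renumbering step: once continuation lines have been folded into their parent item, each remaining hashref gets a fresh 'count'. Only the 'count' key comes from the text above; the 'tag' key, the sample items and the choice to start counting at 1 are assumptions.

```perl
use strict;
use warnings;

# Hypothetical: reset 'count' to 1, 2, 3, ... in array order.
sub renumber {
    my ($items) = @_;
    my $count = 0;
    $_->{count} = ++$count for @$items;
    return $items;
}

# Items 3 and 4 were CONC/CONT lines merged into the NOTE (item 2).
my $items = [
    { count => 1, tag => 'INDI' },
    { count => 2, tag => 'NOTE' },
    { count => 5, tag => 'BIRT' },
];
renumber($items);
print join(', ', map { "$_->{count}:$_->{tag}" } @$items), "\n";
```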
report()
Report, via the log, the list of items recognized by the lexer.
report_items([0 or 1])
The [] indicate an optional parameter.
Get or set the value which determines whether or not to report the items recognized by the lexer.
run()
This is the only method the caller needs to call. All parameters are supplied to new(), or via previous calls to various methods.
Returns 0 for success and 1 for failure.
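As a sketch, a caller can map run()'s return value straight onto a process exit status, since both use 0 for success. The file name is an assumption, and the eval guard exists only so the snippet degrades gracefully when the module or the file is missing.

```perl
use strict;
use warnings;

my $file   = 'data/royal.ged'; # assumed sample file
my $status = 1;                # run() returns 0 on success, 1 on failure

if (-r $file && eval { require Genealogy::Gedcom::Reader::Lexer; 1 }) {
    my $lexer = Genealogy::Gedcom::Reader::Lexer->new(input_file => $file);
    $status   = $lexer->run;
}

my $message = $status == 0 ? "Lexed $file\n" : "Could not lex $file\n";
print $message;
# A real script would finish with: exit $status;
```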
strict([0 or 1])
The [] indicate an optional parameter.
Get or set the value which determines whether to use 0 or 1 as the minimum string length.
FAQ
See "FAQ" in Genealogy::Gedcom.
Repository
https://github.com/ronsavage/Genealogy-Gedcom
See Also
Machine-Readable Change Log
The file Changes was converted into Changelog.ini by Module::Metadata::Changes.
Version Numbers
Version numbers < 1.00 represent development versions. From 1.00 up, they are production versions.
References
- The original Perl Gedcom
- GEDCOM
  - http://www.tamurajones.net/FTWTEXT.xhtml
    This is apparently the worst offender Tamura Jones has seen. Search that page for 'tags'.
  - http://www.tamurajones.net/GenoPro2011.xhtml
  - http://www.tamurajones.net/GenoPro2007.xhtml
  - http://www.tamurajones.net/TheFTWTEXTProblem.xhtml
  - Other articles on Tamura's site
- Other projects
  Many of these are discussed on Tamura's site.
  - http://bettergedcom.wikispaces.com/
  - http://www.ngsgenealogy.org/cs/GenTech_Projects
  - http://gdmxml.fugal.net/
  - http://www.cosoft.org/genxml/
  - http://www.sunflower.com/~billk/GEDC/
  - http://ancestorsnow.blogspot.com/2011/07/vged.html
  - http://www.tamurajones.net/GEDCOMValidation.xhtml
  - http://webtrees.net/
  - http://swoodbridge.com/Genealogy/lifelines/
  - http://deadendssoftware.blogspot.com/
  - http://www.legacyfamilytree.com/
  - https://devnet.familysearch.org/docs/api-overview
The Gedcom Mailing List
Contact perl-gedcom-help@perl.org.
Support
Email the author, or log a bug on RT:
https://rt.cpan.org/Public/Dist/Display.html?Name=Genealogy::Gedcom.
Author
Genealogy::Gedcom::Reader::Lexer was written by Ron Savage <ron@savage.net.au> in 2011.
Home page: http://savage.net.au/index.html.
Copyright
Australian copyright (c) 2011, Ron Savage.
All Programs of mine are 'OSI Certified Open Source Software';
you can redistribute them and/or modify them under the terms of
The Perl License, a copy of which is available at:
http://dev.perl.org/licenses/