NAME
Genealogy::Gedcom::Date - Parse GEDCOM dates in French r/German/Gregorian/Hebrew/Julian
Synopsis
A script (scripts/synopsis.pl):
    #!/usr/bin/env perl

    use strict;
    use warnings;

    use Genealogy::Gedcom::Date;

    # --------------------------

    sub process
    {
        my($count, $parser, $date) = @_;

        print "$count: $date: ";

        my($result) = $parser -> parse(date => $date);

        # Print each parsed date on its own line, then the combined form.
        print "Canonical date @{[$_ + 1]}: ", $parser -> canonical_date($$result[$_]), ". \n" for (0 .. $#$result);
        print 'Canonical form: ', $parser -> canonical_form($result), ". \n";
        print "\n";

    } # End of process.

    # --------------------------

    my($parser) = Genealogy::Gedcom::Date -> new(maxlevel => 'debug');

    process(1, $parser, 'Julian 1950');
    process(2, $parser, '@#dJulian@ 1951');
    process(3, $parser, 'From @#dJulian@ 1952 to Gregorian 1953/54');
    process(4, $parser, 'From @#dFrench r@ 1955 to 1956');
    process(5, $parser, 'From @#dJulian@ 1957 to German 1.Dez.1958');
One-liners:
perl scripts/parse.pl -max debug -d 'Between Gregorian 1701/02 And Julian 1703'
Output:
Return value from parse():
[
  {
    canonical => "1701/02",
    flag => "BET",
    kind => "Date",
    suffix => "02",
    type => "Gregorian",
    year => 1701
  },
  {
    canonical => "\@#dJULIAN\@ 1703",
    flag => "AND",
    kind => "Date",
    type => "Julian",
    year => 1703
  }
]
perl scripts/parse.pl -max debug -d 'Int 10 Nov 1200 (Approx)'
Output:
[
  {
    canonical => "10 Nov 1200 (Approx)",
    day => 10,
    flag => "INT",
    kind => "Date",
    month => "Nov",
    phrase => "(Approx)",
    type => "Gregorian",
    year => 1200
  }
]
perl scripts/parse.pl -max debug -d '(Unknown)'
Output:
Return value from parse():
[
  {
    canonical => "(Unknown)",
    kind => "Phrase",
    phrase => "(Unknown)",
    type => "Phrase"
  }
]
See the "FAQ" for the explanation of the output arrayrefs.
See also scripts/parse.pl and scripts/compare.pl for sample code.
Lastly, you are strongly encouraged to peruse t/*.t.
Description
Genealogy::Gedcom::Date provides a Marpa-based parser for GEDCOM dates.
Calendar escapes supported are (case-insensitive): French r/German/Gregorian/Hebrew/Julian.
Gregorian is the default, so its escape never needs to be specified.
Comparison of 2 Genealogy::Gedcom::Date-based objects is supported by calling the "compare($other_object)" method on one object and passing the other object as the parameter.
Note: compare() can return any one of four (4) values.
See the GEDCOM Specification, p 45.
Installation
Install Genealogy::Gedcom::Date as you would any Perl module:
Run:
cpanm Genealogy::Gedcom::Date
or run:
sudo cpan Genealogy::Gedcom::Date
or unpack the distro, and then either:
perl Build.PL
./Build
./Build test
sudo ./Build install
or:
perl Makefile.PL
make (or dmake or nmake)
make test
make install
Constructor and Initialization
new() is called as my($parser) = Genealogy::Gedcom::Date -> new(k1 => v1, k2 => v2, ...).
It returns a new object of type Genealogy::Gedcom::Date.
Key-value pairs accepted in the parameter list (see corresponding methods for details [e.g. "date([$date])"]):
- o canonical => $integer
  Note: Nothing is printed unless maxlevel is set to debug.
  - o canonical => 0
    Data::Dumper::Concise's Dumper() prints the output of the parse.
  - o canonical => 1
    canonical_form() is called on the output of parse() to print a string.
  - o canonical => 2
    canonical_date() is called on each element in the result from parse(), to print strings on separate lines.
  Default: 0.
- o date => $date
  The string to be parsed.
  Each ',' is replaced by a space. See the "FAQ" for details.
  Default: ''.
- o logger => $aLoggerObject
  Specify a logger compatible with Log::Handler, for the lexer and parser to use.
  Default: A logger of type Log::Handler which writes to the screen.
  To disable logging, just set 'logger' to the empty string (not undef).
- o maxlevel => $logOption1
  This option affects Log::Handler.
  See the Log::Handler::Levels docs.
  Typical values are: 'error', 'notice', 'info' and 'debug'.
  Default: 'notice', which produces no output.
- o minlevel => $logOption2
  This option affects Log::Handler.
  See the Log::Handler::Levels docs.
  Default: 'error'.
  No lower levels are used.
Note: The parameters canonical and date can also be passed to "parse([%args])".
Methods
canonical([$integer])
Here, the [] indicate an optional parameter.
Gets or sets the canonical option, which controls what exactly "parse([%args])" prints when "maxlevel([$string])" is set to debug.
By default nothing is printed.
See "canonical_date($hashref)", next, for sample code.
canonical_date($hashref)
$hashref is any element of the arrayref returned by "parse([%args])". The hashref may be empty.
Returns a date string (or the empty string) normalized in various ways:
- o If Gregorian (in any form) was in the original string, it is discarded
  This is done because it's the default.
- o If any other calendar escape was in the original string, it is preserved
  It is output in all caps. As a special case, 'FRENCHR' is returned as 'FRENCH R'.
- o If About, etc were in the original string, they are discarded
  This means the flag key in the hashref is ignored.
Note: This method is called by "parse([%args])" to populate the canonical key in the arrayref of hashrefs returned by parse().
Try:
perl scripts/parse.pl -max debug -d 'From 21 Jun 1950 to @#dGerman@ 05.Mär.2015'
perl scripts/parse.pl -max debug -d 'From 21 Jun 1950 to @#dGerman@ 05.Mär.2015' -c 0
perl scripts/parse.pl -max debug -d 'From 21 Jun 1950 to @#dGerman@ 05.Mär.2015' -c 1
perl scripts/parse.pl -max debug -d 'From 21 Jun 1950 to @#dGerman@ 05.Mär.2015' -c 2
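The normalization rules listed above can be illustrated with a small stand-alone sketch. This is not the module's own code: the sub name sketch_canonical_date is hypothetical, the order of the parts (escape, day, month, year/suffix, phrase) is inferred from the canonical values shown in the Synopsis output, and the bce key is left out because this document does not say where it is placed.

```perl
#!/usr/bin/env perl

use strict;
use warnings;

# Hypothetical re-implementation of the documented rules, for illustration only.
sub sketch_canonical_date
{
    my($hashref) = @_;

    # An empty hashref gives the empty string.
    return '' if (! %$hashref);

    # A phrase-only element is returned unchanged.
    return $$hashref{phrase} if ( ($$hashref{kind} // '') eq 'Phrase');

    my(@parts);

    # Gregorian is discarded; any other escape is kept, in upper-case.
    my($type) = uc($$hashref{type} // 'Gregorian');
    $type     = 'FRENCH R' if ($type eq 'FRENCHR');

    push @parts, '@#d' . $type . '@' if ($type ne 'GREGORIAN');

    # Only keys which are actually present are used. The flag key is ignored.
    for my $key (qw/day month/)
    {
        push @parts, $$hashref{$key} if (exists $$hashref{$key});
    }

    if (exists $$hashref{year})
    {
        my($year) = $$hashref{year};
        $year     .= '/' . $$hashref{suffix} if (exists $$hashref{suffix});

        push @parts, $year;
    }

    push @parts, $$hashref{phrase} if (exists $$hashref{phrase});

    return join(' ', @parts);
}

print sketch_canonical_date({flag => 'AND', kind => 'Date', type => 'Julian', year => 1703}), "\n";
```

Feeding it the hashrefs from the Synopsis output (minus their canonical keys) is a quick way to compare the sketch with what parse() reports.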
canonical_form($arrayref)
Returns a date string containing zero, one or two dates.
This method calls "canonical_date($hashref)" for each element in the $arrayref. The arrayref may be empty.
Then it adds information from the flag key in each element, if present.
For sample code, see "canonical_date($hashref)" just above.
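As a rough illustration of how the flag and the canonical date might be combined, here is a stand-alone sketch. The sub name sketch_canonical_form is hypothetical, and the layout (flag, then date, separated by single spaces) is an assumption rather than a statement of the module's exact output. It relies on the canonical key which parse() has already populated.

```perl
#!/usr/bin/env perl

use strict;
use warnings;

# Hypothetical helper: joins each element's flag (if any) and canonical date.
sub sketch_canonical_form
{
    my($arrayref) = @_;

    my(@parts);

    for my $element (@$arrayref)
    {
        push @parts, $$element{flag} if (exists $$element{flag});
        push @parts, $$element{canonical} if (defined($$element{canonical}) && length($$element{canonical}) );
    }

    # An empty arrayref gives the empty string.
    return join(' ', @parts);
}

# Data copied from the Synopsis output for
# 'Between Gregorian 1701/02 And Julian 1703'.
my($result) =
[
    {canonical => '1701/02', flag => 'BET', kind => 'Date', suffix => '02', type => 'Gregorian', year => 1701},
    {canonical => '@#dJULIAN@ 1703', flag => 'AND', kind => 'Date', type => 'Julian', year => 1703},
];

print sketch_canonical_form($result), "\n";
```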
compare($other_object)
Returns an integer 0 .. 3 (sic) indicating the temporal relationship between the invoking object ($self) and $other_object.
Here $date_1 is the date of the invoking object and $date_2 is the date of $other_object. The return value is one of:
0 if the dates have different date escapes.
1 if $date_1 < $date_2.
2 if $date_1 = $date_2.
3 if $date_1 > $date_2.
Note: Gregorian years like 1510/02 are converted into 1510 before the dates are compared. To change this, create a sub-class and override "normalize_date($date_hash)".
See scripts/compare.pl for sample code.
See also "normalize_date($date_hash)".
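Because 0 here means "not comparable" rather than "equal", the return value cannot be used directly where a <=>-style result is expected. The sketch below shows one way to interpret it. The sub names describe_comparison and as_sort_order are hypothetical helpers, not part of this module.

```perl
#!/usr/bin/env perl

use strict;
use warnings;

# Meanings taken from the list of return values documented for compare().
my(%meaning) =
(
    0 => 'the dates have different date escapes',
    1 => 'date 1 is earlier than date 2',
    2 => 'the dates are equal',
    3 => 'date 1 is later than date 2',
);

# Hypothetical helper: turn the integer into a readable message.
sub describe_comparison
{
    my($code) = @_;

    return 'unexpected return value' if (! defined($code) || ! exists $meaning{$code});

    return $meaning{$code};
}

# Hypothetical helper: map 1, 2, 3 to -1, 0, 1, and anything else to undef.
sub as_sort_order
{
    my($code) = @_;

    return if (! defined($code) || $code !~ /^[123]$/);

    return $code - 2;
}

print "$_: ", describe_comparison($_), "\n" for (0 .. 3);
```

With two objects in hand, the value to pass to these helpers would come from a call of the form $object_1 -> compare($object_2).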
date([$date])
Here, [ and ] indicate an optional parameter.
Gets or sets the date to be parsed.
The date in parse(date => $date) takes precedence over both new(date => $date) and date($date).
This means if you call parse() as parse(date => $date), then the value $date is stored so that if you subsequently call date(), that value is returned.
Note: date is a parameter to new().
error()
Gets the last error message.
Returns '' (the empty string) if there have been no errors.
If Marpa::R2 throws an exception, it is caught by a try/catch block, and the Marpa error is returned by this method.
See "parse([%args])" for more about error().
log($level, $s)
If a logger is defined, this logs the message $s at level $level.
logger([$logger_object])
Here, the [] indicate an optional parameter.
Get or set the logger object.
To disable logging, just set 'logger' to the empty string (not undef), in the call to "new()".
This logger is passed to other modules.
'logger' is a parameter to "new()". See "Constructor and Initialization" for details.
maxlevel([$string])
Here, the [] indicate an optional parameter.
Get or set the value used by the logger object.
This option is only used if an object of type Log::Handler is created. See Log::Handler::Levels.
Typical values are: 'notice', 'info' and 'debug'. The default, 'notice', produces no output.
The code emits a message with log level 'error' if Marpa throws an exception, and it displays the result of the parse at level 'debug' if maxlevel is set that high. The latter display uses Data::Dumper::Concise's function Dumper().
'maxlevel' is a parameter to "new()". See "Constructor and Initialization" for details.
minlevel([$string])
Here, the [] indicate an optional parameter.
Get or set the value used by the logger object.
This option is only used if an object of type Log::Handler is created. See Log::Handler::Levels.
'minlevel' is a parameter to "new()". See "Constructor and Initialization" for details.
new([%args])
The constructor. See "Constructor and Initialization".
normalize_date($date_hash)
Normalizes $date_hash for each date during a call to "compare($other_object)".
Override in a sub-class if you wish to change the normalization technique.
parse([%args])
Here, [ and ] indicate an optional parameter.
parse() returns an arrayref. See the "FAQ" for details.
If the arrayref is empty, call "error()" to retrieve the error message.
In particular, the arrayref will be empty if the input date is the empty string.
parse() takes the same parameters as new().
Warning: The array can contain 1 element when 2 are expected. This can happen if your input contains 'From ... To ...' or 'Between ... And ...', and one of the dates is invalid. That is, the return value from parse() will contain the valid date but no indicator of the invalid one.
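Since the return value carries no indicator of the dropped date, a caller may want to check for this case itself. The following is a minimal sketch under stated assumptions: expected_date_count and range_is_incomplete are hypothetical helpers, and the keyword test is a simple heuristic which only recognises the tokens from, to, bet, between and and.

```perl
#!/usr/bin/env perl

use strict;
use warnings;

# Hypothetical helper: guess how many dates the input was meant to contain.
sub expected_date_count
{
    my($input) = @_;

    # Drop any date phrase, so that words inside (...) are not taken for keywords.
    (my $text = $input) =~ s/\(.*?\)//g;

    return 2 if ($text =~ /\bfrom\b/i && $text =~ /\bto\b/i);
    return 2 if ($text =~ /\bbet(?:ween)?\b/i && $text =~ /\band\b/i);
    return 1;
}

# Hypothetical helper: true if 2 dates were expected but only 1 came back.
sub range_is_incomplete
{
    my($input, $result) = @_;

    return (expected_date_count($input) == 2 && scalar(@$result) == 1) ? 1 : 0;
}

# A stand-in for the arrayref parse() would return when one date was rejected.
my($input)  = 'From 21 Jun 1950 to 1955';
my($result) = [ {day => 21, flag => 'FROM', kind => 'Date', month => 'Jun', type => 'Gregorian', year => 1950} ];

print "One of the dates in '$input' was not returned\n" if (range_is_incomplete($input, $result) );
```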
Extensions to the Gedcom specification
This section lists exactly how this code differs from the Gedcom spec.
- o Input may be in Unicode
- o Input may be in any case
- o Input may omit calendar escapes when the date is unambiguous
- o Any of the following tokens may be used
  - o abt, about, circa
  - o aft, after
  - o and
  - o bc, b.c., bce
  - o bef, before
  - o bet, between
  - o cal, calculated
  - o french r, frenchr, german, gregorian, hebrew, julian
  - o est, estimated
  - o from
  - o German BCE
    vc, v.c., v.chr., vchr, vuz, v.u.z.
  - o German month names
    jan, feb, mär, maer, mrz, apr, mai, jun, jul, aug, sep, sept, okt, nov, dez
  - o Gregorian month names
    jan, feb, mar, apr, may, jun, jul, aug, sep, oct, nov, dec
  - o Hebrew month names
    tsh, csh, ksl, tvt, shv, adr, ads, nsn, iyr, svn, tmz, aav, ell
  - o int, interpreted
  - o to
FAQ
What is the format of the value returned by parse()?
It is always an arrayref.
If the date is like '1950' or 'Bef 1950 BCE', there will be 1 element in the arrayref.
If the date contains both 'From' and 'To', or both 'Between' and 'And', then the arrayref will contain 2 elements.
Each element is a hashref, with various combinations of the following keys. You need to check the existence of some keys before processing the date.
This means missing values (day, month, bce) are never fabricated. These keys only appear in the hashref if such a token was found in the input.
Keys:
- o bce
  If the input contains any (case-insensitive) BCE indicator, under any calendar escape, the bce key will hold the exact indicator.
- o canonical => $string
  "parse([%args])" calls "canonical_date($hashref)" to populate this key.
- o day => $integer
  If the input contains a day, then the day key will be present.
- o flag => $string
  If the input contains any of the following (case-insensitive), then the flag key will be present:
  - o Abt or About
  - o Aft or After
  - o And
  - o Bef or Before
  - o Bet or Between
  - o Cal or Calculated
  - o Est or Estimated
  - o From
  - o Int or Interpreted
  - o To
  $string takes a case-sensitive value, such as 'BET', 'AND' or 'INT' in the Synopsis output.
- o kind => 'Date' or 'Phrase'
  The kind key is always present, and always takes the value 'Date' or 'Phrase'.
  If the value is 'Phrase', see the phrase and type keys.
  During processing, there can be another - undocumented - element in the arrayref. It represents the calendar escape, and in that case kind takes the value 'Calendar'. This element is discarded before the final arrayref is returned to the caller.
- o month => $string
  If the input contains a month, then the month key will be present. The case of $string will be exactly whatever was in the input.
- o phrase => "($string)"
  If the input contains a date phrase, then the phrase key will be present. The case of $string will be exactly whatever was in the input.
  parse(date => 'Int 10 Nov 1200 (Approx)') returns:
  [ { day => 10, flag => "INT", kind => "Date", month => "Nov", phrase => "(Approx)", type => "Gregorian", year => 1200 } ]
  parse(date => '(Unknown)') returns:
  [ { kind => "Phrase", phrase => "(Unknown)", type => "Phrase" } ]
  See also the kind and type keys.
- o suffix => $two_digits
  If the year contains a suffix (/00), then the suffix key will be present. The '/' is discarded.
  Obviously, this key can only appear when the year is of the Gregorian form 1700/00.
  See also the year key below.
- o type => $string
  The type key is always present, and takes a case-sensitive value, such as 'Gregorian', 'Julian' or 'Phrase' in the Synopsis output.
- o year => $integer
  If the input contains a year, then the year key is present.
  If the year contains a suffix (/00), see also the suffix key, above. This means the value of the year key is never "$integer/$two_digits".
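Since optional keys only appear when the matching token was in the input, code which consumes the arrayref should test with exists before using a key. The sketch below walks one element and reports only the keys it finds. The sub name describe_element is hypothetical, and the data is copied from the 'Int 10 Nov 1200 (Approx)' example above.

```perl
#!/usr/bin/env perl

use strict;
use warnings;

# Hypothetical helper: list the optional keys which are present in one element.
sub describe_element
{
    my($element) = @_;

    # The kind key is always present, so it is safe to branch on it first.
    return "phrase only: $$element{phrase}" if ( ($$element{kind} // '') eq 'Phrase');

    my(@seen);

    for my $key (qw/flag day month year suffix bce phrase/)
    {
        # Never assume a key is there. Missing values are not fabricated.
        push @seen, "$key=$$element{$key}" if (exists $$element{$key});
    }

    return join(', ', @seen);
}

my($result) = [ {day => 10, flag => 'INT', kind => 'Date', month => 'Nov', phrase => '(Approx)', type => 'Gregorian', year => 1200} ];

print describe_element($_), "\n" for (@$result);
```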
When should I use a calendar escape?
- o In theory, for every non-Gregorian date
  In practice, if the month name is unique to a specific language, then the escape is not needed, since Marpa::R2 and this code automatically handle ambiguity.
  Likewise, if you use a Gregorian year in the form 1700/01, then the calendar escape is obvious.
  The escape is, of course, always inserted into the values returned by the canonical pair of methods when they process non-Gregorian dates. That makes their output compatible with other software. And no matter what case you use specifying the calendar escape, it is always output in upper-case.
- o When you wish to force the code to provide an unambiguous result
  All Gregorian and Julian dates are ambiguous, unless they use the year format 1700/01.
  So, to resolve the ambiguity, add the calendar escape.
Why is '@' escaped with '\' when Data::Dumper::Concise's Dumper() prints things?
That's just how that module handles '@'.
Does this module accept Unicode?
Yes.
See t/German.t for sample code.
Can I change the default calendar?
No. It is always Gregorian.
Are dates massaged before being processed?
Yes. Commas are replaced by spaces.
French month names
See "Extensions to the Gedcom specification".
German month names
See "Extensions to the Gedcom specification".
Hebrew month names
See "Extensions to the Gedcom specification".
What happens if parse() is given a string like 'To 2000 From 1999'?
The code does not reorder the dates.
Why was this module renamed from DateTime::Format::Gedcom?
The DateTime suite of modules isn't designed, IMHO, for GEDCOM-like applications. It was a mistake to use that name in the first place.
By releasing under the Genealogy::Gedcom::* namespace, I can be much more targeted in the data types I choose as method return values.
Why did you choose Moo over Moose?
My policy is to use the lightweight Moo for all modules and applications.
Trouble-shooting
Things to consider:
- o Error message: Marpa exited at (line, column) = ($line, $column) within the input string
  Consider the possibility that the parse ends without a successful parse, but the input is the prefix of some input that can lead to a successful parse.
  Marpa is not reporting a problem during the read(), because you can add more to the input string, and Marpa does not know that you do not plan to do this.
- o You tried to enter the German month name 'Mär' via the shell
  Read more about this by running 'perl scripts/parse.pl -h', where it discusses '-d'.
- o You mistyped the calendar escape
  Check: Are any of these valid?
  Yes, the last 3 are accepted by this module, and the last one is accepted by other software.
- o The date is in American format (month day year)
- o You used a Julian calendar with a Gregorian year
  Dates - such as 1900/01 - which do not fit the Gedcom definition of a Julian year, are filtered out.
See Also
Time::Piece is in Perl core. See http://perltricks.com/article/59/2014/1/10/Solve-almost-any-datetime-need-with-Time-Piece
Time::Duration is more sophisticated than Time::Elapsed
Time::Moment implements ISO 8601
http://blogs.perl.org/users/buddy_burden/2015/09/a-date-with-cpan-part-1-state-of-the-union.html
http://blogs.perl.org/users/buddy_burden/2015/10/-a-date-with-cpan-part-3-paving-while-driving.html
Machine-Readable Change Log
The file Changes was converted into Changelog.ini by Module::Metadata::Changes.
Version Numbers
Version numbers < 1.00 represent development versions. From 1.00 up, they are production versions.
Repository
https://github.com/ronsavage/Genealogy-Gedcom-Date.
Support
Email the author, or log a bug on RT:
https://rt.cpan.org/Public/Dist/Display.html?Name=Genealogy::Gedcom::Date.
Credits
Thanx to Eugene van der Pijll, the author of the Gedcom::Date::* modules.
Thanx also to the authors of the DateTime::* family of modules. See http://datetime.perl.org/wiki/datetime/dashboard for details.
Thanx to Mike Elston on the perl-gedcom mailing list for providing French month abbreviations, amongst other information pertaining to the French language.
Thanx to Michael Ionescu on the perl-gedcom mailing list for providing the grammar for German dates and German month abbreviations.
Author
Genealogy::Gedcom::Date was written by Ron Savage <ron@savage.net.au> in 2011.
Homepage: http://savage.net.au/index.html.
Copyright
Australian copyright (c) 2011, Ron Savage.
All Programs of mine are 'OSI Certified Open Source Software';
you can redistribute them and/or modify them under the terms of
The Perl License, a copy of which is available at:
http://dev.perl.org/licenses/