NAME
App::JobLog::TimeGrammar - parse natural (English) language time expressions
VERSION
version 1.002
SYNOPSIS
#!/usr/bin/perl
use Modern::Perl;
use DateTime;
use App::JobLog::Time qw(tz);
use App::JobLog::TimeGrammar qw(parse);
# for demonstration purposes we modify "today"
$App::JobLog::Time::today =
  DateTime->new( year => 2011, month => 2, day => 17, time_zone => tz );
for my $phrase ( 'Monday until the end of the week', 'Tuesday at 9:00 p.m.' ) {
    my ( $start, $end, $endpoints ) = parse($phrase);
    say $phrase;
    say "$start - $end; both endpoints specified? "
      . ( $endpoints ? 'yes' : 'no' );
}
produces
Monday until the end of the week
2011-02-14T00:00:00 - 2011-02-20T23:59:59; both endpoints specified? yes
Tuesday at 9:00 p.m.
2011-02-08T21:00:00 - 2011-02-15T23:59:59; both endpoints specified? no
DESCRIPTION
App::JobLog::TimeGrammar converts natural language time expressions into pairs of DateTime objects representing intervals. This requires resolving terms such as 'yesterday', whose interpretation varies from day to day, and 'Friday', whose interpretation must be fixed by some frame of reference. The heuristic used by this code is to look first for a fixed date: either a fully specified date such as 2011/2/17 or one fixed relative to the current moment such as 'now'. If such a date is present in the time expression, it determines the context for the other date, if there is one. Otherwise it is assumed that the closest appropriate pair of dates immediately before the current moment is intended.
Given a pair consisting of a fixed and an ambiguous date, we assume the ambiguous date has the sense in which it is ordered correctly relative to the fixed date and the interval between them is minimized.
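The minimal-interval rule can be illustrated with a small sketch. This is not the module's implementation: resolve_weekday is a hypothetical helper, it uses only core modules and UTC arithmetic, and it assumes that an ambiguous weekday falling on the fixed date itself resolves to that same day.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);
use POSIX qw(strftime);

# Hypothetical helper, not part of App::JobLog::TimeGrammar.
# Resolves an ambiguous weekday against a fixed date.
#   $wday: 0 = Sunday .. 6 = Saturday
#   $role: 'start' if the weekday begins the interval, 'end' if it ends it
sub resolve_weekday {
    my ( $year, $month, $day, $wday, $role ) = @_;

    # Work at noon UTC so that adding whole days never crosses a date boundary.
    my $fixed      = timegm( 0, 0, 12, $day, $month - 1, $year );
    my $fixed_wday = ( gmtime $fixed )[6];

    # Perl's % yields a non-negative result for a positive right operand,
    # so each offset is the smallest number of days in the required direction.
    my $offset =
      $role eq 'start'
      ? -( ( $fixed_wday - $wday ) % 7 )    # nearest such day on or before
      : ( ( $wday - $fixed_wday ) % 7 );    # nearest such day on or after

    return strftime( '%Y-%m-%d', gmtime( $fixed + $offset * 86_400 ) );
}
```

With the fixed date 2011-02-17 (a Thursday), resolve_weekday( 2011, 2, 17, 1, 'start' ) picks the Monday immediately before it, 2011-02-14, rather than any earlier Monday.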
If the time expression provides no time of day, such as 8:00, it is assumed that the first moment intended is the first second of the first day and the last moment is the last second of the second day. If no second date is provided the endpoint of the interval will be the last moment of the single date specified. If a larger time period such as week, month, or year is specified, e.g., 'last week', the first moment is the first second in the period and the last moment is the last second.
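As a rough illustration of the "first second, last second" rule for a larger period, the following sketch computes the bounds of a calendar month. It is an assumption-laden stand-in: month_bounds is a hypothetical name, it works in UTC with core modules only, and it ignores the time zone handling that the module delegates to DateTime.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);
use POSIX qw(strftime);

# Hypothetical helper, not part of App::JobLog::TimeGrammar.
# Returns the first and last second of a month as ISO-like strings (UTC).
sub month_bounds {
    my ( $year, $month ) = @_;

    # First second of the month.
    my $first = timegm( 0, 0, 0, 1, $month - 1, $year );

    # Last second: one second before the first second of the next month.
    my ( $next_year, $next_month ) =
      $month == 12 ? ( $year + 1, 1 ) : ( $year, $month + 1 );
    my $last = timegm( 0, 0, 0, 1, $next_month - 1, $next_year ) - 1;

    return map { strftime( '%Y-%m-%dT%H:%M:%S', gmtime($_) ) } $first, $last;
}
```

For February 2011 this yields 2011-02-01T00:00:00 and 2011-02-28T23:59:59.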
If you wish to parse a single date, not an interval, you can ignore the second date, though you should check the third value returned by parse, which indicates whether an interval was parsed.
parse will croak if it cannot parse the expression given.
Time Grammar
The following is a semi-formal BNF grammar of time understood by App::JobLog::TimeGrammar. In this formalization s represents whitespace, d represents a digit, and \n (for example, \1) represents a back reference to the nth item in parentheses in the given rule. After the first three rules the rules are alphabetized to facilitate finding them.
<expression> = s* ( <ever> | <span> ) s*
<ever> = "all" | "always" | "ever" | [ [ "the" s ] ( "entire" | "whole" ) s ] "log"
<span> = <date> [ <span_divider> <date> ]
<at> = "at" | "@"
<at_time> = [ ( s | s* <at> s* ) <time> ]
<at_time_on> = [ <at> s ] <time> s "on" s
<beginning> = "beg" [ "in" [ "ning" ] ]
<date> = <numeric> | <verbal>
<day_first> = d{1,2} s <month>
<divider> = "-" | "/" | "."
<dm_full> = d{1,2} s <month> [ "," ] s d{4}
<dom> = d{1,2}
<full> = <at_time_on> <full_no_time> | <full_no_time> <at_time>
<full_month> = "january" | "february" | "march" | "april" | "may" | "june" | "july" | "august" | "september" | "october" | "november" | "december"
<full_no_time> = <dm_full> | <md_full>
<full_weekday> = "sunday" | "monday" | "tuesday" | "wednesday" | "thursday" | "friday" | "saturday"
<iso> = d{4} ( <divider> ) d{1,2} \1 d{1,2}
<md> = d{1,2} <divider> d{1,2}
<md_full> = <month> s d{1,2} "," s d{4}
<modifiable_day> = <at_time_on> <modifiable_day_no_time> | <modifiable_day_no_time> <at_time>
<modifiable_day_no_time> = [ <modifier> s ] <weekday>
<modifiable_month> = [ <month_modifier> s ] <month>
<modifiable_period> = [ <period_modifier> s ] <period>
<modifier> = "last" | "this"
<month> = <full_month> | <short_month>
<month_day> = <at_time_on> <month_day_no_time> | <month_day_no_time> <at_time>
<month_day_no_time> = <month_first> | <day_first>
<month_first> = <month> s d{1,2}
<month_modifier> = <modifier> | <termini> [ s "of" ]
<named_period> = <modifiable_day> | <modifiable_month> | <modifiable_period>
<now> = "now"
<numeric> = <year> | <at_time_on> <numeric_no_time> | <numeric_no_time> <at_time>
<numeric_no_time> = <us> | <iso> | <md> | <dom>
<pay> = "pay" | "pp" | "pay" s* "period"
<period> = "week" | "month" | "year" | <pay>
<period_modifier> = <modifier> | <termini> [ s "of" [ s "the" ] ]
<relative_period> = [ <at> s* ] <time> s <relative_period_no_time> | <relative_period_no_time> <at_time> | <now>
<relative_period_no_time> = "yesterday" | "today"
<short_month> = "jan" | "feb" | "mar" | "apr" | "may" | "jun" | "jul" | "aug" | "sep" | "oct" | "nov" | "dec"
<short_weekday> = "sun" | "mon" | "tue" | "wed" | "thu" | "fri" | "sat"
<span_divider> = s* ( "-"+ | ( "through" | "thru" | "to" | "til" [ "l" ] | "until" ) ) s*
<termini> = [ "the" s ] ( <beginning> | "end" )
<time> = d{1,2} [ : d{2} [ : d{2} ] ] [ s* <time_suffix> ]
<time_suffix> = ( "a" | "p" ) ( "m" | ".m." )
<us> = d{1,2} ( <divider> ) d{1,2} \1 d{4}
<verbal> = <named_period> | <relative_period> | <month_day> | <full>
<weekday> = <full_weekday> | <short_weekday>
<year> = d{4}
In general, App::JobLog::TimeGrammar will understand most time expressions you are likely to want to use.
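To make the back reference notation concrete, here is one way the <iso> rule could be written as a Perl regular expression. This is a sketch derived from the grammar as printed, not the pattern the module actually uses; $ISO_RE and looks_like_iso are hypothetical names.

```perl
use strict;
use warnings;

# <divider> = "-" | "/" | "."
# <iso>     = d{4} ( <divider> ) d{1,2} \1 d{1,2}
# The \1 forces the second divider to be the same character as the first.
my $ISO_RE = qr{ ^ \d{4} ( [-/.] ) \d{1,2} \1 \d{1,2} $ }x;

# Hypothetical helper: returns 1 if the text fits the <iso> rule, else 0.
sub looks_like_iso {
    my ($text) = @_;
    return $text =~ $ISO_RE ? 1 : 0;
}
```

Under this pattern '2011/2/17' and '2011-02-17' match, while '2011/2-17' does not, because its two dividers differ.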
METHODS
daytime
Parses a time expression such as "11:00" or "8:15:40 pm". Returns a map from hour, minute, second, and suffix to the appropriate value, where 'x' represents an ambiguous suffix.
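The shape of such a map can be sketched from the <time> and <time_suffix> rules of the grammar. The code below is a hypothetical stand-in, not daytime itself: the name sketch_daytime, the suffix values 'a' and 'p', and the choice to default a missing minute or second to 0 are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Pattern following the <time> and <time_suffix> grammar rules.
my $TIME_RE = qr/
    ^ (\d{1,2})                          # hour
    (?: : (\d{2})                        # optional minute
        (?: : (\d{2}) )?                 # optional second
    )?
    (?: \s* ([ap]) (?: m | \.m\. ) )?    # optional am, a.m., pm, p.m.
    $
/x;

# Hypothetical stand-in for illustration; returns an empty list on failure.
sub sketch_daytime {
    my ($text) = @_;
    my ( $hour, $minute, $second, $suffix ) = $text =~ $TIME_RE
      or return;
    return (
        hour   => $hour,
        minute => $minute // 0,      # assumed default
        second => $second // 0,      # assumed default
        suffix => $suffix // 'x',    # 'x' marks an ambiguous (missing) suffix
    );
}
```

Here "8:15:40 pm" gives a suffix of 'p', while "11:00" carries no suffix and is reported as 'x'.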
parse
This function (it isn't actually a method) is the essential function of this module. It takes a time expression and returns three values: a pair of DateTime objects representing the endpoints of the corresponding interval, and a flag indicating whether the expression specified both endpoints.
If you are parsing an expression defining a point rather than an interval, you should be safe ignoring the second endpoint, but you should check the third return value to make sure the expression didn't provide a second endpoint.
This code croaks when it cannot parse the expression, so exception handling is recommended.
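One possible way to wrap the call is shown below. The wrapper try_parse is a hypothetical helper, not part of this module; it accepts any code reference that behaves like parse (returns a list on success, dies on failure), which keeps the sketch runnable without the module installed.

```perl
use strict;
use warnings;

# Hypothetical wrapper, not part of App::JobLog::TimeGrammar.
# Calls $parser on $phrase and traps any exception it throws.
# Returns ( \@values, undef ) on success or ( undef, $error ) on failure.
sub try_parse {
    my ( $parser, $phrase ) = @_;
    my @values;

    # The trailing 1 makes eval return true only if the parser did not die.
    my $ok = eval { @values = $parser->($phrase); 1 };
    return ( undef, $@ || 'unknown error' ) unless $ok;
    return ( \@values, undef );
}
```

After use App::JobLog::TimeGrammar qw(parse), the call try_parse( \&parse, 'last week' ) would return either an array reference holding the start, the end, and the both-endpoints flag, or the croak message as the error.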
AUTHOR
David F. Houghton <dfhoughton@gmail.com>
COPYRIGHT AND LICENSE
This software is copyright (c) 2011 by David F. Houghton.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.