NAME
BusyBird::Util - utility functions for BusyBird
SYNOPSIS
use BusyBird::Util qw(sort_statuses split_with_entities future_of);
future_of($timeline, "get_statuses", count => 100)->then(sub {
    my ($statuses) = @_;
    my $sorted_statuses = sort_statuses($statuses);
    my $status = $sorted_statuses->[0];
    my $segments_arrayref = split_with_entities($status->{text}, $status->{entities});
    return $segments_arrayref;
})->catch(sub {
    my ($error, $is_normal_error) = @_;
    warn $error;
});
DESCRIPTION
This module provides some utility functions useful in BusyBird.
EXPORTABLE FUNCTIONS
The following functions are exported only by request.
$sorted = sort_statuses($statuses)
Sorts an array of status objects appropriately. Argument $statuses is an array-ref of statuses.
Return value $sorted is an array-ref of sorted statuses.
The sort refers to the $status->{created_at} and $status->{busybird}{acked_at} fields. See "Order of Statuses" in BusyBird::StatusStorage.
$segments_arrayref = split_with_entities($text, $entities_hashref)
Splits the given $text with the "entities" and returns the split segments.
$text is a string to be split. $entities_hashref is a hash-ref which has the same structure as Twitter Entities. Each entity object annotates a part of $text with such information as linked URLs, mentioned users, mentioned hashtags, etc. If $entities_hashref doesn't conform to the said structure, it is ignored.
The return value $segments_arrayref is an array-ref of "segment" objects. A "segment" is a hash-ref containing a part of $text and the entity object (if any) attached to it. Note that $segments_arrayref also contains segments that have no entity attached. $segments_arrayref is sorted, so you can reassemble the complete $text by concatenating all the segments.
Example:
my $text = 'aaa --- bb ---- ccaa -- ccccc';
my $entities = {
    a => [
        {indices => [0, 3], url => 'http://hoge.com/a/1'},
        {indices => [18, 20], url => 'http://hoge.com/a/2'},
    ],
    b => [
        {indices => [8, 10], style => "bold"},
    ],
    c => [
        {indices => [16, 18], footnote => 'first c'},
        {indices => [24, 29], some => {complex => 'structure'}},
    ],
    d => []
};
my $segments = split_with_entities($text, $entities);
## $segments = [
##     { text => 'aaa', start => 0, end => 3, type => 'a',
##       entity => {indices => [0, 3], url => 'http://hoge.com/a/1'} },
##     { text => ' --- ', start => 3, end => 8, type => undef,
##       entity => undef},
##     { text => 'bb', start => 8, end => 10, type => 'b',
##       entity => {indices => [8, 10], style => "bold"} },
##     { text => ' ---- ', start => 10, end => 16, type => undef,
##       entity => undef },
##     { text => 'cc', start => 16, end => 18, type => 'c',
##       entity => {indices => [16, 18], footnote => 'first c'} },
##     { text => 'aa', start => 18, end => 20, type => 'a',
##       entity => {indices => [18, 20], url => 'http://hoge.com/a/2'} },
##     { text => ' -- ', start => 20, end => 24, type => undef,
##       entity => undef },
##     { text => 'ccccc', start => 24, end => 29, type => 'c',
##       entity => {indices => [24, 29], some => {complex => 'structure'}} }
## ];
Every entity object must have an indices field, which is an array-ref of the starting and ending indices of the annotated text part. The ending index must be greater than or equal to the starting index. If an entity object does not meet this condition, it is ignored.
Except for indices, all fields in entity objects are optional.
Text ranges annotated by entity objects must not overlap. If they do, the result is undefined.
A segment hash-ref has the following fields.
text - Substring of the $text.
start - Starting index of the segment in $text.
end - Ending index of the segment in $text.
type - Type of the entity. If the segment has no entity attached, it is undef.
entity - Attached entity object. If the segment has no entity attached, it is undef.
split_with_entities() croaks if $text is undef.
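As a rough illustration of how the segment list might be consumed, the sketch below hard-codes the first three segments from the example above (so that it runs without BusyBird installed) and rebuilds both the plain text and a simple HTML rendering. The mapping of types a and b to HTML tags is an assumption made for this example only; split_with_entities() itself does no rendering.

```perl
use strict;
use warnings;

# Segments in the documented shape, copied from the example above.
my @segments = (
    { text => 'aaa', start => 0, end => 3, type => 'a',
      entity => { indices => [0, 3], url => 'http://hoge.com/a/1' } },
    { text => ' --- ', start => 3, end => 8, type => undef, entity => undef },
    { text => 'bb', start => 8, end => 10, type => 'b',
      entity => { indices => [8, 10], style => 'bold' } },
);

# Concatenating the text fields in order restores the original string.
my $plain = join '', map { $_->{text} } @segments;

# Hypothetical rendering rule: type 'a' becomes a link, type 'b' becomes bold,
# everything else (including segments without an entity) is emitted as-is.
# HTML escaping is omitted to keep the sketch short.
my $html = join '', map {
    my $seg  = $_;
    my $type = $seg->{type} // '';
    $type eq 'a' ? qq{<a href="$seg->{entity}{url}">$seg->{text}</a>}
  : $type eq 'b' ? "<b>$seg->{text}</b>"
  :                $seg->{text};
} @segments;

print "$plain\n$html\n";
```

Because segments without an entity are included in the list, the renderer needs no special handling for the gaps between entities.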
$future = future_of($invocant, $method, %args)
Wraps a callback-style method call with a Future::Q object.
This function executes $invocant->$method(%args), which is supposed to be a callback-style method. Before the execution, the callback field in %args is overwritten, so that the result of the $method can be obtained from $future.
To use future_of(), the $method must conform to the following specification. (Most of BusyBird::Timeline's callback-style methods follow this specification.)
The $method takes named arguments as in $invocant->$method(key1 => value1, key2 => value2 ... ).
When the $method's operation is done, the subroutine reference stored in $args{callback} must be called exactly once.
$args{callback} must be called as in $args{callback}->($error, @results).
In success, the $error must be a falsy scalar and the rest of the arguments are the result of the operation. The arguments other than $error are used to fulfill the $future.
In failure, the $error must be a truthy scalar that describes the error. The $error is used to reject the $future.
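To make the specification above more concrete, here is a minimal sketch of a callback-style method that appears to satisfy it. The My::Counter class, its increment method, and the by argument are all hypothetical names invented for this illustration; they are not part of BusyBird. The sketch calls the method with a plain callback so that it runs without Future::Q installed.

```perl
use strict;
use warnings;

package My::Counter;    # hypothetical class, not part of BusyBird

sub new {
    my ($class) = @_;
    return bless { count => 0 }, $class;
}

# A callback-style method: named arguments in, and $args{callback}
# is called exactly once as $callback->($error, @results).
sub increment {
    my ($self, %args) = @_;
    my $callback = $args{callback};
    my $by       = $args{by};
    if (!defined($by) || $by !~ /\A\d+\z/) {
        # Failure: a truthy scalar describing the error.
        $callback->("'by' must be a non-negative integer");
        return;
    }
    $self->{count} += $by;
    # Success: a falsy $error followed by the results.
    $callback->(undef, $self->{count});
    return;
}

package main;

my $counter = My::Counter->new;
my @log;
my $report = sub {
    my ($error, @results) = @_;
    push @log, $error ? "error: $error" : "ok: @results";
};
$counter->increment(by => 3,   callback => $report);
$counter->increment(by => "x", callback => $report);
print "$_\n" for @log;
```

Under these assumptions, a call such as future_of($counter, "increment", by => 3) would supply the callback itself and deliver the same result through the returned future.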
The return value ($future) is a Future::Q object, which represents the result of the $method call. If $method throws an exception, it is caught by future_of() and $future becomes rejected.
In success, $future is fulfilled with the results the $method passes to the callback.
$future->then(sub {
    my @results = @_;
    ...
});
In failure, $future is rejected with the error and a flag.
$future->catch(sub {
    my ($error, $is_normal_error) = @_;
    ...
});
If $error is the error passed to the callback, $is_normal_error is true. If $error is an exception thrown by the method, $is_normal_error is not passed at all.
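A failure handler can use the flag to tell the two cases apart. The sketch below is only an illustration: describe_failure is a hypothetical helper that takes the same argument list as the catch() callback above, and it is called directly with hand-made arguments instead of through a real future. The value 1 stands in for whatever true value future_of() actually passes.

```perl
use strict;
use warnings;

# Hypothetical handler with the same arguments as the catch() callback.
sub describe_failure {
    my ($error, $is_normal_error) = @_;
    return $is_normal_error
        ? "reported via callback: $error"
        : "thrown as exception: $error";
}

# Error passed to the callback: the flag is present and true.
my $normal = describe_failure("storage is unavailable", 1);

# Exception thrown by the method: only the error is passed.
my $exception = describe_failure("Can't locate object method");

print "$normal\n$exception\n";
```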
$tracking_timeline = make_tracking($tracking_timeline, $main_timeline)
Makes $tracking_timeline a tracking timeline for a certain source of statuses, which is then input to $main_timeline. $tracking_timeline and $main_timeline must be BusyBird::Timeline objects.
Return value is the given $tracking_timeline object.
This function uses BusyBird::Log to log error messages when something goes wrong.
A "tracking timeline" is a timeline dedicated to tracking status history of a single source. You might need it when you import statuses from various sources into a single "main" timeline.
For example,
use BusyBird;
use BusyBird::Input::Feed;
my $input = BusyBird::Input::Feed->new();
my $main_timeline = timeline("main");
$main_timeline->add( $input->parse_url('http://example1.com/feed.rss') );
$main_timeline->add( $input->parse_url('http://example2.com/feed.rss') );
$main_timeline->add( $input->parse_url('http://example3.com/feed.rss') );
In the above example, statuses are imported from three different RSS feeds using BusyBird::Input::Feed. Because BusyBird::Timeline rejects duplicate statuses, the above code adds only new and unread statuses to $main_timeline.
However, if the update rates of the three feeds are different, it's possible for old statuses to re-appear in $main_timeline as new statuses. This is because BusyBird::Timeline has limited capacity for storing statuses.
Suppose example1 and example2 update quickly whereas example3 updates very slowly. At first, $main_timeline keeps all statuses from the three feeds. After a while, $main_timeline will be filled with statuses from example1 and example2, and at a certain point, statuses from example3 will be discarded because they are too old. After that, $main_timeline->add( $input->parse_url('http://example3.com/feed.rss') ) imports the same statuses that were just discarded, but $main_timeline now recognizes them as new because they are no longer in $main_timeline. So those old statuses from example3 will re-appear as unread.
To prevent that tragedy, you should create tracking timelines.
use BusyBird;
use BusyBird::Input::Feed;
use BusyBird::Util qw(make_tracking);
my $input = BusyBird::Input::Feed->new();
my $main_timeline = timeline("main");
make_tracking(timeline("example1"), $main_timeline);
make_tracking(timeline("example2"), $main_timeline);
make_tracking(timeline("example3"), $main_timeline);
timeline("example1")->add( $input->parse_url('http://example1.com/feed.rss') );
timeline("example2")->add( $input->parse_url('http://example2.com/feed.rss') );
timeline("example3")->add( $input->parse_url('http://example3.com/feed.rss') );
You should add statuses into tracking timelines instead of directly into $main_timeline. Each tracking timeline keeps statuses from its source, and it forwards only new statuses to the $main_timeline.
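The effect can be illustrated with a toy model. The ToyTimeline class below is a hypothetical stand-in written for this sketch: it is not BusyBird::Timeline, it stores bare ID strings instead of status objects, it discards entries in insertion order, and the forwarding step is done by hand and does not reflect how make_tracking() is implemented. It only shows why a per-source timeline stops old statuses from re-entering a small main timeline.

```perl
use strict;
use warnings;

package ToyTimeline;    # hypothetical toy model, NOT BusyBird::Timeline

sub new {
    my ($class, %args) = @_;
    return bless { capacity => $args{capacity}, order => [], stored => {} }, $class;
}

# Adds status IDs, skipping those currently stored.
# Returns the IDs that were accepted as new.
sub add {
    my ($self, @ids) = @_;
    my @new;
    for my $id (@ids) {
        next if $self->{stored}{$id};
        $self->{stored}{$id} = 1;
        push @{ $self->{order} }, $id;
        push @new, $id;
    }
    # Discard the oldest entries once the capacity is exceeded.
    while (@{ $self->{order} } > $self->{capacity}) {
        my $evicted = shift @{ $self->{order} };
        delete $self->{stored}{$evicted};
    }
    return @new;
}

package main;

my @fast_feed = qw(fast-1 fast-2 fast-3 fast-4);
my @slow_feed = qw(slow-1);

# Case 1: both feeds go directly into the main timeline.
my $main = ToyTimeline->new(capacity => 4);
$main->add(@slow_feed);
$main->add(@fast_feed);                     # pushes slow-1 out
my @reappeared = $main->add(@slow_feed);    # slow-1 is accepted as new again

# Case 2: each feed has its own tracking timeline; only the IDs that
# the tracking timeline accepts as new are forwarded to the main one.
my $main2      = ToyTimeline->new(capacity => 4);
my $track_fast = ToyTimeline->new(capacity => 4);
my $track_slow = ToyTimeline->new(capacity => 4);
$main2->add($track_slow->add(@slow_feed));
$main2->add($track_fast->add(@fast_feed));  # slow-1 leaves $main2 here too
my @forwarded   = $track_slow->add(@slow_feed);   # nothing new for this source
my @reappeared2 = $main2->add(@forwarded);

printf "direct: %d re-added, tracked: %d re-added\n",
    scalar @reappeared, scalar @reappeared2;
```

In the second case the slow feed's tracking timeline still remembers slow-1, so nothing is forwarded even though the main timeline has already discarded it.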
AUTHOR
Toshio Ito <toshioito [at] cpan.org>