NAME

Twitter::Text - Perl implementation of the twitter-text parsing library

SYNOPSIS

use Twitter::Text;

$result = parse_tweet('Hello world こんにちは世界');
print $result->{valid} ? 'valid tweet' : 'invalid tweet';

DESCRIPTION

Twitter::Text is a Perl implementation of the twitter-text parsing library.

WARNING

This library does not implement auto-linking and hit highlighting.

Please refer to Implementation status for the latest status.

FUNCTIONS

All functions below are exported by default.

Extraction

extract_hashtags

$hashtags = extract_hashtags($text);

Returns an array reference of the hashtag strings extracted from $text.

extract_hashtags_with_indices

$hashtags_with_indices = extract_hashtags_with_indices($text, [\%options]);

Returns an array reference of hash references, one for each hashtag extracted from $text.

Each hash reference consists of hashtag (the hashtag string) and indices (the range of the hashtag in $text).
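As a minimal sketch of consuming this return value, the snippet below walks the result and prints each hashtag with its range. The sample text is made up, and the code assumes that indices is an array reference holding a start and an end position, which the description above implies but does not spell out.

```perl
use strict;
use warnings;
use Twitter::Text;

# Hypothetical sample input.
my $text = 'Reading about #perl and #unicode today';

# Each element is a hash reference with the keys hashtag and indices.
my $hashtags = extract_hashtags_with_indices($text);

for my $tag (@$hashtags) {
    # Assumption: indices is an array reference of the form [start, end].
    my ($start, $end) = @{ $tag->{indices} };
    printf "%s spans %d..%d\n", $tag->{hashtag}, $start, $end;
}
```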

extract_mentioned_screen_names

$screen_names = extract_mentioned_screen_names($text);

Returns an array reference of the screen name strings extracted from $text.

extract_mentioned_screen_names_with_indices

$screen_names_with_indices = extract_mentioned_screen_names_with_indices($text);

Returns an array reference of hash references, one for each screen name extracted from $text.

Each hash reference consists of screen_name (screen name string) and indices (range of screen name).

extract_mentions_or_lists_with_indices

$mentions_or_lists_with_indices = extract_mentions_or_lists_with_indices($text);

Returns an array reference of hash references, one for each screen name or list extracted from $text.

Each hash reference consists of screen_name (the screen name string) and indices (the range of the screen name or list). If it is a list, the hash reference also contains a list_slug item.
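The following sketch shows one way to tell lists apart from plain mentions using the list_slug item. The sample text is hypothetical, and the check deliberately treats both a missing and an empty list_slug as "not a list", since the exact representation for plain mentions is not stated here.

```perl
use strict;
use warnings;
use Twitter::Text;

# Hypothetical sample input with one mention and one list.
my $text = 'Thanks @alice, see also @bob/perl-hackers';

my $mentions = extract_mentions_or_lists_with_indices($text);

for my $m (@$mentions) {
    # Assumption: plain mentions have no list_slug or an empty one.
    my $slug = $m->{list_slug};
    if (defined $slug && length $slug) {
        printf "list: screen_name=%s list_slug=%s\n", $m->{screen_name}, $slug;
    }
    else {
        printf "user: screen_name=%s\n", $m->{screen_name};
    }
}
```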

extract_urls

$urls = extract_urls($text);

Returns an array reference of the URL strings extracted from $text.

extract_urls_with_indices

$urls = extract_urls_with_indices($text, [\%options]);

Returns an array reference of hash references, one for each URL extracted from $text.

Each hash reference consists of url (the URL string) and indices (the range of the URL in $text).

Validation

parse_tweet

$parse_result = parse_tweet($text, [\%options]);

The parse_tweet function takes a $text string and an optional \%options parameter, and returns a hash reference with the following keys:

weighted_length

The overall length of the tweet with code points weighted per the ranges defined in the configuration file.

permillage

Indicates the proportion (per thousand) of the weighted length in comparison to the max weighted length. A value > 1000 indicates input text that is longer than the allowable maximum.

valid

Indicates whether the input text length corresponds to a valid result.

display_range_start, display_range_end

Two Unicode code point indices, returned as separate keys, identifying the inclusive start and exclusive end of the displayable content of the Tweet.

valid_range_start, valid_range_end

Two Unicode code point indices, returned as separate keys, identifying the inclusive start and exclusive end of the valid content of the Tweet.

EXAMPLES

use Data::Dumper;
use Twitter::Text;

$result = parse_tweet('Hello world こんにちは世界');
print Dumper($result);
# $VAR1 = {
#       'weighted_length' => 33,
#       'permillage' => 117,
#       'valid' => 1,
#       'display_range_start' => 0,
#       'display_range_end' => 32,
#       'valid_range_start' => 0,
#       'valid_range_end' => 32,
#     };
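The permillage in this output can be reproduced by hand from the weighted length. The sketch below assumes a maximum weighted length of 280 and truncation toward zero; both are assumptions about the default configuration, not something this document states.

```perl
use strict;
use warnings;

# Assumption: the default configuration allows a weighted length of 280.
my $max_weighted_length = 280;

# weighted_length taken from the example output.
my $weighted_length = 33;

# Scale to parts per thousand and truncate the fractional part.
my $permillage = int($weighted_length * 1000 / $max_weighted_length);

print "permillage: $permillage\n";    # prints "permillage: 117"
```

Under the same assumption, a weighted length above 280 would give a permillage above 1000, which is the "too long" case described for the permillage key.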

is_valid_hashtag

$valid = is_valid_hashtag($hashtag);

Validates $hashtag and returns a boolean value that indicates whether it is a valid hashtag.

is_valid_list

$valid = is_valid_list($username_list);

Validates $username_list and returns a boolean value that indicates whether it is a valid @username/list.

is_valid_url

$valid = is_valid_url($url, [unicode_domains => 1, require_protocol => 1]);

Validates $url and returns a boolean value that indicates whether it is a valid URL.

If the unicode_domains argument is a truthy value, URLs containing Unicode characters are accepted as valid. (default: true)

If the require_protocol argument is a truthy value, the URL must include a protocol to be valid. (default: true)
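As a rough illustration of the require_protocol option, the sketch below validates the same string twice. The input is a made-up URL without a protocol, and the options are passed as key/value pairs following the signature shown above; no particular result for the relaxed call is claimed here.

```perl
use strict;
use warnings;
use Twitter::Text;

# Hypothetical input that has no protocol part.
my $url = 'example.com/path';

# Default behaviour: a protocol is required.
my $strict = is_valid_url($url);

# Same input with the protocol requirement switched off.
my $relaxed = is_valid_url($url, require_protocol => 0);

printf "strict: %s, relaxed: %s\n",
    ($strict  ? 'valid' : 'invalid'),
    ($relaxed ? 'valid' : 'invalid');
```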

is_valid_username

$valid = is_valid_username($username);

Validates $username and returns a boolean value that indicates whether it is a valid username for Twitter.
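Because the validators return plain boolean values, they can be used directly as filters. The sketch below keeps only the candidates that is_valid_username accepts; the candidate strings are invented, and whether a leading @ is expected is an assumption that should be checked against the implementation.

```perl
use strict;
use warnings;
use Twitter::Text;

# Hypothetical candidates; the leading @ is an assumption about the expected form.
my @candidates = ('@alice', '@bob smith', '@');

# Keep only the entries the validator accepts.
my @valid = grep { is_valid_username($_) } @candidates;

print "accepted: $_\n" for @valid;
```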

SEE ALSO

twitter-text. The implementation of Twitter::Text (this library) is heavily based on the Ruby implementation of twitter-text.

https://developer.twitter.com/en/docs/counting-characters

COPYRIGHT & LICENSE

Copyright (C) Twitter, Inc and other contributors

Copyright (C) utgwkk.

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

AUTHOR

utgwkk <utagawakiki@gmail.com>