NAME

Twitter::Text - Perl implementation of the twitter-text parsing library

SYNOPSIS

    use utf8;
    use Twitter::Text;

    my $result = parse_tweet('Hello world こんにちは世界');
    print $result->{valid} ? 'valid tweet' : 'invalid tweet';

DESCRIPTION

Twitter::Text is a Perl implementation of the twitter-text parsing library.

WARNING

This library does not implement auto-linking and hit highlighting.

Please refer to Implementation progress for the latest status.

FUNCTIONS

Extraction

extract_hashtags

    my \@hashtags = extract_hashtags($text);

extract_hashtags_with_indices

    my \@hashtags_with_indices = extract_hashtags_with_indices($text, [\%options]);

extract_mentioned_screen_names

    my \@screen_names = extract_mentioned_screen_names($text);

extract_mentioned_screen_names_with_indices

    my \@screen_names_with_indices = extract_mentioned_screen_names_with_indices($text);

extract_mentions_or_lists_with_indices

    my \@mentions_or_lists_with_indices = extract_mentions_or_lists_with_indices($text);

extract_urls

    my \@urls = extract_urls($text);

extract_urls_with_indices

    my \@urls_with_indices = extract_urls_with_indices($text, [\%options]);

Validation

parse_tweet

    my \%parse_result = parse_tweet($text, [\%options]);

The parse_tweet function takes a $text string and an optional \%options parameter and returns a hash reference with the following values:

weighted_length: the overall length of the tweet with code points weighted per the ranges defined in the configuration file.
permillage: indicates the proportion (per thousand) of the weighted length in comparison to the max weighted length. A value > 1000 indicates input text that is longer than the allowable maximum.
valid: indicates if input text length corresponds to a valid result.
display_range_start, display_range_end: An array reference of two Unicode code point indices identifying the inclusive start and exclusive end of the displayable content of the Tweet.
valid_range_start, valid_range_end: An array reference of two Unicode code point indices identifying the inclusive start and exclusive end of the valid content of the Tweet.
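
To make weighted_length and permillage more concrete, the following is a minimal pure-Perl sketch of how the two values relate. It is not this library's implementation. The helper names sketch_weighted_length and sketch_permillage are hypothetical, the configuration values are assumptions modelled on the published twitter-text v3 configuration, and the sketch deliberately ignores URL shortening, emoji handling and Unicode normalization.

```perl
use strict;
use warnings;
use utf8;    # the source file contains non-ASCII string literals

# Assumed values, modelled on the twitter-text v3 configuration.
my %config = (
    max_weighted_length => 280,
    scale               => 100,
    default_weight      => 200,
    ranges              => [
        { start => 0,    end => 4351, weight => 100 },
        { start => 8192, end => 8205, weight => 100 },
        { start => 8208, end => 8223, weight => 100 },
        { start => 8242, end => 8247, weight => 100 },
    ],
);

# Hypothetical helper: sum a per-code-point weight, then divide by the scale.
# Code points outside every configured range get the default weight.
sub sketch_weighted_length {
    my ($text) = @_;
    my $scaled = 0;
    for my $char (split //, $text) {
        my $code_point = ord $char;
        my $weight     = $config{default_weight};
        for my $range (@{ $config{ranges} }) {
            if ($code_point >= $range->{start} && $code_point <= $range->{end}) {
                $weight = $range->{weight};
                last;
            }
        }
        $scaled += $weight;
    }
    return int($scaled / $config{scale});
}

# Hypothetical helper: weighted length as parts per thousand of the maximum.
sub sketch_permillage {
    my ($text) = @_;
    return int(sketch_weighted_length($text) * 1000 / $config{max_weighted_length});
}

my $text = 'Hello world こんにちは世界';
printf "weighted_length=%d permillage=%d\n",
    sketch_weighted_length($text), sketch_permillage($text);
# prints: weighted_length=26 permillage=92
```

In this sketch the twelve ASCII characters count once each and the seven Japanese characters count twice each, giving 26. A text is over the limit exactly when the permillage exceeds 1000.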

is_valid_hashtag

    my $valid = is_valid_hashtag($hashtag);

is_valid_list

    my $valid = is_valid_list($username_list);

is_valid_url

    my $valid = is_valid_url($url, [unicode_domains => 1, require_protocol => 1]);

is_valid_username

    my $valid = is_valid_username($username);

SEE ALSO

twitter-text. The implementation of Twitter::Text (this library) is heavily based on the Ruby implementation of twitter-text.

https://developer.twitter.com/en/docs/counting-characters

COPYRIGHT & LICENSE

Copyright (C) Twitter, Inc and other contributors

Copyright (C) utgwkk.

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

AUTHOR

utgwkk <utagawakiki@gmail.com>