Changes for version 0.014 - 2022-07-08
- isWORDCHAR_utf8_safe() and toLOWER_utf8_safe() have actually been available since Perl v5.26 (Stanislaw Pusep)
- eg/benchmark.pl improvements (Stanislaw Pusep)
Documentation
compute cosine similarity between two documents
uses MinHash & SpeedyFx to compare large volumes of text
efficiently count unique tokens from a file
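The first script listed above computes cosine similarity between two documents. Its code is not reproduced here; as a rough illustration of the measure itself, the following is a minimal pure-Perl sketch. It assumes plain term-frequency vectors built with a naive `\w+` tokenizer, and the helpers `token_counts` and `cosine_similarity` are hypothetical: they are not part of the module listed under Modules, which hashes tokens rather than keeping them as strings.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of this distribution):
# count lowercased \w+ tokens into a sparse term-frequency hash.
sub token_counts {
    my ($text) = @_;
    my %count;
    $count{ lc $1 }++ while $text =~ /(\w+)/g;
    return \%count;
}

# Hypothetical helper: cosine similarity of two sparse
# term-frequency vectors, i.e. dot(x, y) / (|x| * |y|).
sub cosine_similarity {
    my ($x, $y) = @_;
    my ($dot, $nx, $ny) = (0, 0, 0);

    # Only tokens present in both documents contribute to the dot product.
    $dot += $x->{$_} * ($y->{$_} // 0) for keys %$x;

    # Squared Euclidean norms of each vector.
    $nx += $_**2 for values %$x;
    $ny += $_**2 for values %$y;

    # An empty document has no direction; treat it as dissimilar.
    return 0 unless $nx && $ny;
    return $dot / (sqrt($nx) * sqrt($ny));
}

my $doc1 = token_counts('the quick brown fox');
my $doc2 = token_counts('the lazy brown dog');
printf "%.3f\n", cosine_similarity($doc1, $doc2);    # prints "0.500"
```

Storing the vectors as hashes keyed by token keeps them sparse, so the cost scales with the number of distinct tokens in each document rather than with the size of the whole vocabulary.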
Modules
tokenize/hash large numbers of strings efficiently