Changes for version 0.04
- INCOMPATIBLE CHANGE: refactored file-related code (data files are now stored as GZip archives rather than plaintext files)
- INCOMPATIBLE CHANGE: tokens_bounds() now returns the zero-based index of the boundary instead of the position of the character after it
- data files are now represented with classes and proper API
- a few small bugfixes
- split tests for tokens() and tokens_bounds(), enable tests for the latter
- data files now have their own version, independent of the module's version
- data_dir is now configurable in the constructor
- other small fixes and improvements
Documentation
download newer data for tokenizer
Modules
tokenizer for OpenCorpora project
download newer data for tokenizer
Provides
in lib/Lingua/RU/OpenCorpora/Tokenizer/List.pm
in lib/Lingua/RU/OpenCorpora/Tokenizer/Vectors.pm