Changes for version 0.04 - 2011-11-28
- Modified extensive parts of the embedded documentation.
- Added classes L::D::Variety, L::D::SamplingScheme, and L::D::VOCD, along with corresponding test files.
- Lingua::Diversity (major refactoring):
  - Methods measure() and measure_per_category() are not abstract anymore: they perform array validation and unit recoding, and pass the results on to the new abstract private method _measure(). This private method is required to return a L::D::Result object, which is directly forwarded as the return value of public methods measure() and measure_per_category(). Note that _measure() has the responsibility of handling both the case where it is passed a single array by measure() and the case where it is passed two arrays by measure_per_category().
  - Subroutines _validate_size() and _prepend_unit_with_category() have been removed from L::D::Internals and added to this package (L::D). Tests and exception classes have been removed, moved, or renamed accordingly.
  - Attributes min_num_items and max_num_items (with private getters and setters) have been added and can be set from within derived classes if necessary.
  - This module now uses L::D::Variety, L::D::MTLD, and L::D::VOCD.
- L::D::MTLD:
  - Refactored the code to match the modifications of L::D.
  - Fixed bug in _measure(), namely the case of a single partial factor with a TTR of 1. Now it counts as 1 factor of length 0 (which is not very satisfying but it is hard to come up with a better alternative).
- L::D::Utils:
  - Fixed bug in split_tagged_text() which caused tags to be used in place of lemmas.
- L::D::Internals:
  - Added export tag 'all'.
  - Added subroutines _sample_indices(), _count_types(), _count_frequency(), _shannon_entropy(), _perplexity(), _renyi_entropy(), and _get_units_per_category() (along with documentation and tests).
  - Moved subroutines _validate_size() and _prepend_unit_with_category() to the L::D module (along with documentation and tests).
  - Fixed variance precision problem in _get_average().
  - Added shortcut in _get_average() for the case where there's only 1 value.
Modules
measuring the diversity of text units
utility subroutines for classes derived from Lingua::Diversity
'MTLD' method for measuring diversity of text units
storing the result of a diversity measurement
storing the parameters of a sampling scheme
utility subroutines for users of classes derived from Lingua::Diversity
'VOCD' method for measuring diversity of text units
measuring the variety of text units
exception classes for Lingua::Diversity