Changes for version 0.06
- case- and mark-sensitivity control introduced, following what is used in the Perl module "Text::Levenshtein": a "matching level" can be set in new() and via the "set_eq()" method. The default remains to search case- (and mark-) insensitively.
- pos_all method added
- pos_ methods take a new argument 'conform' to transliterate the strings given in the file into a common code.
- set_lang() and get_lang() methods introduced to query the datafile being used and to change it.
- frq_count, frq_opm, cd_count and cd_pct now return 0 rather than empty-string if the looked-up string was not found in the language file.
- all_strings now first ensures each string is non-empty, then culls duplicates with uniq(), and then sorts the strings alphabetically; some files contain empty lines, and duplicate strings for different POS.
- method "list_strings" renamed "select_strings" to avoid confusion with "all_strings", which also returns a list of strings.
- select_strings checks that the value is defined for a language file and that, if retrieved, it is numeric before checking its range, in case the field is empty for a file (as seems to happen for cd_count with the UK file).
- cv_pattern regex check in select_strings transliterates tested strings to ASCII so that, say, 'é' in the string is matched by just 'e' in the pattern.
- frq_sum method added, and POD for related methods indicates that they can all be used to obtain descriptives of frq_count as well.
- POD documentation for stats methods corrected: "raw" should have been "opm", and there is no argument named "log".
- added a croak, "The requested value is not defined for the SUBTLEX-x corpus", for when a value is requested that is not defined for a particular language x.
- Dependency on File::Slurp removed; Path::Tiny is used in its place.
- croak messages expanded to include statement of the method called.
- NL lang: the _all or _min file needs to be specified; see table in POD
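
The all_strings clean-up, the select_strings numeric check, and the cv_pattern mark-folding noted above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the module's actual code: the helper names clean_strings, in_range and strip_marks are hypothetical, and stripping combining marks after NFD decomposition is a stand-in for whatever ASCII transliteration the module performs.

```perl
use strict;
use warnings;
use utf8;
use List::Util qw(uniq);                 # uniq() requires List::Util 1.45 or later
use Scalar::Util qw(looks_like_number);
use Unicode::Normalize qw(NFD);

# Hypothetical helper: drop undefined or empty strings, cull duplicates,
# then sort what is left alphabetically.
sub clean_strings {
    my @strings = @_;
    my @kept    = grep { defined($_) && length($_) } @strings;
    my @sorted  = sort { $a cmp $b } uniq(@kept);
    return @sorted;
}

# Hypothetical helper: confirm the value is defined and numeric BEFORE
# comparing it to the range, so an empty field never reaches the comparison.
sub in_range {
    my ( $value, $min, $max ) = @_;
    return 0 if !defined($value) || !looks_like_number($value);
    return ( $value >= $min && $value <= $max ) ? 1 : 0;
}

# Hypothetical helper: decompose the string and strip combining marks,
# so that an accented letter is reduced to its base letter.
# Assumption: this only folds marks; it is not a full ASCII transliteration.
sub strip_marks {
    my ($string) = @_;
    ( my $plain = NFD($string) ) =~ s/\p{Mn}+//g;
    return $plain;
}
```

With the numeric check placed first, an empty field (as with cd_count in the UK file) is simply rejected rather than compared against the range as if it were a number.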
Modules
Retrieve word frequencies and related values and lists from subtitles corpora