Changes for version 1.7 - 2010-05-14
- Tests now work with the data files located at either ../data or data.
- make test now always generates the data/data.* files; previously this did not work on Darwin and MSWin32.
- Added calculate() method, which returns the probabilities for all languages. identify() now just calls calculate() and returns the most probable language.
- When neither a trigram nor a bigram is found, use the average alphabet size instead of the individual language's alphabet size, because the per-language size penalizes Asian languages.
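
The last two entries can be illustrated with a rough sketch. This is not the module's Perl code: it is written in Python, and the model layout (the "trigrams" and "bigrams" tables), the function signatures, and the avg_alphabet_size parameter are all assumptions made for illustration. It shows one way identify() can delegate to calculate(), and how a shared fallback probability of 1/average-alphabet-size avoids penalizing languages with large alphabets.

```python
import math

def calculate(text, models, avg_alphabet_size):
    """Score text against every language model.

    Returns a list of (language, log_probability) pairs, most probable first.
    `models` maps a language name to a dict with hypothetical "trigrams" and
    "bigrams" tables of probabilities (an assumed layout, not the module's).
    """
    scores = []
    for lang, model in models.items():
        logp = 0.0
        for i in range(len(text) - 2):
            tri = text[i:i + 3]
            bi = text[i + 1:i + 3]
            if tri in model["trigrams"]:
                p = model["trigrams"][tri]
            elif bi in model["bigrams"]:
                p = model["bigrams"][bi]
            else:
                # Same fallback for every language, so a language with a
                # large alphabet is not scored lower than one with a small one.
                p = 1.0 / avg_alphabet_size
            logp += math.log(p)
        scores.append((lang, logp))
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores

def identify(text, models, avg_alphabet_size):
    """Return only the most probable language, by delegating to calculate()."""
    return calculate(text, models, avg_alphabet_size)[0][0]
```

With a per-language fallback of 1/alphabet-size, every unseen n-gram would cost a language with thousands of characters far more than a language with a few dozen; a single shared value removes that bias.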
Documentation
build transition matrix for Lingua::Ident module
Modules
Statistical language identification