* v0.3.3

- added timeout parameter for requests to the ApacheTika server
- fallback to local java app if the ApacheTika server request fails or is incomplete
- catch parser failure and return undef instead of breaking
- removed Blacklist language identifier and use CLD instead

* v0.3.2

- reset vocabulary and language model for each call
- set locale to utf8 before calling external programs

* v0.3.1

- fixed lowercase option

* v0.3.0 Sat Jan  5 14:02:06 EET 2019

- moved options and functions to library
- enable different output options
- enabled calls to Apache Tika server if available

* v0.2.8 Fri Aug 17 10:04:58 EEST 2018

- update to Apache Tika 1.18
- hide warnings and error messages from external tools

* v0.2.7 Sat Jul  1 21:25:09 EEST 2017

- added an ugly workaround to find java 1.6 for pdfxtk

* v0.2.6 Thu Jan  9 16:10:29 CET 2014

- integrated language detection (-d)
- language filter using language detection (-D lang)

* v0.2.5

- merge paragraph heuristics for putting unfinished sentences together
- better approach for finding word boundaries based on
  a unigram LM and dynamic programming
- better de-hyphenation in pdfxtk-mode

* v0.2.4 Fri Mar 15 10:34:42 CET 2013

- pdfxtk as default
- heuristics to handle ligatures
- dehyphenation and other heurstics in pdfxtk-mode (-X)
- now also splits strings into characters to find known words
  (solves a problem with pdfxtk conversions)

* v0.2.3 Wed Mar  6 23:12:22 CET 2013

- fixed test suite

* v0.2.2 Wed Feb 27 20:33:09 CET 2013

- fixed problem with wrong shared-dir settings
- make word-merging a bit more efficient

* v0.2.1 Fri Feb 15 16:29:41 CET 2013

- add pdfXtk as another option for converting pdf files
  (see http://sourceforge.net/projects/pdfxtk/)

* v0.2 - Thu Feb  7 10:51:16 CET 2013

- running without pdftotext is now possible
- added lowercasing (can be switched off)

* v0.1 - Tue Jan 29 20:31:42 CET 2013

- initial release