NAME

Lingua::DxExtractor - Perl extension for quick-and-dirty named entity recognition and basic negation detection, built on StanfordCoreNLP.

SYNOPSIS

use Lingua::DxExtractor;

my $extractor = Lingua::DxExtractor->new( {
  words => [  qw( embolus embolism pe clot thromboembolism defect ) ],
  skip_words => [ qw( evaluate evaluation history indication technique assessment nondiagnostic uninterpretable ) ],
} );

# $text holds the clinical report to examine
$extractor->process_text( $text );
$extractor->examine_text;

my $debug                     = $extractor->finalize_results;
my $absent_or_present         = $extractor->final_answer;
my $is_final_answer_ambiguous = $extractor->ambiguous;

DESCRIPTION

A quick and dirty NER tool for finding diagnostic entities in clinical text. It also makes a simple attempt at detecting negated terms. The extractor gives a 'final answer' of either 'absent' or 'present', and it also reports when it is not sure and the answer is ambiguous.
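
To illustrate the general idea of keyword matching combined with negation detection, here is a minimal, self-contained sketch. It is not the module's implementation: Lingua::DxExtractor relies on StanfordCoreNLP, whereas this sketch uses plain regular expressions. The helper name classify_sentence and the list of negation cues are hypothetical and chosen only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Lingua::DxExtractor: classify one sentence
# as 'present' or 'absent', or return undef when no target word is found.
sub classify_sentence {
    my ( $sentence, $words, $negations ) = @_;
    my $lc = lc $sentence;

    # Pick the first target word (in list order) that occurs as a whole word.
    my ($hit) = grep { $lc =~ /\b\Q$_\E\b/ } @$words;
    return unless defined $hit;

    # Treat the mention as negated if a negation cue precedes it.
    my ($before) = $lc =~ /^(.*?)\b\Q$hit\E\b/;
    for my $cue (@$negations) {
        return 'absent' if $before =~ /\b\Q$cue\E\b/;
    }
    return 'present';
}

my @words     = qw( embolus embolism clot );
my @negations = ( 'no', 'without', 'negative for' );    # assumed cue list

for my $s (
    'No evidence of pulmonary embolism.',
    'Large clot in the right main pulmonary artery.',
  )
{
    my $answer = classify_sentence( $s, \@words, \@negations );
    print "$s => ", ( defined $answer ? $answer : 'no mention' ), "\n";
}
```

A rule this simple will misjudge many real sentences (for example, a cue that negates a different finding earlier in the sentence), which is one reason the extractor also reports ambiguity.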

The intended use case is a research project with a large number of records in which you need to identify a subset based on a diagnostic entity: this tool can reduce the number of charts that have to be examined manually. In that use case I wanted to keep sensitivity as high as possible so as not to miss real cases.
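
One way to keep sensitivity high when triaging charts is to skip only the records that are confidently 'absent'. The sketch below shows that rule under stated assumptions: needs_manual_review is a hypothetical helper, the record IDs and results are made-up stand-ins for extractor output, and the ambiguity value is assumed to be a simple true/false flag.

```perl
use strict;
use warnings;

# Hypothetical helper: decide whether a chart still needs manual review.
# $answer is expected to be 'absent' or 'present' (as from final_answer);
# $ambiguous is assumed to be a true/false flag (as from ambiguous).
sub needs_manual_review {
    my ( $answer, $ambiguous ) = @_;
    return 1 if $ambiguous;              # unsure: never discard automatically
    return 1 if $answer eq 'present';    # likely case: confirm by hand
    return 0;                            # confidently absent: skip
}

# Toy results standing in for extractor output; the record IDs are made up.
my %results = (
    rec1 => [ 'absent',  0 ],
    rec2 => [ 'present', 0 ],
    rec3 => [ 'absent',  1 ],
);

my @to_review =
  grep { needs_manual_review( @{ $results{$_} } ) } sort keys %results;
print "Review: @to_review\n";    # prints "Review: rec2 rec3"
```

In a real run, the two values for each record would come from calling final_answer and ambiguous after processing that record's text.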

EXPORT

None by default.

SEE ALSO

This module depends on:

Lingua::StanfordCoreNLP, which in turn depends on Inline::Java

AUTHOR

Iturrate, <ed@iturrate.com>

COPYRIGHT AND LICENSE

Copyright (C) 2013 by Eduardo Iturrate

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.8 or, at your option, any later version of Perl 5 you may have available.