NAME
Lingua::DxExtractor - Perl extension for quick-and-dirty named entity recognition with basic negation detection, built on StanfordCoreNLP.
SYNOPSIS
use Lingua::DxExtractor;

# Configure the extractor with the target words and the skip words.
my $extractor = Lingua::DxExtractor->new( {
    words      => [ qw( embolus embolism pe clot thromboembolism defect ) ],
    skip_words => [ qw( evaluate evaluation history indication technique assessment nondiagnostic uninterpretable ) ],
} );

# $text holds the clinical text of the record to analyze.
$extractor->process_text( $text );
$extractor->examine_text;

my $debug                     = $extractor->finalize_results;
my $absent_or_present         = $extractor->final_answer;   # 'absent' or 'present'
my $is_final_answer_ambiguous = $extractor->ambiguous;
DESCRIPTION
A quick-and-dirty NER tool for finding diagnostic entities in clinical text. It also makes a simple attempt at detecting negated terms. The extractor gives a 'final answer' of either 'absent' or 'present', and it also reports when it is unsure and the answer is ambiguous.
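To make the idea of term matching with negation concrete, here is a toy sketch in plain Perl. It is not how this module works internally (the module relies on StanfordCoreNLP); the word lists, the negation cues, and the classify_sentence function are all hypothetical and chosen only for illustration.

```perl
use strict;
use warnings;

# Hypothetical word lists, for illustration only.
my @targets  = qw( embolus embolism clot );
my @negators = ( 'no', 'not', 'without', 'negative for' );

# Classify one sentence as 'present' or 'absent'.
# Returns nothing (undef in scalar context) when no target term occurs.
sub classify_sentence {
    my ($sentence) = @_;
    my $text = lc $sentence;

    my $target_re = join '|', map { quotemeta } @targets;
    return unless $text =~ /\b(?:$target_re)\b/;

    # Naive rule: the term counts as negated if a negation cue
    # appears anywhere before it in the same sentence.
    my $negator_re = join '|', map { quotemeta } @negators;
    return $text =~ /\b(?:$negator_re)\b.*\b(?:$target_re)\b/
        ? 'absent'
        : 'present';
}

print classify_sentence('No evidence of pulmonary embolism.'), "\n";            # absent
print classify_sentence('Filling defect consistent with acute embolus.'), "\n"; # present
```

A rule this simple misjudges many real sentences, for example when the negation cue belongs to a different clause, which is one reason an ambiguity flag is useful alongside the final answer.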
The use case is a research project with a large number of records in which you need to identify a subset based on a diagnostic entity: you can use this tool to reduce the number of charts that have to be examined manually. In this use case I wanted to keep sensitivity as high as possible so as not to miss real cases.
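One way to apply this in practice is a triage rule that keeps every record that is either 'present' or ambiguous for manual review. The sketch below is an assumption about how the results might be used, not part of the module: needs_manual_review and the sample records are hypothetical, and the answer and ambiguous values stand in for what final_answer and ambiguous would return for each record.

```perl
use strict;
use warnings;

# Decide whether a record still needs manual chart review.
# $answer is expected to be 'present' or 'absent'; $ambiguous is a boolean flag.
sub needs_manual_review {
    my ( $answer, $ambiguous ) = @_;
    return 1 if $ambiguous;                 # unsure: keep the chart to protect sensitivity
    return $answer eq 'present' ? 1 : 0;    # only confident 'absent' charts are set aside
}

# Hypothetical per-record results, for illustration only.
my @results = (
    { id => 'rec1', answer => 'present', ambiguous => 0 },
    { id => 'rec2', answer => 'absent',  ambiguous => 1 },
    { id => 'rec3', answer => 'absent',  ambiguous => 0 },
);

my @to_review = map { $_->{id} }
    grep { needs_manual_review( $_->{answer}, $_->{ambiguous} ) } @results;

print "Review: @to_review\n";    # prints "Review: rec1 rec2"
```

Sending ambiguous records to review trades some extra manual work for fewer missed cases, which matches the stated goal of high sensitivity.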
EXPORT
None by default.
SEE ALSO
This module depends on Lingua::StanfordCoreNLP, which in turn depends on Inline::Java.
AUTHOR
Eduardo Iturrate, <ed@iturrate.com>
COPYRIGHT AND LICENSE
Copyright (C) 2013 by Eduardo Iturrate
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.8 or, at your option, any later version of Perl 5 you may have available.