NAME

Lingua::DxExtractor - Perl extension for named entity recognition (NER) with quick and dirty negation checking, built on StanfordCoreNLP.

SYNOPSIS

use Lingua::DxExtractor;

my $extractor = Lingua::DxExtractor->new( {
  words => [  qw( embolus embolism pe clot ) ],
  skip_words => [ qw( history indication technique nondiagnostic ) ],
} );

# $text holds the clinical text to examine
$extractor->process_text( $text );
$extractor->examine_text;

my $debug = $extractor->finalize_results;
my $absent_or_present = $extractor->final_answer;
my $is_final_answer_ambiguous = $extractor->ambiguous;

DESCRIPTION

A quick and dirty Named Entity Recognition tool for finding diagnostic entities within clinical text. It also makes a simple attempt at detecting negated terms. The extractor gives a 'final answer' of either 'absent' or 'present', and it flags that answer as ambiguous when it isn't sure.

The 'use case' for this is a research project with a large number of records in which you need to identify a subset based on a diagnostic entity. You can use this tool to reduce the number of charts that have to be examined manually. In this 'use case' I wanted to keep the sensitivity as high as possible in order not to miss real cases.

The extractor uses StanfordCoreNLP's lemmatization, POS tagging, and dependency parsing. For a given text, sentences are looked at one by one. If one of the 'skip_words' is found, the sentence is skipped. If one of the target 'words' is found, the sentence is flagged for further examination and the word is marked as 'present'. Each flagged sentence is then examined for negation terms; if one is found, the word is marked as 'absent' instead. A 'final answer' for the presence or absence of the condition defined by the target 'words' is then reached by looking at the accumulated answers for all of the sentences. If the answers conflict, the 'ambiguous' flag is set. For these ambiguous cases, the final answer is whichever answer (absent or present) was found most frequently. In the case of a tie, the default answer is 'present' (this increases false positives but decreases false negatives, which improves sensitivity).
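
The decision logic described above can be sketched in plain Perl. This is only an illustration under stated assumptions, not the module's actual code: it replaces StanfordCoreNLP lemmas and dependencies with simple lowercase token matching, the negation list is made up for the example, and classify_sentence and tally_answers are hypothetical names. The description does not say what happens when no target word is found at all, so the sketch simply returns nothing in that case.

```perl
use strict;
use warnings;

# Stand-ins for the constructor arguments shown in the SYNOPSIS.
my %target = map { $_ => 1 } qw( embolus embolism pe clot );
my %skip   = map { $_ => 1 } qw( history indication technique nondiagnostic );

# Hypothetical negation terms; the module's real list is not documented here.
my %negation = map { $_ => 1 } qw( no not without negative );

# Returns nothing (sentence ignored), 'present', or 'absent'.
sub classify_sentence {
    my ($sentence) = @_;
    my @tokens = lc($sentence) =~ /(\w+)/g;
    return if grep { $skip{$_} } @tokens;        # a skip word discards the sentence
    return unless grep { $target{$_} } @tokens;  # no target word, nothing to record
    return ( grep { $negation{$_} } @tokens ) ? 'absent' : 'present';
}

# Combines per-sentence answers into ( final answer, ambiguous flag ).
sub tally_answers {
    my @answers = @_;
    return unless @answers;    # assumption: no answer when nothing was flagged

    my %count = ( present => 0, absent => 0 );
    $count{$_}++ for @answers;

    # Conflict means both answers were seen at least once.
    my $ambiguous = ( $count{present} && $count{absent} ) ? 1 : 0;

    # Majority wins; a tie defaults to 'present' to favor sensitivity.
    my $final = $count{absent} > $count{present} ? 'absent' : 'present';

    return ( $final, $ambiguous );
}

my $text = 'History of pulmonary embolism. No evidence of embolus. '
         . 'There is a clot in the right lower lobe. No clot on the left.';

# Naive sentence split on terminal punctuation, for illustration only.
my @sentences = split /(?<=[.!?])\s+/, $text;
my @answers   = map { classify_sentence($_) } @sentences;

my ( $final, $ambiguous ) = tally_answers(@answers);
print "$final (ambiguous: $ambiguous)\n";    # prints "absent (ambiguous: 1)"
```

In this example the first sentence is skipped because it contains 'history', the remaining three yield 'absent', 'present' and 'absent', and the majority answer is reported with the ambiguous flag set because the sentences disagree.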

EXPORT

None by default.

SEE ALSO

This module depends on:

Lingua::StanfordCoreNLP which in turn depends on Inline::Java

Class::MakeMethods

AUTHOR

Ed Iturrate, <ed@iturrate.com>

COPYRIGHT AND LICENSE

Copyright (C) 2013 by Eduardo Iturrate

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.8 or, at your option, any later version of Perl 5 you may have available.