NAME
Lingua::DxExtractor - Extract the presence or absence of a clinical condition from radiology reports.
SYNOPSIS
use Lingua::DxExtractor;
$extractor = Lingua::DxExtractor->new( {
    target_words => [ qw( embolus embolism emboli defect pe clot clots ) ],
    skip_words   => [ qw( history indication technique nondiagnostic ) ],
} );
$text = "Indication: To rule out pulmonary embolism.\nFindings: There is no evidence of vascular filling defect...\n";
$final_answer = $extractor->process_text( $text ); # 'absent' or 'present'
$is_final_answer_ambiguous = $extractor->ambiguous; # 1 or 0
$debug = $extractor->debug;
$original_text = $extractor->orig_text;
$final_answer = $extractor->final_answer;
$ambiguous = $extractor->ambiguous;
$extractor->reset; # clears orig_text, final_answer, target_sentence and ambiguous
DESCRIPTION
A tool for determining the presence or absence of a clinical condition as reported in radiology reports. The extractor reports a 'final answer', either 'absent' or 'present', and also reports whether that answer is 'ambiguous'.
The intended use case is a research project with a large number of records in which you need to identify a subset based on a diagnostic entity. The tool reduces the number of charts that have to be examined manually. For this use case I wanted to keep the sensitivity as high as possible so as not to miss real cases.
The radiographic reports don't require textual preprocessing. However, selecting target_words and skip_words does require reading through reports to get a sense of the vocabulary used in the particular dataset being evaluated.
Negated terms are identified using Lingua::NegEx, which is a Perl implementation of Wendy Chapman's NegEx algorithm.
SUBROUTINES/METHODS
new( {
    target_words => \@target_words,
    skip_words   => \@skip_words,
} );
target_words( \@words );
This is a list of words that describe the clinical entity in question. All forms of the entity need to be stated explicitly, since the package does not currently use lemmatization or stemming.
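The effect of having no stemming can be illustrated with a small standalone sketch. It is not the module's own code: matched_targets is a hypothetical helper that assumes whole-word, case-insensitive comparison, which may differ from how the package matches internally.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Lingua::DxExtractor: returns the target
# words that occur in $sentence as whole words, ignoring case.
sub matched_targets {
    my ( $sentence, $target_words ) = @_;
    my %seen;
    $seen{ lc $_ } = 1 for $sentence =~ /(\w+)/g;
    return [ grep { $seen{ lc $_ } } @$target_words ];
}

my @targets  = qw( embolus embolism );
my $sentence = 'Multiple bilateral pulmonary emboli are seen.';

# 'emboli' is not in the list, so nothing matches.
my $hits = matched_targets( $sentence, \@targets );
print @$hits ? "matched: @$hits\n" : "no match\n";

# Listing the plural form explicitly makes the sentence match.
push @targets, 'emboli';
$hits = matched_targets( $sentence, \@targets );
print @$hits ? "matched: @$hits\n" : "no match\n";
```

Under these assumptions a report that only uses the plural 'emboli' is missed until that form is added to the list, which is why reading sample reports for vocabulary matters.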
skip_words( \@skip );
Not required. This is a list of words that can be used to eliminate sentences in the text that might confuse the extractor. For example, most radiographic reports start with a brief description of the indication for the test. This statement may name the clinical entity in question, but that does not mean the entity is present in the study (e.g. Indication: to rule out pulmonary embolism).
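A rough sketch of the idea behind skip_words follows. It is an illustration only: drop_skipped_sentences is a hypothetical helper, and splitting on line breaks is a simplifying assumption that stands in for real sentence segmentation.

```perl
use strict;
use warnings;

# Hypothetical helper, not the module's code: drops every sentence that
# contains a skip word (whole word, case-insensitive).
sub drop_skipped_sentences {
    my ( $sentences, $skip_words ) = @_;
    my $alternation = join '|', map { quotemeta($_) } @$skip_words;
    return [ @$sentences ] unless length $alternation;    # nothing to skip
    my $skip_re = qr/\b(?:$alternation)\b/i;
    return [ grep { $_ !~ $skip_re } @$sentences ];
}

# Naive split on line breaks; an assumption made to keep the sketch short.
my $text = "Indication: To rule out pulmonary embolism.\n"
         . "Findings: There is no evidence of vascular filling defect.\n";
my @sentences = grep { length } split /\n/, $text;

my @skip = qw( history indication technique nondiagnostic );
my $kept = drop_skipped_sentences( \@sentences, \@skip );
print "$_\n" for @$kept;
```

Here the 'Indication' sentence is discarded before any target word is looked for, so only the 'Findings' sentence remains to be evaluated.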
EXPORT
None by default.
DEPENDENCIES
Lingua::NegEx
Text::Sentence
Class::MakeMethods
SEE ALSO
http://www.ncbi.nlm.nih.gov/pubmed/21459155 for a similar project using ConText.
http://www.iturrate.com/DxExtractor.html - a simple web interface to the algorithm.
TO DO
1. Add lemmatization or stemming to target_words so that not every form of a word has to be written out explicitly.
2. Add ConText support.
AUTHOR
Eduardo Iturrate, <ed@iturrate.com>
COPYRIGHT AND LICENSE
Copyright (C) 2013 by Eduardo Iturrate
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.8 or, at your option, any later version of Perl 5 you may have available.