TY - JOUR
T1 - Patient-level temporal aggregation for text-based asthma status ascertainment
AU - Wu, Stephen T.
AU - Juhn, Young J.
AU - Sohn, Sunghwan
AU - Liu, Hongfang
PY - 2014
Y1 - 2014
N2 - Objective: To specify the problem of patient-level temporal aggregation from clinical text and introduce several probabilistic methods for addressing that problem. The patient-level perspective differs from the prevailing natural language processing (NLP) practice of evaluating at the term, event, sentence, document, or visit level. Methods: We utilized an existing pediatric asthma cohort with manual annotations. After generating a basic feature set via standard clinical NLP methods, we introduce six methods of aggregating time-distributed features from the document level to the patient level. These aggregation methods are used to classify patients according to their asthma status in two hypothetical settings: retrospective epidemiology and clinical decision support. Results: In both settings, solid patient classification performance was obtained with machine learning algorithms on a number of evidence aggregation methods, with Sum aggregation obtaining the highest F1 score of 85.71% on the retrospective epidemiological setting, and a probability density function-based method obtaining the highest F1 score of 74.63% on the clinical decision support setting. Multiple techniques also estimated the diagnosis date (index date) of asthma with promising accuracy. Discussion: The clinical decision support setting is a more difficult problem. We rule out some aggregation methods rather than determining the best overall aggregation method, since our preliminary data set represented a practical setting in which manually annotated data were limited. Conclusion: Results contrasted the strengths of several aggregation algorithms in different settings. Multiple approaches exhibited good patient classification performance, and also predicted the timing of estimates with reasonable accuracy.
AB - Objective: To specify the problem of patient-level temporal aggregation from clinical text and introduce several probabilistic methods for addressing that problem. The patient-level perspective differs from the prevailing natural language processing (NLP) practice of evaluating at the term, event, sentence, document, or visit level. Methods: We utilized an existing pediatric asthma cohort with manual annotations. After generating a basic feature set via standard clinical NLP methods, we introduce six methods of aggregating time-distributed features from the document level to the patient level. These aggregation methods are used to classify patients according to their asthma status in two hypothetical settings: retrospective epidemiology and clinical decision support. Results: In both settings, solid patient classification performance was obtained with machine learning algorithms on a number of evidence aggregation methods, with Sum aggregation obtaining the highest F1 score of 85.71% on the retrospective epidemiological setting, and a probability density function-based method obtaining the highest F1 score of 74.63% on the clinical decision support setting. Multiple techniques also estimated the diagnosis date (index date) of asthma with promising accuracy. Discussion: The clinical decision support setting is a more difficult problem. We rule out some aggregation methods rather than determining the best overall aggregation method, since our preliminary data set represented a practical setting in which manually annotated data were limited. Conclusion: Results contrasted the strengths of several aggregation algorithms in different settings. Multiple approaches exhibited good patient classification performance, and also predicted the timing of estimates with reasonable accuracy.
UR - http://www.scopus.com/inward/record.url?scp=84906315639&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84906315639&partnerID=8YFLogxK
U2 - 10.1136/amiajnl-2013-002463
DO - 10.1136/amiajnl-2013-002463
M3 - Article
C2 - 24833775
AN - SCOPUS:84906315639
SN - 1067-5027
VL - 21
SP - 876
EP - 884
JO - Journal of the American Medical Informatics Association
JF - Journal of the American Medical Informatics Association
IS - 5
ER -