TY - JOUR
T1 - Valx
T2 - A system for extracting and structuring numeric lab test comparison statements from text
AU - Hao, Tianyong
AU - Liu, Hongfang
AU - Weng, Chunhua
N1 - Funding Information:
This project was supported by National Library of Medicine grant R01LM009886 (PI: Weng), National Institute of General Medical Sciences grant R01GM102282 (PI: Liu), and National Center for Advancing Translational Sciences grant UL1TR000040 (PI: Ginsberg). The first author was later funded by National Natural Science Foundation of China grant No. 61403088 (PI: Hao) to complete the manuscript.
Publisher Copyright:
© Schattauer 2016.
PY - 2016
Y1 - 2016
AB - Objectives: To develop an automated method for extracting and structuring numeric lab test comparison statements from text and evaluate the method using clinical trial eligibility criteria text. Methods: Leveraging semantic knowledge from the Unified Medical Language System (UMLS) and domain knowledge acquired from the Internet, Valx takes seven steps to extract and normalize numeric lab test expressions: 1) text preprocessing, 2) numeric, unit, and comparison operator extraction, 3) variable identification using hybrid knowledge, 4) variable-numeric association, 5) context-based association filtering, 6) measurement unit normalization, and 7) heuristic rule-based comparison statement verification. Our reference standard was the consensus-based annotation among three raters for all comparison statements for two variables, i.e., HbA1c and glucose, identified from all Type 1 and Type 2 diabetes trials in ClinicalTrials.gov. Results: The precision, recall, and F-measure for structuring HbA1c comparison statements were 99.6%, 98.1%, and 98.8% for Type 1 diabetes trials, and 98.8%, 96.9%, and 97.8% for Type 2 diabetes trials, respectively. The precision, recall, and F-measure for structuring glucose comparison statements were 97.3%, 94.8%, and 96.1% for Type 1 diabetes trials, and 92.3%, 92.3%, and 92.3% for Type 2 diabetes trials, respectively. Conclusions: Valx is effective at extracting and structuring free-text lab test comparison statements in clinical trial summaries. Future studies are warranted to test its generalizability beyond eligibility criteria text. Because Valx is open source, the scientific community can further evaluate and continue to improve it.
KW - Clinical trial
KW - Comparison statement
KW - Medical informatics
KW - Natural language processing
KW - Patient selection
UR - http://www.scopus.com/inward/record.url?scp=84968760283&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84968760283&partnerID=8YFLogxK
U2 - 10.3414/ME15-01-0112
DO - 10.3414/ME15-01-0112
M3 - Article
C2 - 26940748
AN - SCOPUS:84968760283
SN - 0026-1270
VL - 55
SP - 266
EP - 275
JO - Methods of Information in Medicine
JF - Methods of Information in Medicine
IS - 3
ER -