TY - GEN
T1 - Enhancing Clinical Information Retrieval through Context-Aware Queries and Indices
AU - Wen, Andrew
AU - Wang, Yanshan
AU - Kaggal, Vinod C.
AU - Liu, Sijia
AU - Liu, Hongfang
AU - Fan, Jungwei
N1 - Funding Information:
This work was supported by National Institutes of Health grants R01LM011934 and U01TR002062. This study was approved by the Mayo Clinic institutional review board (IRB # 17-003030) for human subject research.
Publisher Copyright:
© 2019 IEEE.
PY - 2019/12
Y1 - 2019/12
N2 - The big data revolution has created a hefty demand for searching large-scale electronic health records (EHRs) to support clinical practice, research, and administration. Despite the volume of data involved, fast and accurate identification of clinical narratives pertinent to a clinical case being seen by any given provider is crucial for decision-making at the point of care. In the general domain, this capability is accomplished through a combination of the inverted index data structure, horizontal scaling, and information retrieval (IR) scoring algorithms. These technologies are also being used in the clinical domain, but have met with limited success, particularly as clinical cases become more complex. One barrier affecting clinical performance is that contextual information, such as negation, temporality, and the subject of clinical mentions, impacts clinical relevance but is not considered in general IR methodologies. In this study, we implemented a solution by identifying and incorporating the aforementioned semantic contexts as part of IR indexing/scoring with Elasticsearch. Experiments were conducted in comparison to baseline approaches with respect to: 1) evaluation of the impact on the quality (relevance) of the returned results, and 2) evaluation of the impact on execution time and storage requirements. The results showed a 5.1-23.1% improvement in retrieval quality, along with 35% faster query execution. Cost-wise, the solution required 1.5-2 times more storage space and about a 3-fold increase in indexing time. The higher relevance demonstrated the merit of incorporating contextual information into clinical IR, and the near-constant increase in time and space suggested promising scalability.
AB - The big data revolution has created a hefty demand for searching large-scale electronic health records (EHRs) to support clinical practice, research, and administration. Despite the volume of data involved, fast and accurate identification of clinical narratives pertinent to a clinical case being seen by any given provider is crucial for decision-making at the point of care. In the general domain, this capability is accomplished through a combination of the inverted index data structure, horizontal scaling, and information retrieval (IR) scoring algorithms. These technologies are also being used in the clinical domain, but have met with limited success, particularly as clinical cases become more complex. One barrier affecting clinical performance is that contextual information, such as negation, temporality, and the subject of clinical mentions, impacts clinical relevance but is not considered in general IR methodologies. In this study, we implemented a solution by identifying and incorporating the aforementioned semantic contexts as part of IR indexing/scoring with Elasticsearch. Experiments were conducted in comparison to baseline approaches with respect to: 1) evaluation of the impact on the quality (relevance) of the returned results, and 2) evaluation of the impact on execution time and storage requirements. The results showed a 5.1-23.1% improvement in retrieval quality, along with 35% faster query execution. Cost-wise, the solution required 1.5-2 times more storage space and about a 3-fold increase in indexing time. The higher relevance demonstrated the merit of incorporating contextual information into clinical IR, and the near-constant increase in time and space suggested promising scalability.
KW - Clinical Information Retrieval
KW - EHR
KW - Elasticsearch
KW - Electronic Health Records
KW - Information Retrieval
UR - http://www.scopus.com/inward/record.url?scp=85081359140&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85081359140&partnerID=8YFLogxK
U2 - 10.1109/BigData47090.2019.9006241
DO - 10.1109/BigData47090.2019.9006241
M3 - Conference contribution
AN - SCOPUS:85081359140
T3 - Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019
SP - 2800
EP - 2807
BT - Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019
A2 - Baru, Chaitanya
A2 - Huan, Jun
A2 - Khan, Latifur
A2 - Hu, Xiaohua Tony
A2 - Ak, Ronay
A2 - Tian, Yuanyuan
A2 - Barga, Roger
A2 - Zaniolo, Carlo
A2 - Lee, Kisung
A2 - Ye, Yanfang Fanny
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2019 IEEE International Conference on Big Data, Big Data 2019
Y2 - 9 December 2019 through 12 December 2019
ER -