Project Details
Description
Project Summary
The widespread adoption of Electronic Health Records (EHRs) has enabled the use of clinical data for clinical
research and healthcare delivery. Many institutions have established clinical data warehouses (CDWs) in
conjunction with cohort discovery tools (e.g., i2b2) to support the use of clinical data for clinical research,
including retrospective clinical studies as well as feasibility assessment or patient recruitment for clinical trials.
However, a significant portion of relevant patient information is embedded in clinical narratives, and natural
language processing (NLP) techniques such as information extraction are critical when using EHR data for
clinical research. Many clinical NLP systems have been developed to extract information from text for various
downstream applications, but they have suffered from unsatisfactory performance and limited portability. Information retrieval
(IR), a technique used in search engines for storing, retrieving, and ranking documents from a large collection of
text documents based on users’ queries, can provide an alternative approach to leverage clinical narratives for
cohort discovery, as it is less dependent on semantics. Accomplishing this requires additional work:
current IR approaches are generally document-based, and formulating cohort discovery as an IR
task requires innovative IR approaches that can handle complex EHR data and cohort criteria with
contextual (e.g., spatial or temporal) constraints.
Our long-term goal is to develop informatics solutions to accelerate the use of EHR data for clinical research.
The main goal of this proposal is to develop innovative IR methods, which formulate cohort discovery from EHR
data as an IR task, aiming to accelerate the identification of patient cohorts for cohort studies or the recruitment
of eligible patients for clinical trials. In our current R01-supported study (R01LM011934), we introduced novel
language models to enable the reuse of NLP-produced artifacts for IR-based cohort retrieval and developed
parallel resources for IR evaluation at two institutions (Mayo Clinic and OHSU). We hypothesize that, given
complex cohort criteria with contextual constraints, an IR framework with tailored architecture components (e.g.,
indexing, ranking, evaluation, and query processing) for storing and querying EHR data has an advantage over
traditional cohort discovery tools for querying unstructured EHR data as well as an advantage over text-based
search engines for querying both structured and unstructured EHR data. For the proposed renewal, we plan to
i) adopt common data models (CDMs) and deploy the framework at one additional site to assess the
generalizability of methods, ii) extend the IR framework to incorporate contextual information, and iii)
incorporate deep semantic representations into the IR framework. If successful, the proposed project will
advance informatics research on cohort discovery and identification, which impacts many applications based on
EHR data such as learning healthcare systems, predictive modeling, or AI in healthcare.
Status | Active |
---|---|
Effective start/end date | 9/20/14 → 4/30/24 |
Funding
- U.S. National Library of Medicine: $637,539.00