A query taxonomy describes performance of patient-level retrieval from electronic health record data

Steven R. Chamberlin; Steven D. Bedrick; Aaron M. Cohen; Yanshan Wang; Andrew Wen; Sijia Liu; Hongfang Liu; William R. Hersh

Digital Health Sciences

Research output: Contribution to journal › Conference article › peer-review

Abstract

Performance of systems used for patient cohort identification with electronic health record (EHR) data is not well characterized. The objective of this research was to evaluate factors that might affect information retrieval (IR) methods and to investigate the interplay between commonly used IR approaches and the characteristics of the cohort definition structure. We used an IR test collection containing 56 patient cohort definitions, 100,000 patient records originating from an academic medical institution EHR data warehouse, and automated word-based query tasks, varying four parameters. Performance was measured using B-Pref. We then designed 59 taxonomy characteristics to classify the structure of the 56 topics. In addition, six topic complexity measures were derived from these characteristics for further evaluation using a beta regression simulation. We did not find a strong association between the 59 taxonomy characteristics and patient retrieval performance, but we did find strong performance associations with the six topic complexity measures created from these characteristics, and interactions between these measures and the automated query parameter settings. Some of the characteristics derived from a query taxonomy could lead to improved selection of approaches based on the structure of the topic of interest. Insights gained here will help guide future work to develop new methods for patient-level cohort discovery with EHR data.

Original language	English (US)
Pages (from-to)	27-33
Number of pages	7
Journal	CEUR Workshop Proceedings
Volume	2551
State	Published - 2020
Event	2020 ACM WSDM Health Search and Data Mining Workshop, HSDM 2020 - Houston, United States Duration: Feb 3 2020 → …

Keywords

Electronic health record
Information retrieval
Patient cohort discovery
Topic taxonomy

ASJC Scopus subject areas

General Computer Science

Cite this

@article{57a6734a50094ef2ac2473e54301b5ac,
  title = "A query taxonomy describes performance of patient-level retrieval from electronic health record data",
  abstract = "Performance of systems used for patient cohort identification with electronic health record (EHR) data is not well characterized. The objective of this research was to evaluate factors that might affect information retrieval (IR) methods and to investigate the interplay between commonly used IR approaches and the characteristics of the cohort definition structure. We used an IR test collection containing 56 patient cohort definitions, 100,000 patient records originating from an academic medical institution EHR data warehouse, and automated word-based query tasks, varying four parameters. Performance was measured using B-Pref. We then designed 59 taxonomy characteristics to classify the structure of the 56 topics. In addition, six topic complexity measures were derived from these characteristics for further evaluation using a beta regression simulation. We did not find a strong association between the 59 taxonomy characteristics and patient retrieval performance, but we did find strong performance associations with the six topic complexity measures created from these characteristics, and interactions between these measures and the automated query parameter settings. Some of the characteristics derived from a query taxonomy could lead to improved selection of approaches based on the structure of the topic of interest. Insights gained here will help guide future work to develop new methods for patient-level cohort discovery with EHR data.",
  keywords = "Electronic health record, Information retrieval, Patient cohort discovery, Topic taxonomy",
  author = "Chamberlin, {Steven R.} and Bedrick, {Steven D.} and Cohen, {Aaron M.} and Wang, Yanshan and Wen, Andrew and Liu, Sijia and Liu, Hongfang and Hersh, {William R.}",
  note = "Publisher Copyright: Copyright {\textcopyright} 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0); 2020 ACM WSDM Health Search and Data Mining Workshop, HSDM 2020 ; Conference date: 03-02-2020",
  year = "2020",
  language = "English (US)",
  volume = "2551",
  pages = "27--33",
  journal = "CEUR Workshop Proceedings",
  issn = "1613-0073",
  publisher = "CEUR-WS",
}

TY  - JOUR
T1  - A query taxonomy describes performance of patient-level retrieval from electronic health record data
AU  - Chamberlin, Steven R.
AU  - Bedrick, Steven D.
AU  - Cohen, Aaron M.
AU  - Wang, Yanshan
AU  - Wen, Andrew
AU  - Liu, Sijia
AU  - Liu, Hongfang
AU  - Hersh, William R.
PY  - 2020
Y1  - 2020
N2  - Performance of systems used for patient cohort identification with electronic health record (EHR) data is not well characterized. The objective of this research was to evaluate factors that might affect information retrieval (IR) methods and to investigate the interplay between commonly used IR approaches and the characteristics of the cohort definition structure. We used an IR test collection containing 56 patient cohort definitions, 100,000 patient records originating from an academic medical institution EHR data warehouse, and automated word-based query tasks, varying four parameters. Performance was measured using B-Pref. We then designed 59 taxonomy characteristics to classify the structure of the 56 topics. In addition, six topic complexity measures were derived from these characteristics for further evaluation using a beta regression simulation. We did not find a strong association between the 59 taxonomy characteristics and patient retrieval performance, but we did find strong performance associations with the six topic complexity measures created from these characteristics, and interactions between these measures and the automated query parameter settings. Some of the characteristics derived from a query taxonomy could lead to improved selection of approaches based on the structure of the topic of interest. Insights gained here will help guide future work to develop new methods for patient-level cohort discovery with EHR data.
AB  - Performance of systems used for patient cohort identification with electronic health record (EHR) data is not well characterized. The objective of this research was to evaluate factors that might affect information retrieval (IR) methods and to investigate the interplay between commonly used IR approaches and the characteristics of the cohort definition structure. We used an IR test collection containing 56 patient cohort definitions, 100,000 patient records originating from an academic medical institution EHR data warehouse, and automated word-based query tasks, varying four parameters. Performance was measured using B-Pref. We then designed 59 taxonomy characteristics to classify the structure of the 56 topics. In addition, six topic complexity measures were derived from these characteristics for further evaluation using a beta regression simulation. We did not find a strong association between the 59 taxonomy characteristics and patient retrieval performance, but we did find strong performance associations with the six topic complexity measures created from these characteristics, and interactions between these measures and the automated query parameter settings. Some of the characteristics derived from a query taxonomy could lead to improved selection of approaches based on the structure of the topic of interest. Insights gained here will help guide future work to develop new methods for patient-level cohort discovery with EHR data.
KW  - Electronic health record
KW  - Information retrieval
KW  - Patient cohort discovery
KW  - Topic taxonomy
UR  - http://www.scopus.com/inward/record.url?scp=85081654147&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85081654147&partnerID=8YFLogxK
M3  - Conference article
AN  - SCOPUS:85081654147
SN  - 1613-0073
VL  - 2551
SP  - 27
EP  - 33
JO  - CEUR Workshop Proceedings
JF  - CEUR Workshop Proceedings
T2  - 2020 ACM WSDM Health Search and Data Mining Workshop, HSDM 2020
Y2  - 3 February 2020
ER  - 
