Bridging the Gap between Structured and Free-form Radiology Reporting: A Case-study on Coronary CT Angiography

Amara Tariq; Marly Van Assen; Carlo N. De Cecco; Imon Banerjee

doi:10.1145/3474831

Diagnostic Radiology

Research output: Contribution to journal › Article › peer-review

Abstract

Free-form radiology reports associated with coronary computed tomography angiography (CCTA) use nuanced and complicated language to report cardiovascular disease. Standardization and interpretation of such reports are crucial for clinical use of CCTA. The Coronary Artery Disease Reporting and Data System (CAD-RADS) has been proposed to achieve such standardization by enforcing strict template-based report writing and assigning a score between 0 and 5 that indicates the severity of coronary artery lesions. Even after its introduction, free-form unstructured report writing remains popular among radiologists. In this work, we present our attempts at bridging the gap between structured and unstructured reporting using natural language processing. We present machine learning models that, while trained only on structured reports, can predict CAD-RADS scores by analyzing the free text of unstructured radiology reports. The best model achieves 98% accuracy on structured reports and 92% 1-margin accuracy (a difference of 1 between the predicted and actual scores) for free-form unstructured reports. Our model also performs well under very difficult circumstances, including nuanced and widely varying terminology used for reporting cardiovascular functions and diseases, scarcity of labeled data for training, and uneven class label distribution.
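
The 1-margin accuracy mentioned in the abstract can be illustrated with a minimal sketch. It assumes the metric counts a prediction as correct when it differs from the actual CAD-RADS score by at most 1; the abstract does not spell out the exact definition, so this reading is an assumption. The function name `margin_accuracy` and the sample scores are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def margin_accuracy(actual: Sequence[int], predicted: Sequence[int], margin: int = 0) -> float:
    """Return the fraction of predictions within `margin` of the actual score.

    margin=0 gives exact-match accuracy; margin=1 gives 1-margin accuracy
    under the "at most 1 apart" reading assumed here.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one score is required")
    hits = sum(abs(a - p) <= margin for a, p in zip(actual, predicted))
    return hits / len(actual)


# Made-up CAD-RADS scores (0 to 5) for illustration only; not data from the paper.
actual = [0, 1, 2, 3, 4, 5, 2, 3]
predicted = [0, 2, 2, 3, 2, 5, 3, 3]

exact = margin_accuracy(actual, predicted)          # exact-match accuracy
one_margin = margin_accuracy(actual, predicted, 1)  # 1-margin accuracy
print(f"exact: {exact:.3f}, 1-margin: {one_margin:.3f}")
```

With these made-up scores, five of eight predictions match exactly and seven of eight fall within one score of the actual value, so the 1-margin figure is the more forgiving of the two.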

Original language	English (US)
Article number	3474831
Journal	ACM Transactions on Computing for Healthcare
Volume	3
Issue number	1
DOIs	https://doi.org/10.1145/3474831
State	Published - Jan 2022

Keywords

CAD-RADS score prediction
deep learning
natural language processing

ASJC Scopus subject areas

Software
Medicine (miscellaneous)
Information Systems
Biomedical Engineering
Computer Science Applications
Health Informatics
Health Information Management

Cite this

@article{e4f54429528743ca89cd14c6ecaa4b34,
title = "Bridging the Gap between Structured and Free-form Radiology Reporting: A Case-study on Coronary CT Angiography",
abstract = "Free-form radiology reports associated with coronary computed tomography angiography (CCTA) include nuanced and complicated linguistics to report cardiovascular disease. Standardization and interpretation of such reports is crucial for clinical use of CCTA. Coronary Artery Disease Reporting and Data System (CAD-RADS) has been proposed to achieve such standardization by implementing a strict template-based report writing and assignment of a score between 0 and 5 indicating the severity of coronary artery lesions. Even after its introduction, free-form unstructured report writing remains popular among radiologists. In this work, we present our attempts at bridging the gap between structured and unstructured reporting by natural language processing. We present machine learning models that while being trained only on structured reports, can predict CAD-RADS scores by analysis of free-text of unstructured radiology reports. The best model achieves 98% accuracy on structured reports and 92% 1-margin accuracy (difference of 1 in the predicted and the actual scores) for free-form unstructured reports. Our model also performs well under very difficult circumstances including nuanced and widely varying terminology used for reporting cardiovascular functions and diseases, scarcity of labeled data for training our model, and uneven class label distribution.",

keywords = "CAD-RADS score prediction, deep learning, natural language processing",
author = "Amara Tariq and {Van Assen}, Marly and {De Cecco}, {Carlo N.} and Imon Banerjee",
note = "Publisher Copyright: {\textcopyright} 2021 Association for Computing Machinery.",
year = "2022",
month = jan,
doi = "10.1145/3474831",
language = "English (US)",
volume = "3",
journal = "ACM Transactions on Computing for Healthcare",
issn = "2691-1957",
publisher = "Association for Computing Machinery (ACM)",
number = "1",
}

TY - JOUR
T1 - Bridging the Gap between Structured and Free-form Radiology Reporting
T2 - A Case-study on Coronary CT Angiography
AU - Tariq, Amara
AU - Van Assen, Marly
AU - De Cecco, Carlo N.
AU - Banerjee, Imon
PY - 2022/1
Y1 - 2022/1
N2 - Free-form radiology reports associated with coronary computed tomography angiography (CCTA) include nuanced and complicated linguistics to report cardiovascular disease. Standardization and interpretation of such reports is crucial for clinical use of CCTA. Coronary Artery Disease Reporting and Data System (CAD-RADS) has been proposed to achieve such standardization by implementing a strict template-based report writing and assignment of a score between 0 and 5 indicating the severity of coronary artery lesions. Even after its introduction, free-form unstructured report writing remains popular among radiologists. In this work, we present our attempts at bridging the gap between structured and unstructured reporting by natural language processing. We present machine learning models that while being trained only on structured reports, can predict CAD-RADS scores by analysis of free-text of unstructured radiology reports. The best model achieves 98% accuracy on structured reports and 92% 1-margin accuracy (difference of 1 in the predicted and the actual scores) for free-form unstructured reports. Our model also performs well under very difficult circumstances including nuanced and widely varying terminology used for reporting cardiovascular functions and diseases, scarcity of labeled data for training our model, and uneven class label distribution.

KW - CAD-RADS score prediction
KW - deep learning
KW - natural language processing
UR - http://www.scopus.com/inward/record.url?scp=85125739411&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85125739411&partnerID=8YFLogxK
U2 - 10.1145/3474831
DO - 10.1145/3474831
M3 - Article
AN - SCOPUS:85125739411
SN - 2691-1957
VL - 3
JO - ACM Transactions on Computing for Healthcare
JF - ACM Transactions on Computing for Healthcare
IS - 1
M1 - 3474831
ER -
