Abstract
Free-form radiology reports associated with coronary computed tomography angiography (CCTA) use nuanced and complicated language to describe cardiovascular disease. Standardization and interpretation of such reports are crucial for clinical use of CCTA. The Coronary Artery Disease Reporting and Data System (CAD-RADS) has been proposed to achieve such standardization through strict template-based report writing and the assignment of a score between 0 and 5 indicating the severity of coronary artery lesions. Even after its introduction, free-form unstructured report writing remains popular among radiologists. In this work, we present our attempts at bridging the gap between structured and unstructured reporting using natural language processing. We present machine learning models that, while trained only on structured reports, can predict CAD-RADS scores by analyzing the free text of unstructured radiology reports. The best model achieves 98% accuracy on structured reports and 92% 1-margin accuracy (the predicted score differs from the actual score by at most 1) on free-form unstructured reports. Our model also performs well under very difficult circumstances, including nuanced and widely varying terminology used for reporting cardiovascular functions and diseases, scarcity of labeled data for training, and uneven class label distribution.
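The 1-margin accuracy mentioned in the abstract can be illustrated with a minimal sketch. It assumes that a prediction counts as correct when it is within 1 of the actual CAD-RADS score. The function name `margin_accuracy` and the sample scores are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def margin_accuracy(actual: Sequence[int], predicted: Sequence[int], margin: int = 0) -> float:
    """Fraction of predictions whose absolute error is at most `margin`.

    margin=0 gives ordinary (exact-match) accuracy; margin=1 gives the
    1-margin accuracy as interpreted here (an assumption, see lead-in).
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if len(actual) == 0:
        raise ValueError("at least one score is required")
    hits = sum(abs(a - p) <= margin for a, p in zip(actual, predicted))
    return hits / len(actual)


# Hypothetical CAD-RADS scores (0 to 5), for illustration only.
actual_scores = [0, 1, 2, 3, 4, 5]
predicted_scores = [0, 2, 2, 5, 4, 4]

exact = margin_accuracy(actual_scores, predicted_scores)                 # 3 of 6 match exactly
one_margin = margin_accuracy(actual_scores, predicted_scores, margin=1)  # 5 of 6 are within 1
print(f"exact accuracy: {exact:.2f}, 1-margin accuracy: {one_margin:.2f}")
```

Because CAD-RADS scores are ordinal, a margin-based metric gives credit to near misses (for example, predicting 4 when the actual score is 5) that exact accuracy treats the same as a large error.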
Original language | English (US) |
---|---|
Article number | 3474831 |
Journal | ACM Transactions on Computing for Healthcare |
Volume | 3 |
Issue number | 1 |
State | Published - Jan 2022 |
Keywords
- CAD-RADS score prediction
- deep learning
- natural language processing
ASJC Scopus subject areas
- Software
- Medicine (miscellaneous)
- Information Systems
- Biomedical Engineering
- Computer Science Applications
- Health Informatics
- Health Information Management