Natural Language Processing Model for Identifying Critical Findings—A Multi-Institutional Study

Imon Banerjee; Melissa A. Davis; Brianna L. Vey; Sina Mazaheri; Fiza Khan; Vaz Zavaletta; Roger Gerard; Judy Wawira Gichoya; Bhavik Patel

doi:10.1007/s10278-022-00712-w

Diagnostic Radiology

Research output: Contribution to journal › Article › peer-review

Abstract

Improving detection and follow-up of recommendations made in radiology reports is a critical unmet need. The long and unstructured nature of radiology reports limits the ability of clinicians to assimilate the full report and identify all the pertinent information for prioritizing critical cases. We developed an automated NLP pipeline using a transformer-based ClinicalBERT⁺⁺ model, which was fine-tuned on 3 million radiology reports and compared against the traditional BERT model. We validated the models on both internal hold-out emergency department (ED) cases from EUH and external cases from Mayo Clinic. We also evaluated the model by combining different sections of the radiology reports. On the internal test set of 3819 reports, the ClinicalBERT⁺⁺ model achieved an F1-score of 0.96, while BERT achieved the same performance using the reason for exam and impression sections. However, ClinicalBERT⁺⁺ outperformed BERT on the external test dataset of 2039 reports and achieved the highest performance for classifying critical finding reports (0.81 precision and 0.54 recall). The ClinicalBERT⁺⁺ model has been successfully applied to large-scale radiology reports from 5 different sites. An automated NLP system that can analyze free-text radiology reports, along with the reason for the exam, to identify critical radiology findings and recommendations could enable automated alert notifications to clinicians about the need for clinical follow-up. The clinical significance of our proposed model is that it could serve as an additional safeguard in clinical practice, reducing the chance that important findings reported in a radiology report are overlooked by clinicians, and provide a way to retrospectively track large hospital databases for evaluating the documentation of critical findings.
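
The abstract reports precision and recall for the critical-finding class on the external test set, but not the corresponding F1-score. As a rough illustration only, the sketch below computes the harmonic mean of the two reported values. The function name `f1_score` is a hypothetical helper, not part of the study's code, and the result is not a figure reported by the authors; the internal 0.96 F1-score may have been averaged differently and is not directly comparable.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the F1-score, the harmonic mean of precision and recall.

    Hypothetical helper for illustration; not taken from the study's code.
    """
    if precision + recall == 0:
        # Avoid division by zero when both metrics are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract for the external test set (critical findings).
external_precision = 0.81
external_recall = 0.54

# Derived here by arithmetic only; the abstract does not state this number.
external_f1 = f1_score(external_precision, external_recall)
print(f"Implied external F1 for critical findings: {external_f1:.3f}")
```

Under these assumptions the implied value is about 0.65. The gap between precision and recall suggests that, on external data, the model flags critical reports reliably when it fires but misses a substantial share of them.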

Original language	English (US)
Pages (from-to)	105-113
Number of pages	9
Journal	Journal of Digital Imaging
Volume	36
Issue number	1
DOIs	https://doi.org/10.1007/s10278-022-00712-w
State	Published - Feb 2023

ASJC Scopus subject areas

Radiological and Ultrasound Technology
Radiology, Nuclear Medicine and Imaging
Computer Science Applications

Cite this

@article{f1962d33dffb401fafabcb13440e1313,

title = "Natural Language Processing Model for Identifying Critical Findings—A Multi-Institutional Study",

abstract = "Improving detection and follow-up of recommendations made in radiology reports is a critical unmet need. The long and unstructured nature of radiology reports limits the ability of clinicians to assimilate the full report and identify all the pertinent information for prioritizing the critical cases. We developed an automated NLP pipeline using a transformer-based ClinicalBERT++ model which was fine-tuned on 3 M radiology reports and compared against the traditional BERT model. We validated the models on both internal hold-out ED cases from EUH as well as external cases from Mayo Clinic. We also evaluated the model by combining different sections of the radiology reports. On the internal test set of 3819 reports, the ClinicalBERT++ model achieved 0.96 f1-score while the BERT also achieved the same performance using the reason for exam and impression sections. However, ClinicalBERT++ outperformed BERT on the external test dataset of 2039 reports and achieved the highest performance for classifying critical finding reports (0.81 precision and 0.54 recall). The ClinicalBERT++ model has been successfully applied to large-scale radiology reports from 5 different sites. Automated NLP system that can analyze free-text radiology reports, along with the reason for the exam, to identify critical radiology findings and recommendations could enable automated alert notifications to clinicians about the need for clinical follow-up. The clinical significance of our proposed model is that it could be used as an additional layer of safeguard to clinical practice and reduce the chance of important findings reported in a radiology report is not overlooked by clinicians as well as provide a way to retrospectively track large hospital databases for evaluating the documentation of the critical findings.",

author = "Imon Banerjee and Davis, {Melissa A.} and Vey, {Brianna L.} and Sina Mazaheri and Fiza Khan and Vaz Zavaletta and Roger Gerard and Gichoya, {Judy Wawira} and Bhavik Patel",

note = "Publisher Copyright: {\textcopyright} 2022, The Author(s) under exclusive licence to Society for Imaging Informatics in Medicine.",

year = "2023",

month = feb,

doi = "10.1007/s10278-022-00712-w",

language = "English (US)",

volume = "36",

pages = "105--113",

journal = "Journal of Digital Imaging",

issn = "0897-1889",

publisher = "Springer New York",

number = "1",

}

TY - JOUR

T1 - Natural Language Processing Model for Identifying Critical Findings—A Multi-Institutional Study

AU - Banerjee, Imon

AU - Davis, Melissa A.

AU - Vey, Brianna L.

AU - Mazaheri, Sina

AU - Khan, Fiza

AU - Zavaletta, Vaz

AU - Gerard, Roger

AU - Gichoya, Judy Wawira

AU - Patel, Bhavik

PY - 2023/2

Y1 - 2023/2

N2 - Improving detection and follow-up of recommendations made in radiology reports is a critical unmet need. The long and unstructured nature of radiology reports limits the ability of clinicians to assimilate the full report and identify all the pertinent information for prioritizing the critical cases. We developed an automated NLP pipeline using a transformer-based ClinicalBERT++ model which was fine-tuned on 3 M radiology reports and compared against the traditional BERT model. We validated the models on both internal hold-out ED cases from EUH as well as external cases from Mayo Clinic. We also evaluated the model by combining different sections of the radiology reports. On the internal test set of 3819 reports, the ClinicalBERT++ model achieved 0.96 f1-score while the BERT also achieved the same performance using the reason for exam and impression sections. However, ClinicalBERT++ outperformed BERT on the external test dataset of 2039 reports and achieved the highest performance for classifying critical finding reports (0.81 precision and 0.54 recall). The ClinicalBERT++ model has been successfully applied to large-scale radiology reports from 5 different sites. Automated NLP system that can analyze free-text radiology reports, along with the reason for the exam, to identify critical radiology findings and recommendations could enable automated alert notifications to clinicians about the need for clinical follow-up. The clinical significance of our proposed model is that it could be used as an additional layer of safeguard to clinical practice and reduce the chance of important findings reported in a radiology report is not overlooked by clinicians as well as provide a way to retrospectively track large hospital databases for evaluating the documentation of the critical findings.

UR - http://www.scopus.com/inward/record.url?scp=85141480718&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85141480718&partnerID=8YFLogxK

U2 - 10.1007/s10278-022-00712-w

DO - 10.1007/s10278-022-00712-w

M3 - Article

C2 - 36344632

AN - SCOPUS:85141480718

SN - 0897-1889

VL - 36

SP - 105

EP - 113

JO - Journal of Digital Imaging

JF - Journal of Digital Imaging

IS - 1

ER -
