A cross-institutional evaluation on breast cancer phenotyping NLP algorithms on electronic health records

Sicheng Zhou; Nan Wang; Liwei Wang; Ju Sun; Anne Blaes; Hongfang Liu; Rui Zhang

doi:10.1016/j.csbj.2023.08.018

Digital Health Sciences

Research output: Contribution to journal › Article › peer-review

Abstract

Objective: Transformer-based language models are prevailing in the clinical domain due to their excellent performance on clinical NLP tasks. The generalizability of those models is usually ignored during the model development process. This study evaluated the generalizability of CancerBERT, a Transformer-based clinical NLP model, along with classic machine learning models, i.e., conditional random field (CRF), bi-directional long short-term memory CRF (BiLSTM-CRF), across different clinical institutes through a breast cancer phenotype extraction task.

Materials and methods: Two clinical corpora of breast cancer patients were collected from the electronic health records from the University of Minnesota (UMN) and Mayo Clinic (MC), and annotated following the same guideline. We developed three types of NLP models (i.e., CRF, BiLSTM-CRF and CancerBERT) to extract cancer phenotypes from clinical texts. We evaluated the generalizability of models on different test sets with different learning strategies (model transfer vs locally trained). The entity coverage score was assessed with their association with the model performances.

Results: We manually annotated 200 and 161 clinical documents at UMN and MC. The corpora of the two institutes were found to have higher similarity between the target entities than the overall corpora. The CancerBERT models obtained the best performances among the independent test sets from two clinical institutes and the permutation test set. The CancerBERT model developed in one institute and further fine-tuned in another institute achieved reasonable performance compared to the model developed on local data (micro-F1: 0.925 vs 0.932).

Conclusions: The results indicate the CancerBERT model has superior learning ability and generalizability among the three types of clinical NLP models for our named entity recognition task. It has the advantage to recognize complex entities, e.g., entities with different labels.
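The abstract reports micro-F1 for the named entity recognition task without spelling out the computation. The following is a minimal sketch of how micro-averaged F1 is commonly computed for entity extraction. It assumes exact-match scoring on (start, end, label) triples; the abstract does not state whether the authors used strict or lenient matching, and the function name `micro_f1` and the labels in the usage example are hypothetical.

```python
from typing import List, Set, Tuple

# An entity is identified by its character span and its label.
# This representation is an assumption made for illustration only.
Entity = Tuple[int, int, str]


def micro_f1(gold_docs: List[Set[Entity]], pred_docs: List[Set[Entity]]) -> float:
    """Micro-averaged F1 over documents, using exact span-and-label match."""
    tp = fp = fn = 0
    for gold, pred in zip(gold_docs, pred_docs):
        tp += len(gold & pred)   # predicted entities that match gold exactly
        fp += len(pred - gold)   # predicted entities with no gold match
        fn += len(gold - pred)   # gold entities the model missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy usage with made-up spans and labels (not data from the study).
gold = [{(0, 2, "A"), (5, 6, "B")}, {(1, 3, "A")}]
pred = [{(0, 2, "A"), (5, 6, "C")}, {(1, 3, "A")}]
score = micro_f1(gold, pred)
```

Because counts are pooled across all documents and entity types before precision and recall are computed, frequent entity types weigh more heavily in micro-F1 than rare ones.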

Original language	English (US)
Pages (from-to)	32-40
Number of pages	9
Journal	Computational and Structural Biotechnology Journal
Volume	22
DOIs	https://doi.org/10.1016/j.csbj.2023.08.018
State	Published - Jan 2023

Keywords

Electronic health records
Generalizability
Information extraction
Natural language processing

ASJC Scopus subject areas

Biotechnology
Biophysics
Structural Biology
Biochemistry
Genetics
Computer Science Applications

Access to Document

10.1016/j.csbj.2023.08.018

Cite this

@article{78514eac5f324819aec99d6a1f312929,

title = "A cross-institutional evaluation on breast cancer phenotyping NLP algorithms on electronic health records",

abstract = "Objective: Transformer-based language models are prevailing in the clinical domain due to their excellent performance on clinical NLP tasks. The generalizability of those models is usually ignored during the model development process. This study evaluated the generalizability of CancerBERT, a Transformer-based clinical NLP model, along with classic machine learning models, i.e., conditional random field (CRF), bi-directional long short-term memory CRF (BiLSTM-CRF), across different clinical institutes through a breast cancer phenotype extraction task. Materials and methods: Two clinical corpora of breast cancer patients were collected from the electronic health records from the University of Minnesota (UMN) and Mayo Clinic (MC), and annotated following the same guideline. We developed three types of NLP models (i.e., CRF, BiLSTM-CRF and CancerBERT) to extract cancer phenotypes from clinical texts. We evaluated the generalizability of models on different test sets with different learning strategies (model transfer vs locally trained). The entity coverage score was assessed with their association with the model performances. Results: We manually annotated 200 and 161 clinical documents at UMN and MC. The corpora of the two institutes were found to have higher similarity between the target entities than the overall corpora. The CancerBERT models obtained the best performances among the independent test sets from two clinical institutes and the permutation test set. The CancerBERT model developed in one institute and further fine-tuned in another institute achieved reasonable performance compared to the model developed on local data (micro-F1: 0.925 vs 0.932). Conclusions: The results indicate the CancerBERT model has superior learning ability and generalizability among the three types of clinical NLP models for our named entity recognition task. It has the advantage to recognize complex entities, e.g., entities with different labels.",

keywords = "Electronic health records, Generalizability, Information extraction, Natural language processing",

author = "Sicheng Zhou and Nan Wang and Liwei Wang and Ju Sun and Anne Blaes and Hongfang Liu and Rui Zhang",

note = "Publisher Copyright: {\textcopyright} 2023",

year = "2023",

month = jan,

doi = "10.1016/j.csbj.2023.08.018",

language = "English (US)",

volume = "22",

pages = "32--40",

journal = "Computational and Structural Biotechnology Journal",

issn = "2001-0370",

publisher = "Research Network of Computational and Structural Biotechnology",

}

TY - JOUR

T1 - A cross-institutional evaluation on breast cancer phenotyping NLP algorithms on electronic health records

AU - Zhou, Sicheng

AU - Wang, Nan

AU - Wang, Liwei

AU - Sun, Ju

AU - Blaes, Anne

AU - Liu, Hongfang

AU - Zhang, Rui

PY - 2023/1

Y1 - 2023/1

N2 - Objective: Transformer-based language models are prevailing in the clinical domain due to their excellent performance on clinical NLP tasks. The generalizability of those models is usually ignored during the model development process. This study evaluated the generalizability of CancerBERT, a Transformer-based clinical NLP model, along with classic machine learning models, i.e., conditional random field (CRF), bi-directional long short-term memory CRF (BiLSTM-CRF), across different clinical institutes through a breast cancer phenotype extraction task. Materials and methods: Two clinical corpora of breast cancer patients were collected from the electronic health records from the University of Minnesota (UMN) and Mayo Clinic (MC), and annotated following the same guideline. We developed three types of NLP models (i.e., CRF, BiLSTM-CRF and CancerBERT) to extract cancer phenotypes from clinical texts. We evaluated the generalizability of models on different test sets with different learning strategies (model transfer vs locally trained). The entity coverage score was assessed with their association with the model performances. Results: We manually annotated 200 and 161 clinical documents at UMN and MC. The corpora of the two institutes were found to have higher similarity between the target entities than the overall corpora. The CancerBERT models obtained the best performances among the independent test sets from two clinical institutes and the permutation test set. The CancerBERT model developed in one institute and further fine-tuned in another institute achieved reasonable performance compared to the model developed on local data (micro-F1: 0.925 vs 0.932). Conclusions: The results indicate the CancerBERT model has superior learning ability and generalizability among the three types of clinical NLP models for our named entity recognition task. It has the advantage to recognize complex entities, e.g., entities with different labels.

KW - Electronic health records

KW - Generalizability

KW - Information extraction

KW - Natural language processing

UR - http://www.scopus.com/inward/record.url?scp=85170076099&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85170076099&partnerID=8YFLogxK

U2 - 10.1016/j.csbj.2023.08.018

DO - 10.1016/j.csbj.2023.08.018

M3 - Article

AN - SCOPUS:85170076099

SN - 2001-0370

VL - 22

SP - 32

EP - 40

JO - Computational and Structural Biotechnology Journal

JF - Computational and Structural Biotechnology Journal

ER -
