Multicenter Validation of Natural Language Processing Algorithms for the Detection of Common Data Elements in Operative Notes for Total Hip Arthroplasty: Algorithm Development and Validation

Peijin Han; Sunyang Fu; Julie Kolis; Richard Hughes; Brian R. Hallstrom; Martha Carvour; Hilal Maradit-Kremers; Sunghwan Sohn; V. G.Vinod Vydiswaran

doi:10.2196/38155

Multicenter Validation of Natural Language Processing Algorithms for the Detection of Common Data Elements in Operative Notes for Total Hip Arthroplasty: Algorithm Development and Validation

Peijin Han, Sunyang Fu, Julie Kolis, Richard Hughes, Brian R. Hallstrom, Martha Carvour, Hilal Maradit-Kremers, Sunghwan Sohn, V. G.Vinod Vydiswaran

Quantitative Health Sciences

Research output: Contribution to journal › Article › peer-review

Abstract

Background: Natural language processing (NLP) methods are powerful tools for extracting and analyzing critical information from free-text data. MedTaggerIE, an open-source NLP pipeline for information extraction based on text patterns, has been widely used in the annotation of clinical notes. A rule-based system, MedTagger-total hip arthroplasty (THA), developed based on MedTaggerIE, was previously shown to correctly identify the surgical approach, fixation, and bearing surface from the THA operative notes at Mayo Clinic. Objective: This study aimed to assess the implementability, usability, and portability of MedTagger-THA at two external institutions, Michigan Medicine and the University of Iowa, and provide lessons learned for best practices. Methods: We conducted iterative test-apply-refinement processes with three involved sites—the development site (Mayo Clinic) and two deployment sites (Michigan Medicine and the University of Iowa). Mayo Clinic was the primary NLP development site, with the THA registry as the gold standard. The activities at the two deployment sites included the extraction of the operative notes, gold standard development (Michigan: registry data; Iowa: manual chart review), the refinement of NLP algorithms on training data, and the evaluation of test data. Error analyses were conducted to understand language variations across sites. To further assess the model specificity for approach and fixation, we applied the refined MedTagger-THA to arthroscopic hip procedures and periacetabular osteotomy cases, as neither of these operative notes should contain any approach or fixation keywords. Results: MedTagger-THA algorithms were implemented and refined independently for both sites. At Michigan, the study comprised THA-related notes for 2569 patient-date pairs. Before model refinement, MedTagger-THA algorithms demonstrated excellent accuracy for approach (96.6%, 95% CI 94.6%-97.9%) and fixation (95.7%, 95% CI 92.4%-97.6%). These results were comparable with internal accuracy at the development site (99.2% for approach and 90.7% for fixation). Model refinement improved accuracies slightly for both approach (99%, 95% CI 97.6%-99.6%) and fixation (98%, 95% CI 95.3%-99.3%). The specificity of approach identification was 88.9% for arthroscopy cases, and the specificity of fixation identification was 100% for both periacetabular osteotomy and arthroscopy cases. At the Iowa site, the study comprised an overall data set of 100 operative notes (50 training notes and 50 test notes). MedTagger-THA algorithms achieved moderate-high performance on the training data. After model refinement, the model achieved high performance for approach (100%, 95% CI 91.3%-100%), fixation (98%, 95% CI 88.3%-100%), and bearing surface (92%, 95% CI 80.5%-97.3%). Conclusions: High performance across centers was achieved for the MedTagger-THA algorithms, demonstrating that they were sufficiently implementable, usable, and portable to different deployment sites. This study provided important lessons learned during the model deployment and validation processes, and it can serve as a reference for transferring rule-based electronic health record models.

Original language	English (US)
Article number	e38155
Journal	JMIR Medical Informatics
Volume	10
Issue number	8
DOIs	https://doi.org/10.2196/38155
State	Published - Aug 1 2022

Keywords

information extraction
model transferability
natural language processing
total hip arthroplasty

ASJC Scopus subject areas

Health Informatics
Health Information Management

Access to Document

10.2196/38155

Cite this

Han, P., Fu, S., Kolis, J., Hughes, R., Hallstrom, B. R., Carvour, M., Maradit-Kremers, H., Sohn, S., & Vydiswaran, V. G. V. (2022). Multicenter Validation of Natural Language Processing Algorithms for the Detection of Common Data Elements in Operative Notes for Total Hip Arthroplasty: Algorithm Development and Validation. JMIR Medical Informatics, 10(8), Article e38155. https://doi.org/10.2196/38155

Multicenter Validation of Natural Language Processing Algorithms for the Detection of Common Data Elements in Operative Notes for Total Hip Arthroplasty: Algorithm Development and Validation. / Han, Peijin; Fu, Sunyang; Kolis, Julie et al.
In: JMIR Medical Informatics, Vol. 10, No. 8, e38155, 01.08.2022.

Research output: Contribution to journal › Article › peer-review

Han, P, Fu, S, Kolis, J, Hughes, R, Hallstrom, BR, Carvour, M, Maradit-Kremers, H , Sohn, S & Vydiswaran, VGV 2022, 'Multicenter Validation of Natural Language Processing Algorithms for the Detection of Common Data Elements in Operative Notes for Total Hip Arthroplasty: Algorithm Development and Validation', JMIR Medical Informatics, vol. 10, no. 8, e38155. https://doi.org/10.2196/38155

@article{e1afc66c72bf45c1a878b6e4e9ed0bc3,

title = "Multicenter Validation of Natural Language Processing Algorithms for the Detection of Common Data Elements in Operative Notes for Total Hip Arthroplasty: Algorithm Development and Validation",

abstract = "Background: Natural language processing (NLP) methods are powerful tools for extracting and analyzing critical information from free-text data. MedTaggerIE, an open-source NLP pipeline for information extraction based on text patterns, has been widely used in the annotation of clinical notes. A rule-based system, MedTagger-total hip arthroplasty (THA), developed based on MedTaggerIE, was previously shown to correctly identify the surgical approach, fixation, and bearing surface from the THA operative notes at Mayo Clinic. Objective: This study aimed to assess the implementability, usability, and portability of MedTagger-THA at two external institutions, Michigan Medicine and the University of Iowa, and provide lessons learned for best practices. Methods: We conducted iterative test-apply-refinement processes with three involved sites—the development site (Mayo Clinic) and two deployment sites (Michigan Medicine and the University of Iowa). Mayo Clinic was the primary NLP development site, with the THA registry as the gold standard. The activities at the two deployment sites included the extraction of the operative notes, gold standard development (Michigan: registry data; Iowa: manual chart review), the refinement of NLP algorithms on training data, and the evaluation of test data. Error analyses were conducted to understand language variations across sites. To further assess the model specificity for approach and fixation, we applied the refined MedTagger-THA to arthroscopic hip procedures and periacetabular osteotomy cases, as neither of these operative notes should contain any approach or fixation keywords. Results: MedTagger-THA algorithms were implemented and refined independently for both sites. At Michigan, the study comprised THA-related notes for 2569 patient-date pairs. Before model refinement, MedTagger-THA algorithms demonstrated excellent accuracy for approach (96.6%, 95% CI 94.6%-97.9%) and fixation (95.7%, 95% CI 92.4%-97.6%). These results were comparable with internal accuracy at the development site (99.2% for approach and 90.7% for fixation). Model refinement improved accuracies slightly for both approach (99%, 95% CI 97.6%-99.6%) and fixation (98%, 95% CI 95.3%-99.3%). The specificity of approach identification was 88.9% for arthroscopy cases, and the specificity of fixation identification was 100% for both periacetabular osteotomy and arthroscopy cases. At the Iowa site, the study comprised an overall data set of 100 operative notes (50 training notes and 50 test notes). MedTagger-THA algorithms achieved moderate-high performance on the training data. After model refinement, the model achieved high performance for approach (100%, 95% CI 91.3%-100%), fixation (98%, 95% CI 88.3%-100%), and bearing surface (92%, 95% CI 80.5%-97.3%). Conclusions: High performance across centers was achieved for the MedTagger-THA algorithms, demonstrating that they were sufficiently implementable, usable, and portable to different deployment sites. This study provided important lessons learned during the model deployment and validation processes, and it can serve as a reference for transferring rule-based electronic health record models.",

keywords = "information extraction, model transferability, natural language processing, total hip arthroplasty",

author = "Peijin Han and Sunyang Fu and Julie Kolis and Richard Hughes and Hallstrom, {Brian R.} and Martha Carvour and Hilal Maradit-Kremers and Sunghwan Sohn and Vydiswaran, {V. G.Vinod}",

note = "Publisher Copyright: {\textcopyright}Peijin Han, Sunyang Fu, Julie Kolis, Richard Hughes, Brian R Hallstrom, Martha Carvour, Hilal Maradit-Kremers, Sunghwan Sohn, VG Vinod Vydiswaran.",

year = "2022",

month = aug,

day = "1",

doi = "10.2196/38155",

language = "English (US)",

volume = "10",

journal = "JMIR Medical Informatics",

issn = "2291-9694",

publisher = "JMIR Publications Inc.",

number = "8",

}

TY - JOUR

T1 - Multicenter Validation of Natural Language Processing Algorithms for the Detection of Common Data Elements in Operative Notes for Total Hip Arthroplasty

T2 - Algorithm Development and Validation

AU - Han, Peijin

AU - Fu, Sunyang

AU - Kolis, Julie

AU - Hughes, Richard

AU - Hallstrom, Brian R.

AU - Carvour, Martha

AU - Maradit-Kremers, Hilal

AU - Sohn, Sunghwan

AU - Vydiswaran, V. G.Vinod

N1 - Publisher Copyright: ©Peijin Han, Sunyang Fu, Julie Kolis, Richard Hughes, Brian R Hallstrom, Martha Carvour, Hilal Maradit-Kremers, Sunghwan Sohn, VG Vinod Vydiswaran.

PY - 2022/8/1

Y1 - 2022/8/1

N2 - Background: Natural language processing (NLP) methods are powerful tools for extracting and analyzing critical information from free-text data. MedTaggerIE, an open-source NLP pipeline for information extraction based on text patterns, has been widely used in the annotation of clinical notes. A rule-based system, MedTagger-total hip arthroplasty (THA), developed based on MedTaggerIE, was previously shown to correctly identify the surgical approach, fixation, and bearing surface from the THA operative notes at Mayo Clinic. Objective: This study aimed to assess the implementability, usability, and portability of MedTagger-THA at two external institutions, Michigan Medicine and the University of Iowa, and provide lessons learned for best practices. Methods: We conducted iterative test-apply-refinement processes with three involved sites—the development site (Mayo Clinic) and two deployment sites (Michigan Medicine and the University of Iowa). Mayo Clinic was the primary NLP development site, with the THA registry as the gold standard. The activities at the two deployment sites included the extraction of the operative notes, gold standard development (Michigan: registry data; Iowa: manual chart review), the refinement of NLP algorithms on training data, and the evaluation of test data. Error analyses were conducted to understand language variations across sites. To further assess the model specificity for approach and fixation, we applied the refined MedTagger-THA to arthroscopic hip procedures and periacetabular osteotomy cases, as neither of these operative notes should contain any approach or fixation keywords. Results: MedTagger-THA algorithms were implemented and refined independently for both sites. At Michigan, the study comprised THA-related notes for 2569 patient-date pairs. Before model refinement, MedTagger-THA algorithms demonstrated excellent accuracy for approach (96.6%, 95% CI 94.6%-97.9%) and fixation (95.7%, 95% CI 92.4%-97.6%). These results were comparable with internal accuracy at the development site (99.2% for approach and 90.7% for fixation). Model refinement improved accuracies slightly for both approach (99%, 95% CI 97.6%-99.6%) and fixation (98%, 95% CI 95.3%-99.3%). The specificity of approach identification was 88.9% for arthroscopy cases, and the specificity of fixation identification was 100% for both periacetabular osteotomy and arthroscopy cases. At the Iowa site, the study comprised an overall data set of 100 operative notes (50 training notes and 50 test notes). MedTagger-THA algorithms achieved moderate-high performance on the training data. After model refinement, the model achieved high performance for approach (100%, 95% CI 91.3%-100%), fixation (98%, 95% CI 88.3%-100%), and bearing surface (92%, 95% CI 80.5%-97.3%). Conclusions: High performance across centers was achieved for the MedTagger-THA algorithms, demonstrating that they were sufficiently implementable, usable, and portable to different deployment sites. This study provided important lessons learned during the model deployment and validation processes, and it can serve as a reference for transferring rule-based electronic health record models.

AB - Background: Natural language processing (NLP) methods are powerful tools for extracting and analyzing critical information from free-text data. MedTaggerIE, an open-source NLP pipeline for information extraction based on text patterns, has been widely used in the annotation of clinical notes. A rule-based system, MedTagger-total hip arthroplasty (THA), developed based on MedTaggerIE, was previously shown to correctly identify the surgical approach, fixation, and bearing surface from the THA operative notes at Mayo Clinic. Objective: This study aimed to assess the implementability, usability, and portability of MedTagger-THA at two external institutions, Michigan Medicine and the University of Iowa, and provide lessons learned for best practices. Methods: We conducted iterative test-apply-refinement processes with three involved sites—the development site (Mayo Clinic) and two deployment sites (Michigan Medicine and the University of Iowa). Mayo Clinic was the primary NLP development site, with the THA registry as the gold standard. The activities at the two deployment sites included the extraction of the operative notes, gold standard development (Michigan: registry data; Iowa: manual chart review), the refinement of NLP algorithms on training data, and the evaluation of test data. Error analyses were conducted to understand language variations across sites. To further assess the model specificity for approach and fixation, we applied the refined MedTagger-THA to arthroscopic hip procedures and periacetabular osteotomy cases, as neither of these operative notes should contain any approach or fixation keywords. Results: MedTagger-THA algorithms were implemented and refined independently for both sites. At Michigan, the study comprised THA-related notes for 2569 patient-date pairs. Before model refinement, MedTagger-THA algorithms demonstrated excellent accuracy for approach (96.6%, 95% CI 94.6%-97.9%) and fixation (95.7%, 95% CI 92.4%-97.6%). These results were comparable with internal accuracy at the development site (99.2% for approach and 90.7% for fixation). Model refinement improved accuracies slightly for both approach (99%, 95% CI 97.6%-99.6%) and fixation (98%, 95% CI 95.3%-99.3%). The specificity of approach identification was 88.9% for arthroscopy cases, and the specificity of fixation identification was 100% for both periacetabular osteotomy and arthroscopy cases. At the Iowa site, the study comprised an overall data set of 100 operative notes (50 training notes and 50 test notes). MedTagger-THA algorithms achieved moderate-high performance on the training data. After model refinement, the model achieved high performance for approach (100%, 95% CI 91.3%-100%), fixation (98%, 95% CI 88.3%-100%), and bearing surface (92%, 95% CI 80.5%-97.3%). Conclusions: High performance across centers was achieved for the MedTagger-THA algorithms, demonstrating that they were sufficiently implementable, usable, and portable to different deployment sites. This study provided important lessons learned during the model deployment and validation processes, and it can serve as a reference for transferring rule-based electronic health record models.

KW - information extraction

KW - model transferability

KW - natural language processing

KW - total hip arthroplasty

UR - http://www.scopus.com/inward/record.url?scp=85140269753&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85140269753&partnerID=8YFLogxK

U2 - 10.2196/38155

DO - 10.2196/38155

M3 - Article

AN - SCOPUS:85140269753

SN - 2291-9694

VL - 10

JO - JMIR Medical Informatics

JF - JMIR Medical Informatics

IS - 8

M1 - e38155

ER -

Multicenter Validation of Natural Language Processing Algorithms for the Detection of Common Data Elements in Operative Notes for Total Hip Arthroplasty: Algorithm Development and Validation

Abstract

Keywords

ASJC Scopus subject areas

Access to Document

Other files and links

Fingerprint

Cite this