TY - JOUR
T1 - Probing Patient Messages Enhanced by Natural Language Processing
T2 - A Top-Down Message Corpus Analysis
AU - Mastorakos, George
AU - Khurana, Aditya
AU - Huang, Ming
AU - Fu, Sunyang
AU - Tafti, Ahmad P.
AU - Fan, Jungwei
AU - Liu, Hongfang
N1 - Publisher Copyright:
Copyright © 2021 George Mastorakos et al. Exclusive Licensee Peking University Health Science Center. Distributed under a Creative Commons Attribution License (CC BY 4.0).
PY - 2021
Y1 - 2021
N2 - Background. Patients increasingly use asynchronous communication platforms to converse with care teams. Natural language processing (NLP) to classify content and automate triage of these messages has great potential to enhance clinical efficiency. We characterize the contents of a corpus of portal messages generated by patients using NLP methods. We aim to demonstrate descriptive analyses of patient text that can contribute to the development of future sophisticated NLP applications. Methods. We collected approximately 3,000 portal messages from the cardiology, dermatology, and gastroenterology departments at Mayo Clinic. After labeling these messages as either Active Symptom, Logistical, Prescription, or Update, we used NER (named entity recognition) to identify medical concepts based on the UMLS library. We hierarchically analyzed the distribution of these messages in terms of departments, message types, medical concepts, and keywords therewithin. Results. Active Symptom and Logistical content types comprised approximately 67% of the message cohort. The “Findings” medical concept had the largest number of keywords across all groupings of content types and departments. “Anatomical Sites” and “Disorders” keywords were more prevalent in Active Symptom messages, while “Drugs” keywords were most prevalent in Prescription messages. Logistical messages tended to have the lower proportions of “Anatomical Sites,” “Disorders,” “Drugs,” and “Findings” keywords when compared to other message content types. Conclusions. This descriptive corpus analysis sheds light on the content and foci of portal messages. The insight into the content and differences among message themes can inform the development of more robust NLP models.
AB - Background. Patients increasingly use asynchronous communication platforms to converse with care teams. Natural language processing (NLP) to classify content and automate triage of these messages has great potential to enhance clinical efficiency. We characterize the contents of a corpus of portal messages generated by patients using NLP methods. We aim to demonstrate descriptive analyses of patient text that can contribute to the development of future sophisticated NLP applications. Methods. We collected approximately 3,000 portal messages from the cardiology, dermatology, and gastroenterology departments at Mayo Clinic. After labeling these messages as either Active Symptom, Logistical, Prescription, or Update, we used NER (named entity recognition) to identify medical concepts based on the UMLS library. We hierarchically analyzed the distribution of these messages in terms of departments, message types, medical concepts, and keywords therewithin. Results. Active Symptom and Logistical content types comprised approximately 67% of the message cohort. The “Findings” medical concept had the largest number of keywords across all groupings of content types and departments. “Anatomical Sites” and “Disorders” keywords were more prevalent in Active Symptom messages, while “Drugs” keywords were most prevalent in Prescription messages. Logistical messages tended to have the lower proportions of “Anatomical Sites,” “Disorders,” “Drugs,” and “Findings” keywords when compared to other message content types. Conclusions. This descriptive corpus analysis sheds light on the content and foci of portal messages. The insight into the content and differences among message themes can inform the development of more robust NLP models.
UR - http://www.scopus.com/inward/record.url?scp=85149779553&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85149779553&partnerID=8YFLogxK
U2 - 10.34133/2021/1504854
DO - 10.34133/2021/1504854
M3 - Article
AN - SCOPUS:85149779553
SN - 2097-1095
VL - 2021
JO - Health Data Science
JF - Health Data Science
M1 - 1504854
ER -