Facilitating accurate health provider directories using natural language processing

Matthew J. Cook, Lixia Yao, Xiaoyan Wang

Research output: Contribution to journal › Article › peer-review



Background: Accurate information in provider directories is vital in health care, including for health information exchange, health benefits exchange, quality reporting, and the reimbursement and delivery of care. Maintaining provider directory data and keeping it up to date is challenging. The objective of this study is to determine the feasibility of using natural language processing (NLP) techniques to combine disparate resources and acquire accurate information on health providers.

Methods: Publicly available state licensure lists for Connecticut were obtained along with the National Plan and Provider Enumeration System (NPPES) public use files. The Connecticut licensure lists contain textual information on each health professional licensed to practice within the state. An NLP-based system was developed that uses healthcare provider taxonomy codes, location, name, and address information to identify matching textual data within the state and federal records. Qualitative and quantitative evaluations were performed, and recall and precision were calculated.

Results: We identified nurse midwives, nurse practitioners, and dentists in the State of Connecticut. The recall and precision were 0.95 and 0.93, respectively. Using the system, we accurately acquired 6849 of the 7177 health provider directory records.

Conclusions: The authors demonstrated that the NLP-based approach was effective at acquiring health provider information. Furthermore, the NLP-based system can be reapplied to update the information as the data change, further reducing processing burden.
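The abstract does not describe the matching algorithm itself, so the following is only a minimal sketch of how licensure and NPPES records might be linked on taxonomy code, location, name, and address, and how precision and recall could then be scored. It assumes rule-based normalization with deterministic blocking on (state, taxonomy code, normalized name) and fuzzy address comparison via Python's standard `difflib.SequenceMatcher`; this is not the authors' NLP pipeline. The `Provider` fields, the assumption that license types are already mapped to taxonomy codes, the suffix list, the abbreviation map, the 0.8 threshold, and all toy records and identifiers are hypothetical.

```python
import re
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Provider:
    # Hypothetical record layout shared by licensure and NPPES rows.
    record_id: str
    name: str
    address: str
    state: str
    taxonomy: str  # assumes license type was already mapped to a taxonomy code


# Hypothetical credential suffixes to drop and street-word abbreviations.
SUFFIXES = {"APRN", "CNM", "NP", "RN", "DDS", "DMD", "MD", "JR", "SR"}
ABBREV = {"STREET": "ST", "AVENUE": "AVE", "ROAD": "RD", "SUITE": "STE", "DRIVE": "DR"}


def normalize_name(name: str) -> str:
    """Uppercase, drop punctuation and credentials, sort tokens so that
    'SMITH, JANE' and 'JANE SMITH' normalize to the same key."""
    tokens = [t for t in re.findall(r"[A-Z]+", name.upper()) if t not in SUFFIXES]
    return " ".join(sorted(tokens))


def normalize_address(addr: str) -> str:
    """Uppercase, drop punctuation, and abbreviate common street words."""
    return " ".join(ABBREV.get(t, t) for t in re.findall(r"[A-Z0-9]+", addr.upper()))


def match(licenses, nppes, threshold=0.8):
    """Return (license_id, npi) pairs. Candidates must agree on state,
    taxonomy code, and normalized name; the most similar address at or
    above the threshold wins."""
    index = {}
    for rec in nppes:
        key = (rec.state, rec.taxonomy, normalize_name(rec.name))
        index.setdefault(key, []).append(rec)
    pairs = set()
    for lic in licenses:
        key = (lic.state, lic.taxonomy, normalize_name(lic.name))
        best, best_score = None, threshold
        for cand in index.get(key, []):
            score = SequenceMatcher(
                None, normalize_address(lic.address), normalize_address(cand.address)
            ).ratio()
            if score >= best_score:
                best, best_score = cand, score
        if best is not None:
            pairs.add((lic.record_id, best.record_id))
    return pairs


def precision_recall(predicted, gold):
    """Precision = TP / predicted pairs; recall = TP / gold-standard pairs."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall


# Toy, made-up records (not data from the study).
licenses = [
    Provider("CT-1", "SMITH, JANE A APRN", "12 Main Street Suite 4", "CT", "363L00000X"),
    Provider("CT-2", "Lee, David DDS", "99 Elm Road", "CT", "122300000X"),
]
nppes = [
    Provider("1234567890", "JANE A SMITH", "12 MAIN ST STE 4", "CT", "363L00000X"),
    Provider("1234567891", "DAVID LEE", "500 Oak Avenue", "CT", "122300000X"),
]
# Hypothetical hand-labeled gold standard: CT-2 is the same dentist,
# but the NPPES address is outdated, so the address check misses it.
gold = {("CT-1", "1234567890"), ("CT-2", "1234567891")}

predicted = match(licenses, nppes)
print(predicted, precision_recall(predicted, gold))
```

On this toy input only CT-1 is linked, giving precision 1.0 and recall 0.5. The example illustrates the trade-off a stricter address threshold makes: fewer false links at the cost of missing providers whose federal record is stale.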

Original language: English (US)
Article number: 80
Journal: BMC Medical Informatics and Decision Making
State: Published - Apr 4 2019

ASJC Scopus subject areas

  • Health Policy
  • Health Informatics
  • Computer Science Applications


