Using distributional analysis to semantically classify UMLS concepts

Jung Wei Fan, Hua Xu, Carol Friedman

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Scopus citations


The UMLS is a widely used and comprehensive knowledge source in the biomedical domain. It specifies biomedical concepts and their semantic categories, and is therefore valuable for Natural Language Processing (NLP) and other knowledge-based systems. However, the UMLS semantic classification is not always accurate, which adversely affects the performance of these systems. It is therefore desirable to automatically validate or, when necessary, semantically reclassify UMLS concepts. We applied a distributional similarity method based on syntactic dependencies and α-skew divergence to classify concepts in the T033 Finding class in order to determine which ones were biologic functions or disorders. A gold standard of 100 randomly sampled concepts was created from the majority annotation of three experts. Precision of 0.54 and recall of 0.654 were achieved by the top prediction; precision of 0.64 and recall of 0.769 were achieved by the top 2 predictions. Error analysis revealed problems in the current method and provided insight into future improvements.
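As a rough illustration of the kind of distributional comparison the abstract describes, the sketch below ranks candidate semantic classes for a concept by α-skew divergence over syntactic-dependency features. It is not the authors' implementation: the feature names, the toy counts, the class labels, the value alpha = 0.99, and the direction in which the divergence is applied are all assumptions made for this example. The divergence follows the common definition s_α(q, r) = KL(r ‖ αq + (1 − α)r).

```python
import math
from collections import Counter


def normalize(counts):
    """Turn raw feature counts into a probability distribution."""
    total = sum(counts.values())
    return {feat: n / total for feat, n in counts.items()}


def alpha_skew_divergence(q, r, alpha=0.99):
    """Return KL(r || alpha*q + (1 - alpha)*r).

    Mixing a little of r into q keeps the divergence finite even when
    q assigns zero probability to a feature that r has observed.
    """
    div = 0.0
    for feat, r_p in r.items():
        if r_p == 0.0:
            continue
        mix = alpha * q.get(feat, 0.0) + (1.0 - alpha) * r_p
        div += r_p * math.log(r_p / mix)
    return div


def rank_classes(concept_counts, class_counts, alpha=0.99):
    """Rank class labels from most to least similar (lowest divergence first)."""
    r = normalize(concept_counts)
    scores = {
        label: alpha_skew_divergence(normalize(counts), r, alpha)
        for label, counts in class_counts.items()
    }
    return sorted(scores, key=scores.get)


# Hypothetical dependency-feature counts (illustrative only, not from the paper).
concept = Counter({"obj_of:report": 3, "mod_by:severe": 2, "subj_of:resolve": 1})
profiles = {
    "disorder": Counter({"mod_by:severe": 5, "obj_of:report": 4,
                         "subj_of:resolve": 3, "obj_of:treat": 4}),
    "biologic_function": Counter({"subj_of:regulate": 6, "obj_of:measure": 4,
                                  "obj_of:report": 1}),
}

ranking = rank_classes(concept, profiles)
print(ranking)
```

Taking the first element of the ranking corresponds to a "top prediction" setting, and the first two elements to a "top 2 predictions" setting, in the sense used in the abstract.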

Original language: English (US)
Title of host publication: MEDINFO 2007 - Proceedings of the 12th World Congress on Health (Medical) Informatics
Subtitle of host publication: Building Sustainable Health Systems
Publisher: IOS Press
Number of pages: 5
ISBN (Print): 9781586037741
State: Published - 2007
Event: 12th World Congress on Medical Informatics, MEDINFO 2007 - Brisbane, QLD, Australia
Duration: Aug 20 2007 → Aug 24 2007

Publication series

Name: Studies in Health Technology and Informatics
ISSN (Print): 0926-9630
ISSN (Electronic): 1879-8365


Other: 12th World Congress on Medical Informatics, MEDINFO 2007
City: Brisbane, QLD


  • UMLS
  • distributional similarity
  • natural language processing
  • semantic classification

ASJC Scopus subject areas

  • Biomedical Engineering
  • Health Informatics
  • Health Information Management

