Using distributional analysis to semantically classify UMLS concepts.

Jung Wei Fan, Hua Xu, Carol Friedman

Research output: Contribution to journal › Article › peer-review


The UMLS is a widely used and comprehensive knowledge source in the biomedical domain. It specifies biomedical concepts and their semantic categories, and is therefore valuable for Natural Language Processing (NLP) and other knowledge-based systems. However, the UMLS semantic classification is not always accurate, which adversely affects the performance of these systems. It is therefore desirable to automatically validate or, when necessary, semantically reclassify UMLS concepts. We applied a distributional similarity method based on syntactic dependencies and α-skew divergence to classify concepts in the T033 Finding class, in order to determine which ones were biologic functions or disorders. A gold standard of 100 randomly sampled concepts was created from the majority annotation of three experts. The top prediction achieved a precision of 0.54 and a recall of 0.654; the top 2 predictions achieved a precision of 0.64 and a recall of 0.769. Error analysis revealed problems in the current method and provided insight into future improvements.
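
As a rough illustration of the divergence named above, the following sketch computes the α-skew divergence as it is commonly defined, s_α(q, r) = KL(r ‖ α·q + (1 − α)·r), over feature distributions and ranks candidate classes by it. The abstract does not describe the implementation, so the value α = 0.99, the dependency-feature names, the toy class profiles, and the helper names (`to_distribution`, `skew_divergence`, `rank_classes`) are all assumptions made for illustration, not the authors' method.

```python
import math
from collections import Counter


def to_distribution(counts):
    """Normalize raw feature counts into a probability distribution."""
    total = sum(counts.values())
    return {feature: n / total for feature, n in counts.items()}


def skew_divergence(q, r, alpha=0.99):
    """Alpha-skew divergence: KL(r || alpha*q + (1 - alpha)*r).

    Mixing a little of r into q keeps the denominator positive wherever
    r has mass, so the value is finite even when q lacks some features.
    """
    total = 0.0
    for feature, r_prob in r.items():
        if r_prob == 0.0:
            continue
        mixed = alpha * q.get(feature, 0.0) + (1.0 - alpha) * r_prob
        total += r_prob * math.log(r_prob / mixed)
    return total


def rank_classes(concept_counts, class_counts, alpha=0.99):
    """Rank candidate classes from most to least similar (lowest divergence first)."""
    concept = to_distribution(concept_counts)
    scores = [
        (name, skew_divergence(to_distribution(counts), concept, alpha))
        for name, counts in class_counts.items()
    ]
    return sorted(scores, key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical syntactic-dependency features; not taken from the paper.
    concept = Counter({"subj_of:increase": 2, "obj_of:measure": 1})
    classes = {
        "biologic function": Counter({"subj_of:increase": 20, "obj_of:measure": 10}),
        "disorder": Counter({"obj_of:treat": 15, "mod_by:severe": 5}),
    }
    for name, score in rank_classes(concept, classes):
        print(f"{name}: {score:.3f}")
```

Taking the first entry of the ranking corresponds to a "top prediction" and the first two entries to "top 2 predictions"; how the original study built its class profiles and chose α is not stated in the abstract.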

Original language: English (US)
Pages (from-to): 519-523
Number of pages: 5
Journal: Medinfo. MEDINFO
Issue number: Pt 1
State: Published - 2007

ASJC Scopus subject areas

  • General Medicine

