Using distributional analysis to semantically classify UMLS concepts.

Jung Wei Fan; Hua Xu; Carol Friedman

Digital Health Sciences

Research output: Contribution to journal › Article › peer-review

Abstract

The UMLS is a widely used and comprehensive knowledge source in the biomedical domain. It specifies biomedical concepts and their semantic categories, and therefore is valuable for Natural Language Processing (NLP) and other knowledge-based systems. However, the UMLS semantic classification is not always accurate, which adversely affects performance of these systems. Therefore, it is desirable to automatically validate, or, when necessary, to semantically reclassify UMLS concepts. We applied a distributional similarity method based on syntactic dependencies and α-skew divergence to classify concepts in the T033 Finding class in order to determine which ones were biologic functions or disorders. A gold standard of 100 randomly sampled concepts was created, based on a majority annotation of three experts. Precision of 0.54 and recall of 0.654 were achieved by the top prediction; precision of 0.64 and recall of 0.769 were achieved by the top 2 predictions. Error analysis revealed problems in the current method, and provided insight into future improvements.
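
As a rough illustration of the kind of measure the abstract names, the sketch below computes the α-skew divergence in its commonly cited form, s_α(q, r) = KL(r ‖ αq + (1 − α)r), and ranks candidate classes by it. Everything specific here is an assumption made for illustration and is not taken from the paper: the value α = 0.99, the direction of comparison (class profile as q, concept as r), the toy dependency-feature counts, and the helper names `to_distribution`, `skew_divergence`, and `rank_classes`.

```python
import math
from collections import Counter


def to_distribution(counts):
    """Normalize raw feature counts into a probability distribution."""
    total = sum(counts.values())
    return {feature: n / total for feature, n in counts.items()}


def skew_divergence(q, r, alpha=0.99):
    """Alpha-skew divergence s_alpha(q, r) = KL(r || alpha*q + (1-alpha)*r).

    Mixing a little of r into q keeps the divergence finite even when q
    assigns zero probability to a feature that r has seen.
    alpha=0.99 is an assumed setting, not one reported in the abstract.
    """
    total = 0.0
    for feature, r_prob in r.items():
        if r_prob == 0.0:
            continue
        mixed = alpha * q.get(feature, 0.0) + (1.0 - alpha) * r_prob
        total += r_prob * math.log(r_prob / mixed)
    return total


def rank_classes(concept_counts, class_profiles, top_k=2, alpha=0.99):
    """Return the top_k (class, divergence) pairs, smallest divergence first."""
    concept = to_distribution(concept_counts)
    scored = [
        (name, skew_divergence(to_distribution(profile), concept, alpha))
        for name, profile in class_profiles.items()
    ]
    return sorted(scored, key=lambda pair: pair[1])[:top_k]


# Hypothetical syntactic-dependency feature counts, invented for this example.
class_profiles = {
    "Disorder": Counter(
        {"obj_of:treat": 5, "obj_of:diagnose": 4, "mod:chronic": 3}
    ),
    "Biologic Function": Counter(
        {"subj_of:increase": 5, "obj_of:regulate": 4, "mod:normal": 3}
    ),
}
concept_counts = Counter(
    {"obj_of:treat": 2, "mod:chronic": 1, "subj_of:increase": 1}
)

ranking = rank_classes(concept_counts, class_profiles, top_k=2)
for name, divergence in ranking:
    print(f"{name}: {divergence:.3f}")
```

Returning the two lowest-divergence classes mirrors the abstract's distinction between the top prediction and the top 2 predictions, although the actual feature extraction and class profiles used in the study are not described on this page.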

Original language	English (US)
Pages (from-to)	519-523
Number of pages	5
Journal	Medinfo. MEDINFO
Volume	12
Issue number	Pt 1
State	Published - 2007

ASJC Scopus subject areas

General Medicine

Cite this

@article{1e58ce3118004c9ea21b12987891d76d,

title = "Using distributional analysis to semantically classify UMLS concepts.",

abstract = "The UMLS is a widely used and comprehensive knowledge source in the biomedical domain. It specifies biomedical concepts and their semantic categories, and therefore is valuable for Natural Language Processing (NLP) and other knowledge-based systems. However, the UMLS semantic classification is not always accurate, which adversely affects performance of these systems. Therefore, it is desirable to automatically validate, or, when necessary, to semantically reclassify UMLS concepts. We applied a distributional similarity method based on syntactic dependencies and $\alpha$-skew divergence to classify concepts in the T033 Finding class in order to determine which ones were biologic functions or disorders. A gold standard of 100 randomly sampled concepts was created, based on a majority annotation of three experts. Precision of 0.54 and recall of 0.654 were achieved by the top prediction; precision of 0.64 and recall of 0.769 were achieved by the top 2 predictions. Error analysis revealed problems in the current method, and provided insight into future improvements.",

author = "Fan, {Jung Wei} and Xu, Hua and Friedman, Carol",

year = "2007",

language = "English (US)",

volume = "12",

pages = "519--523",

journal = "Medinfo. MEDINFO",

issn = "1569-6332",

publisher = "IOS Press BV",

number = "Pt 1",

}

TY - JOUR

T1 - Using distributional analysis to semantically classify UMLS concepts.

AU - Fan, Jung Wei

AU - Xu, Hua

AU - Friedman, Carol

PY - 2007

Y1 - 2007

N2 - The UMLS is a widely used and comprehensive knowledge source in the biomedical domain. It specifies biomedical concepts and their semantic categories, and therefore is valuable for Natural Language Processing (NLP) and other knowledge-based systems. However, the UMLS semantic classification is not always accurate, which adversely affects performance of these systems. Therefore, it is desirable to automatically validate, or, when necessary, to semantically reclassify UMLS concepts. We applied a distributional similarity method based on syntactic dependencies and α-skew divergence to classify concepts in the T033 Finding class in order to determine which ones were biologic functions or disorders. A gold standard of 100 randomly sampled concepts was created, based on a majority annotation of three experts. Precision of 0.54 and recall of 0.654 were achieved by the top prediction; precision of 0.64 and recall of 0.769 were achieved by the top 2 predictions. Error analysis revealed problems in the current method, and provided insight into future improvements.

UR - http://www.scopus.com/inward/record.url?scp=38449098208&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=38449098208&partnerID=8YFLogxK

M3 - Article

C2 - 17911771

AN - SCOPUS:38449098208

SN - 1569-6332

VL - 12

SP - 519

EP - 523

JO - Medinfo. MEDINFO

JF - Medinfo. MEDINFO

IS - Pt 1

ER -
