TY - JOUR
T1 - Disambiguating ambiguous biomedical terms in biomedical narrative text
T2 - An unsupervised method
AU - Liu, Hongfang
AU - Lussier, Yves A.
AU - Friedman, Carol
N1 - Funding Information: We thank Dr Andrey Rzhetsky of the Columbia Genome Center at Columbia University and Hong Yu in the Department of Medical Informatics at Columbia University for enabling access to the collection of MEDLINE abstracts. This study was supported in part by Grants LM06274 from the NLM and ILS-9817434 from the NSF.
PY - 2001
Y1 - 2001
N2 - With the growing use of Natural Language Processing (NLP) techniques for information extraction and concept indexing in the biomedical domain, there is a corresponding need for a method that quickly and efficiently assigns the correct sense to an ambiguous biomedical term in a given context. Currently, word sense disambiguation (WSD) in the biomedical domain relies on handcrafted rules based on contextual material. This approach has three disadvantages: (i) generating WSD rules manually is time-consuming and tedious, (ii) rule sets become increasingly difficult to maintain over time, and (iii) handcrafted rules are often incomplete and perform poorly in new domains composed of specialized vocabularies and different genres of text. This paper presents a two-phase unsupervised method to build a WSD classifier for an ambiguous biomedical term W. The first phase automatically creates a sense-tagged corpus for W, and the second phase derives a classifier for W using the derived sense-tagged corpus as a training set. A formative experiment demonstrated that classifiers trained on the derived sense-tagged corpora achieved an overall accuracy of about 97%, with greater than 90% accuracy for each individual ambiguous term.
AB - With the growing use of Natural Language Processing (NLP) techniques for information extraction and concept indexing in the biomedical domain, there is a corresponding need for a method that quickly and efficiently assigns the correct sense to an ambiguous biomedical term in a given context. Currently, word sense disambiguation (WSD) in the biomedical domain relies on handcrafted rules based on contextual material. This approach has three disadvantages: (i) generating WSD rules manually is time-consuming and tedious, (ii) rule sets become increasingly difficult to maintain over time, and (iii) handcrafted rules are often incomplete and perform poorly in new domains composed of specialized vocabularies and different genres of text. This paper presents a two-phase unsupervised method to build a WSD classifier for an ambiguous biomedical term W. The first phase automatically creates a sense-tagged corpus for W, and the second phase derives a classifier for W using the derived sense-tagged corpus as a training set. A formative experiment demonstrated that classifiers trained on the derived sense-tagged corpora achieved an overall accuracy of about 97%, with greater than 90% accuracy for each individual ambiguous term.
KW - Corpus-based machine learning
KW - MEDLINE
KW - MedLEE
KW - Natural language processing
KW - UMLS
KW - Word sense disambiguation
UR - http://www.scopus.com/inward/record.url?scp=0035564886&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0035564886&partnerID=8YFLogxK
U2 - 10.1006/jbin.2001.1023
DO - 10.1006/jbin.2001.1023
M3 - Article
C2 - 11977807
AN - SCOPUS:0035564886
SN - 1532-0464
VL - 34
SP - 249
EP - 261
JO - Journal of Biomedical Informatics
JF - Journal of Biomedical Informatics
IS - 4
ER -