Disambiguating ambiguous biomedical terms in biomedical narrative text: An unsupervised method

Hongfang Liu; Yves A. Lussier; Carol Friedman

doi:10.1006/jbin.2001.1023

Digital Health Sciences

Research output: Contribution to journal › Article › peer-review

54 Scopus citations

Abstract

With the growing use of Natural Language Processing (NLP) techniques for information extraction and concept indexing in the biomedical domain, a method that quickly and efficiently assigns the correct sense of an ambiguous biomedical term in a given context is needed concurrently. The current status of word sense disambiguation (WSD) in the biomedical domain is that handcrafted rules are used based on contextual material. The disadvantages of this approach are (i) generating WSD rules manually is a time-consuming and tedious task, (ii) maintenance of rule sets becomes increasingly difficult over time, and (iii) handcrafted rules are often incomplete and perform poorly in new domains comprised of specialized vocabularies and different genres of text. This paper presents a two-phase unsupervised method to build a WSD classifier for an ambiguous biomedical term W. The first phase automatically creates a sense-tagged corpus for W, and the second phase derives a classifier for W using the derived sense-tagged corpus as a training set. A formative experiment was performed, which demonstrated that classifiers trained on the derived sense-tagged corpora achieved an overall accuracy of about 97%, with greater than 90% accuracy for each individual ambiguous term.
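The two-phase idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how the sense-tagged corpus is derived or which classifier is trained, so everything concrete below is an assumption made for illustration: the ambiguous term "cold", the hypothetical cue-word lists in `SENSE_CUES`, the toy sentences, and the choice of a naive Bayes classifier with add-one smoothing. It should not be read as the authors' implementation.

```python
import math
from collections import Counter, defaultdict

# Hypothetical cue words for two senses of the ambiguous term "cold".
# These lists are illustrative assumptions, not taken from the paper.
SENSE_CUES = {
    "common_cold": {"virus", "cough", "rhinitis"},
    "low_temperature": {"exposure", "hypothermia", "frostbite"},
}

# Toy stand-in for a large unlabeled text collection.
SENTENCES = [
    "Patient has a cold with cough and fever.",
    "The cold virus caused sore throat and fever.",
    "Prolonged cold exposure led to numb fingers.",
    "Hypothermia followed cold water immersion and numb fingers.",
    "No ambiguous term here.",
    "Cold exposure worsened the cough.",  # cues for both senses: skipped
]


def tokenize(text):
    """Lowercase whitespace tokens with surrounding punctuation removed."""
    tokens = (raw.strip(".,;:()").lower() for raw in text.split())
    return [t for t in tokens if t]


def build_sense_tagged_corpus(term, sentences, sense_cues):
    """Phase 1 (assumed rule): tag a sentence with a sense when it contains
    the term and cue words for exactly one sense."""
    corpus = []
    for sentence in sentences:
        tokens = tokenize(sentence)
        if term not in tokens:
            continue
        matched = [s for s, cues in sense_cues.items() if cues & set(tokens)]
        if len(matched) == 1:
            context = [t for t in tokens if t != term]
            corpus.append((context, matched[0]))
    return corpus


def train_naive_bayes(corpus):
    """Phase 2 (assumed classifier): collect counts for naive Bayes."""
    sense_counts = Counter(sense for _, sense in corpus)
    word_counts = defaultdict(Counter)
    vocab = set()
    for context, sense in corpus:
        word_counts[sense].update(context)
        vocab.update(context)
    return sense_counts, word_counts, vocab


def classify(model, text, term):
    """Return the sense with the highest add-one-smoothed log score."""
    sense_counts, word_counts, vocab = model
    tokens = [t for t in tokenize(text) if t != term and t in vocab]
    total = sum(sense_counts.values())
    best_sense, best_score = None, -math.inf
    for sense, count in sense_counts.items():
        denom = sum(word_counts[sense].values()) + len(vocab)
        score = math.log(count / total)
        for t in tokens:
            score += math.log((word_counts[sense][t] + 1) / denom)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


corpus = build_sense_tagged_corpus("cold", SENTENCES, SENSE_CUES)
model = train_naive_bayes(corpus)
print(classify(model, "She reports a cold with fever and sore throat.", "cold"))
```

The point of the sketch is that no sentence is labeled by hand: the training labels come from an automatic rule, and the classifier can then assign a sense to sentences that contain none of the cue words.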

Original language	English (US)
Pages (from-to)	249-261
Number of pages	13
Journal	Journal of Biomedical Informatics
Volume	34
Issue number	4
DOIs	https://doi.org/10.1006/jbin.2001.1023
State	Published - 2001

Keywords

Corpus-based machine learning
MEDLINE
MedLEE
Natural language processing
UMLS
Word sense disambiguation

ASJC Scopus subject areas

Computer Science Applications
Health Informatics

Cite this

@article{380592033023484796fd70b4b428c5d9,

title = "Disambiguating ambiguous biomedical terms in biomedical narrative text: An unsupervised method",

abstract = "With the growing use of Natural Language Processing (NLP) techniques for information extraction and concept indexing in the biomedical domain, a method that quickly and efficiently assigns the correct sense of an ambiguous biomedical term in a given context is needed concurrently. The current status of word sense disambiguation (WSD) in the biomedical domain is that handcrafted rules are used based on contextual material. The disadvantages of this approach are (i) generating WSD rules manually is a time-consuming and tedious task, (ii) maintenance of rule sets becomes increasingly difficult over time, and (iii) handcrafted rules are often incomplete and perform poorly in new domains comprised of specialized vocabularies and different genres of text. This paper presents a two-phase unsupervised method to build a WSD classifier for an ambiguous biomedical term W. The first phase automatically creates a sense-tagged corpus for W, and the second phase derives a classifier for W using the derived sense-tagged corpus as a training set. A formative experiment was performed, which demonstrated that classifiers trained on the derived sense-tagged corpora achieved an overall accuracy of about 97%, with greater than 90% accuracy for each individual ambiguous term.",

keywords = "Corpus-based machine learning, MEDLINE, MedLEE, Natural language processing, UMLS, Word sense disambiguation",

author = "Liu, Hongfang and Lussier, {Yves A.} and Friedman, Carol",

note = "Funding Information: We thank Dr Andrey Rzhetsky of the Columbia Genome Center at Columbia University and Hong Yu in the Department of Medical Informatics at Columbia University for enabling the access to the collection of MEDLINE abstracts. This study was supported in part by Grants LM06274 from the NLM and ILS-9817434 from the NSF.",

year = "2001",

doi = "10.1006/jbin.2001.1023",

language = "English (US)",

volume = "34",

pages = "249--261",

journal = "Journal of Biomedical Informatics",

issn = "1532-0464",

publisher = "Academic Press Inc.",

number = "4",

}

TY - JOUR

T1 - Disambiguating ambiguous biomedical terms in biomedical narrative text

T2 - An unsupervised method

AU - Liu, Hongfang

AU - Lussier, Yves A.

AU - Friedman, Carol

N1 - Funding Information: We thank Dr Andrey Rzhetsky of the Columbia Genome Center at Columbia University and Hong Yu in the Department of Medical Informatics at Columbia University for enabling the access to the collection of MEDLINE abstracts. This study was supported in part by Grants LM06274 from the NLM and ILS-9817434 from the NSF.

PY - 2001

Y1 - 2001

N2 - With the growing use of Natural Language Processing (NLP) techniques for information extraction and concept indexing in the biomedical domain, a method that quickly and efficiently assigns the correct sense of an ambiguous biomedical term in a given context is needed concurrently. The current status of word sense disambiguation (WSD) in the biomedical domain is that handcrafted rules are used based on contextual material. The disadvantages of this approach are (i) generating WSD rules manually is a time-consuming and tedious task, (ii) maintenance of rule sets becomes increasingly difficult over time, and (iii) handcrafted rules are often incomplete and perform poorly in new domains comprised of specialized vocabularies and different genres of text. This paper presents a two-phase unsupervised method to build a WSD classifier for an ambiguous biomedical term W. The first phase automatically creates a sense-tagged corpus for W, and the second phase derives a classifier for W using the derived sense-tagged corpus as a training set. A formative experiment was performed, which demonstrated that classifiers trained on the derived sense-tagged corpora achieved an overall accuracy of about 97%, with greater than 90% accuracy for each individual ambiguous term.

KW - Corpus-based machine learning

KW - MEDLINE

KW - MedLEE

KW - Natural language processing

KW - UMLS

KW - Word sense disambiguation

UR - http://www.scopus.com/inward/record.url?scp=0035564886&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0035564886&partnerID=8YFLogxK

U2 - 10.1006/jbin.2001.1023

DO - 10.1006/jbin.2001.1023

M3 - Article

C2 - 11977807

AN - SCOPUS:0035564886

SN - 1532-0464

VL - 34

SP - 249

EP - 261

JO - Journal of Biomedical Informatics

JF - Journal of Biomedical Informatics

IS - 4

ER -
