Abstract
Gene names and symbols are important biomedical entities, but they are highly ambiguous. This ambiguity affects the performance of both information extraction and information retrieval systems in the biomedical domain. Existing knowledge sources contain different types of information about genes that could be used to disambiguate gene symbols. In this paper, we applied an information retrieval (IR)-based method to human gene symbol disambiguation and studied different methods for combining the various types of information available in these knowledge sources. Results showed that combining evidence usually improved performance. The combination method using coefficients obtained from a logistic regression model reached the highest precision, 92.2%, on a test set of ambiguous human gene symbols.
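The abstract describes the approach only at a high level. The sketch below shows one plausible reading of "IR-based disambiguation with combined evidence", and it is not the authors' implementation. It scores each candidate gene by the cosine similarity between the context around an ambiguous symbol and a text profile drawn from each knowledge source, then combines those per-source scores through a logistic function. The source names (`summary`, `keywords`), the toy profile texts, and the weights and intercept are all hypothetical placeholders. The weights stand in for logistic regression coefficients and are not values reported in the paper. The example uses the symbol PSA, which can refer to KLK3 (prostate-specific antigen) or serve as an alias of NPEPPS (puromycin-sensitive aminopeptidase).

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercased token counts from a deliberately simple tokenizer."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def combined_score(context: Counter, profiles: dict, weights: dict, intercept: float) -> float:
    """Logistic combination of per-source similarity scores.

    The weights play the role of logistic regression coefficients
    (hypothetical values here, not taken from the paper).
    """
    z = intercept + sum(
        weights[source] * cosine(context, bag_of_words(text))
        for source, text in profiles.items()
    )
    return 1.0 / (1.0 + math.exp(-z))


def disambiguate(context_text: str, candidates: dict, weights: dict, intercept: float = 0.0):
    """Return the best-scoring candidate gene and the scores for all candidates."""
    context = bag_of_words(context_text)
    scores = {
        gene: combined_score(context, profiles, weights, intercept)
        for gene, profiles in candidates.items()
    }
    return max(scores, key=scores.get), scores


# Toy knowledge-source profiles for two genes that share the symbol "PSA".
candidates = {
    "KLK3": {
        "summary": "prostate specific antigen serine protease secreted by the prostate; "
                   "serum levels used in prostate cancer screening",
        "keywords": "prostate cancer serum kallikrein",
    },
    "NPEPPS": {
        "summary": "puromycin sensitive aminopeptidase that degrades cytosolic peptides",
        "keywords": "aminopeptidase peptide degradation puromycin",
    },
}
weights = {"summary": 2.0, "keywords": 1.5}  # hypothetical coefficients

best, scores = disambiguate(
    "Serum PSA levels were measured to screen for prostate cancer.",
    candidates, weights, intercept=-1.0,
)
print(best, scores)
```

Using a logistic function rather than a plain weighted sum keeps each combined score in (0, 1). That makes scores comparable across candidates and sources, and coefficients fitted on labeled examples can be plugged in directly.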
Original language | English (US) |
---|---|
Pages | 41-48 |
Number of pages | 8 |
State | Published - 2007 |
Event | ACL 2007 Workshop on Biological, Translational, and Clinical Language Processing, BioNLP 2007, Prague, Czech Republic; Duration: Jun 29 2007 → … |
Other
Other | ACL 2007 Workshop on Biological, Translational, and Clinical Language Processing, BioNLP 2007 |
---|---|
Country/Territory | Czech Republic |
City | Prague |
Period | 6/29/07 → … |
ASJC Scopus subject areas
- Language and Linguistics
- Information Systems
- Software
- Health Informatics
- Computer Science Applications
- Biomedical Engineering