TY - GEN
T1 - Mapping gene/protein names in free text to biomedical databases
AU - Liu, Hongfang
AU - Torii, Manabu
AU - Hu, Zhang Zhi
AU - Wu, Cathy
PY - 2007
Y1 - 2007
AB - Because many biomedical databases have been developed and maintained independently, their records referring to the same entity may list different sets of synonyms. Integrating the names that pertain to the same entity would yield a more comprehensive list of synonyms than any individual database provides. We have assembled BioThesaurus, a thesaurus of proteins and their corresponding genes compiled from multiple databases for all UniProtKB records. In this study, the coverage of BioThesaurus and the contribution of each individual database were assessed for several organisms. The results indicate that the coverage of BioThesaurus is over 80% for most of the organisms, with an average of 85.4%. When restricted to individual databases or resources, the percentages dropped ranging from 3 to 30%. The study demonstrated that each individual database or resource has some synonyms not covered by other databases or resources, and that a list of names compiled from multiple databases is desirable for systems requiring high recall.
UR - http://www.scopus.com/inward/record.url?scp=49549084332&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=49549084332&partnerID=8YFLogxK
U2 - 10.1109/ICDMW.2007.94
DO - 10.1109/ICDMW.2007.94
M3 - Conference contribution
AN - SCOPUS:49549084332
SN - 0769530192
SN - 9780769530192
T3 - Proceedings - IEEE International Conference on Data Mining, ICDM
SP - 101
EP - 106
BT - ICDM Workshops 2007 - Proceedings of the 7th IEEE International Conference on Data Mining Workshops
T2 - 7th IEEE International Conference on Data Mining Workshops, ICDM Workshops 2007
Y2 - 28 October 2007 through 31 October 2007
ER -