Information integration and knowledge acquisition from semantically heterogeneous biological data sources

Doina Caragea; Jyotishman Pathak; Jie Bao; Adrian Silvescu; Carson Andorf; Drena Dobbs; Vasant Honavar

doi:10.1007/11530084_15

Quantitative Health Sciences

Research output: Contribution to journal › Conference article › peer-review

16 Scopus citations

Abstract

We present INDUS (Intelligent Data Understanding System), a federated, query-centric system for knowledge acquisition from autonomous, distributed, semantically heterogeneous data sources that can be viewed (conceptually) as tables. INDUS employs ontologies and inter-ontology mappings to enable a user or an application to view a collection of such data sources (regardless of location, internal structure and query interfaces) as though they were a collection of tables structured according to an ontology supplied by the user. This allows INDUS to answer user queries against distributed, semantically heterogeneous data sources without the need for a centralized data warehouse or a common global ontology. We used the INDUS framework to design algorithms for learning probabilistic models (e.g., Naive Bayes models) for predicting the GO functional classification of a protein based on training sequences that are distributed among the SWISSPROT and MIPS data sources. Mappings such as EC2GO and MIPS2GO were used to resolve the semantic differences between these data sources when answering queries posed by the learning algorithms. Our results show that INDUS can be successfully used for integrative analysis of data from multiple sources needed for collaborative discovery in computational biology.
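The abstract's central idea, that a learner can obtain the statistics it needs by querying each source through a mapping into the user's ontology instead of copying the data into a central warehouse, can be illustrated with a small sketch. This is not the INDUS implementation or API: the toy data, the label names (`ec_1`, `mips_a`, `go_X`, ...), and the functions `count_query`, `learn_naive_bayes` and `predict` are hypothetical, and the two dictionaries merely stand in for mappings such as EC2GO and MIPS2GO. It assumes discrete attributes and a Naive Bayes classifier with Laplace smoothing, whose sufficient statistics are class counts and attribute-value/class counts.

```python
from collections import Counter
from math import log

# Hypothetical toy sources: each row is (attributes, source-specific class label).
# Each source labels its rows in its own vocabulary (semantic heterogeneity).
SOURCE_A = [
    ({"motif1": "yes", "motif2": "no"}, "ec_1"),
    ({"motif1": "yes", "motif2": "yes"}, "ec_1"),
    ({"motif1": "no", "motif2": "yes"}, "ec_2"),
]
SOURCE_B = [
    ({"motif1": "no", "motif2": "yes"}, "mips_b"),
    ({"motif1": "yes", "motif2": "no"}, "mips_a"),
]

# Hypothetical mappings from each source's labels into the user's ontology
# (invented stand-ins for EC2GO / MIPS2GO-style mappings).
MAP_A = {"ec_1": "go_X", "ec_2": "go_Y"}
MAP_B = {"mips_a": "go_X", "mips_b": "go_Y"}


def count_query(rows, mapping):
    """Answer a count query at one source, translating labels via the mapping.

    Only the counts (sufficient statistics) are returned, not the raw rows.
    """
    class_counts = Counter()
    joint_counts = Counter()  # keys: (attribute, value, user-ontology class)
    for attrs, label in rows:
        cls = mapping[label]
        class_counts[cls] += 1
        for attr, value in attrs.items():
            joint_counts[(attr, value, cls)] += 1
    return class_counts, joint_counts


def learn_naive_bayes(sources):
    """Sum per-source counts; counts are additive across disjoint sources."""
    class_counts, joint_counts = Counter(), Counter()
    for rows, mapping in sources:
        c, j = count_query(rows, mapping)
        class_counts.update(c)  # Counter.update adds counts
        joint_counts.update(j)
    return class_counts, joint_counts


def predict(model, attrs, values_per_attr=2):
    """Return the class with the highest Laplace-smoothed log posterior."""
    class_counts, joint_counts = model
    total = sum(class_counts.values())
    best, best_score = None, float("-inf")
    for cls, n_cls in class_counts.items():
        score = log(n_cls / total)
        for attr, value in attrs.items():
            n_joint = joint_counts[(attr, value, cls)]  # 0 if never seen
            score += log((n_joint + 1) / (n_cls + values_per_attr))
        if score > best_score:
            best, best_score = cls, score
    return best


model = learn_naive_bayes([(SOURCE_A, MAP_A), (SOURCE_B, MAP_B)])
print(model[0])  # prints Counter({'go_X': 3, 'go_Y': 2})
print(predict(model, {"motif1": "yes", "motif2": "no"}))  # prints go_X
```

Because the counts are additive, summing the per-source answers yields the same statistics as counting over the pooled, label-translated data, which is why, in this toy setting, the learner never needs a central copy of the rows.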

Original language	English (US)
Pages (from-to)	175-190
Number of pages	16
Journal	Lecture Notes in Bioinformatics (Subseries of Lecture Notes in Computer Science)
Volume	3615
DOIs	https://doi.org/10.1007/11530084_15
State	Published - 2005
Event	Second International Workshop on Data Integration in the Life Sciences, DILS 2005 - San Diego, CA, United States
Duration	Jul 20 2005 → Jul 22 2005

ASJC Scopus subject areas

Theoretical Computer Science
General Computer Science

Cite this

@article{595e57087d34432e8ecef73e1bd2a472,

title = "Information integration and knowledge acquisition from semantically heterogeneous biological data sources",

abstract = "We present INDUS (Intelligent Data Understanding System), a federated, query-centric system for knowledge acquisition from autonomous, distributed, semantically heterogeneous data sources that can be viewed (conceptually) as tables. INDUS employs ontologies and inter-ontology mappings to enable a user or an application to view a collection of such data sources (regardless of location, internal structure and query interfaces) as though they were a collection of tables structured according to an ontology supplied by the user. This allows INDUS to answer user queries against distributed, semantically heterogeneous data sources without the need for a centralized data warehouse or a common global ontology. We used the INDUS framework to design algorithms for learning probabilistic models (e.g., Naive Bayes models) for predicting the GO functional classification of a protein based on training sequences that are distributed among the SWISSPROT and MIPS data sources. Mappings such as EC2GO and MIPS2GO were used to resolve the semantic differences between these data sources when answering queries posed by the learning algorithms. Our results show that INDUS can be successfully used for integrative analysis of data from multiple sources needed for collaborative discovery in computational biology.",

author = "Doina Caragea and Jyotishman Pathak and Jie Bao and Adrian Silvescu and Carson Andorf and Drena Dobbs and Vasant Honavar",

note = "Second International Workshop on Data Integration in the Life Sciences, DILS 2005; Conference date: 20-07-2005 through 22-07-2005",

year = "2005",

doi = "10.1007/11530084_15",

language = "English (US)",

volume = "3615",

pages = "175--190",

journal = "Lecture Notes in Bioinformatics (Subseries of Lecture Notes in Computer Science)",

issn = "0302-9743",

publisher = "Springer Verlag",

}

TY - JOUR

T1 - Information integration and knowledge acquisition from semantically heterogeneous biological data sources

AU - Caragea, Doina

AU - Pathak, Jyotishman

AU - Bao, Jie

AU - Silvescu, Adrian

AU - Andorf, Carson

AU - Dobbs, Drena

AU - Honavar, Vasant

PY - 2005

Y1 - 2005

N2 - We present INDUS (Intelligent Data Understanding System), a federated, query-centric system for knowledge acquisition from autonomous, distributed, semantically heterogeneous data sources that can be viewed (conceptually) as tables. INDUS employs ontologies and inter-ontology mappings to enable a user or an application to view a collection of such data sources (regardless of location, internal structure and query interfaces) as though they were a collection of tables structured according to an ontology supplied by the user. This allows INDUS to answer user queries against distributed, semantically heterogeneous data sources without the need for a centralized data warehouse or a common global ontology. We used the INDUS framework to design algorithms for learning probabilistic models (e.g., Naive Bayes models) for predicting the GO functional classification of a protein based on training sequences that are distributed among the SWISSPROT and MIPS data sources. Mappings such as EC2GO and MIPS2GO were used to resolve the semantic differences between these data sources when answering queries posed by the learning algorithms. Our results show that INDUS can be successfully used for integrative analysis of data from multiple sources needed for collaborative discovery in computational biology.

AB - We present INDUS (Intelligent Data Understanding System), a federated, query-centric system for knowledge acquisition from autonomous, distributed, semantically heterogeneous data sources that can be viewed (conceptually) as tables. INDUS employs ontologies and inter-ontology mappings to enable a user or an application to view a collection of such data sources (regardless of location, internal structure and query interfaces) as though they were a collection of tables structured according to an ontology supplied by the user. This allows INDUS to answer user queries against distributed, semantically heterogeneous data sources without the need for a centralized data warehouse or a common global ontology. We used the INDUS framework to design algorithms for learning probabilistic models (e.g., Naive Bayes models) for predicting the GO functional classification of a protein based on training sequences that are distributed among the SWISSPROT and MIPS data sources. Mappings such as EC2GO and MIPS2GO were used to resolve the semantic differences between these data sources when answering queries posed by the learning algorithms. Our results show that INDUS can be successfully used for integrative analysis of data from multiple sources needed for collaborative discovery in computational biology.

UR - http://www.scopus.com/inward/record.url?scp=26444450699&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=26444450699&partnerID=8YFLogxK

U2 - 10.1007/11530084_15

DO - 10.1007/11530084_15

M3 - Conference article

AN - SCOPUS:26444450699

SN - 0302-9743

VL - 3615

SP - 175

EP - 190

JO - Lecture Notes in Bioinformatics (Subseries of Lecture Notes in Computer Science)

JF - Lecture Notes in Bioinformatics (Subseries of Lecture Notes in Computer Science)

T2 - Second International Workshop on Data Integration in the Life Sciences, DILS 2005

Y2 - 20 July 2005 through 22 July 2005

ER -