Co-clustering phenome-genome for phenotype classification and disease gene discovery

Taehyun Hwang, Gowtham Atluri, Maoqiang Xie, Sanjoy Dey, Changjin Hong, Vipin Kumar, Rui Kuang

Research output: Contribution to journal › Article › peer-review


Understanding the categorization of human diseases is critical for reliably identifying disease causal genes. Recently, genome-wide studies of abnormal chromosomal locations related to diseases have mapped more than 2000 phenotype-gene relations, which provide valuable information for classifying diseases and identifying candidate genes as drug targets. In this article, a regularized non-negative matrix tri-factorization (R-NMTF) algorithm is introduced to co-cluster phenotypes and genes and simultaneously detect associations between the resulting phenotype clusters and gene clusters. The R-NMTF algorithm factorizes the phenotype-gene association matrix under prior knowledge from a phenotype similarity network and a protein-protein interaction network, supervised by label information from known disease classes and biological pathways. In experiments on disease phenotype-gene associations in OMIM and KEGG disease pathways, R-NMTF significantly improved the classification of disease phenotypes and disease pathway genes compared with support vector machines and Label Propagation in cross-validation on the annotated phenotypes and genes. The newly predicted phenotypes in each disease class are highly consistent with Human Phenotype Ontology annotations. The roles of the new member genes in the disease pathways were examined and validated in the protein-protein interaction subnetworks. An extensive literature review also confirmed many new members of the disease classes and pathways, as well as the predicted associations between disease phenotype classes and pathways.
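The abstract does not spell out the R-NMTF objective or its update rules, so the following is only an illustrative sketch of the general technique, not the authors' method. It assumes a common form of graph-regularized, label-supervised tri-factorization R ≈ F S G^T, minimizing ||R - F S G^T||^2 + lam_p tr(F^T L_p F) + lam_g tr(G^T L_g G) + alpha ||C_F o (F - Y_F)||^2 + beta ||C_G o (G - Y_G)||^2, where L = D - W is the Laplacian of a symmetric network W, Y holds class indicators, C is a 0/1 mask of labeled rows, and "o" is the element-wise product. The minimization uses multiplicative updates in the style of graph-regularized NMF. The objective, the function names (rnmtf, mult_update, argmax_rows), the weights lam_p, lam_g, alpha, beta, and the toy data are all hypothetical.

```python
import random

def matmul(a, b):
    """Dense product of two list-of-lists matrices."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(a):
    return [list(row) for row in zip(*a)]

def add(*mats):
    """Element-wise sum of equally shaped matrices."""
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*mats)]

def scale(c, a):
    return [[c * v for v in row] for row in a]

def hadamard(a, b):
    """Element-wise product; used here to apply a 0/1 label mask."""
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def degree(w):
    """Diagonal degree matrix D of a symmetric non-negative network W."""
    return [[sum(row) if i == j else 0.0 for j in range(len(w))]
            for i, row in enumerate(w)]

def mult_update(x, num, den, eps=1e-9):
    """Multiplicative rule x <- x * num / den; entries stay non-negative."""
    return [[xv * nv / (dv + eps) for xv, nv, dv in zip(xr, nr, dr)]
            for xr, nr, dr in zip(x, num, den)]

def rnmtf(r, w_p, w_g, y_f, c_f, y_g, c_g, k1, k2,
          lam_p=0.5, lam_g=0.5, alpha=1.0, beta=1.0, iters=300, seed=0):
    """Hypothetical sketch of regularized NMTF, R ~ F S G^T (assumed objective).

    r: n x m phenotype-gene matrix; w_p: n x n phenotype network;
    w_g: m x m gene (PPI) network; y_*/c_*: class indicators and 0/1 masks.
    """
    rng = random.Random(seed)
    n, m = len(r), len(r[0])
    f = [[rng.random() for _ in range(k1)] for _ in range(n)]
    g = [[rng.random() for _ in range(k2)] for _ in range(m)]
    s = [[rng.random() for _ in range(k2)] for _ in range(k1)]
    d_p, d_g, rt = degree(w_p), degree(w_g), transpose(r)
    yf_masked, yg_masked = hadamard(c_f, y_f), hadamard(c_g, y_g)
    for _ in range(iters):
        # F: negative gradient parts in num, positive parts in den (L = D - W).
        gst = matmul(g, transpose(s))  # m x k1
        num = add(matmul(r, gst), scale(lam_p, matmul(w_p, f)), scale(alpha, yf_masked))
        den = add(matmul(f, matmul(transpose(gst), gst)),
                  scale(lam_p, matmul(d_p, f)), scale(alpha, hadamard(c_f, f)))
        f = mult_update(f, num, den)
        # G: same form, using the gene network and gene (pathway) labels.
        fs = matmul(f, s)  # n x k2
        num = add(matmul(rt, fs), scale(lam_g, matmul(w_g, g)), scale(beta, yg_masked))
        den = add(matmul(g, matmul(transpose(fs), fs)),
                  scale(lam_g, matmul(d_g, g)), scale(beta, hadamard(c_g, g)))
        g = mult_update(g, num, den)
        # S: phenotype-cluster x gene-cluster association strengths.
        num = matmul(matmul(transpose(f), r), g)
        den = matmul(matmul(matmul(transpose(f), f), s), matmul(transpose(g), g))
        s = mult_update(s, num, den)
    return f, s, g

def argmax_rows(a):
    """Cluster index of each row: position of its largest entry."""
    return [max(range(len(row)), key=row.__getitem__) for row in a]

# Toy data (hypothetical): two phenotype blocks linked to two gene blocks.
r = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
w = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
y = [[1, 0], [1, 0], [0, 1], [0, 1]]
c_f = [[1, 1], [0, 0], [1, 1], [0, 0]]  # only phenotypes 0 and 2 are labeled
c_g = [[1, 1] for _ in range(4)]        # all genes are labeled
f, s, g = rnmtf(r, w, w, y, c_f, y, c_g, k1=2, k2=2)
print("phenotype clusters:", argmax_rows(f))
print("gene clusters:", argmax_rows(g))
```

In this sketch, the unlabeled phenotypes 1 and 3 receive a class through their shared genes in R and their edges in the phenotype network, which mirrors the idea of classifying unannotated phenotypes. Large entries of S would flag candidate associations between a phenotype cluster and a gene cluster. Plain Python lists keep the example dependency-free; a practical implementation would more likely use NumPy.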

Original language: English (US)
Pages (from-to): e146
Journal: Nucleic Acids Research
Issue number: 19
State: Published - Oct 2012

ASJC Scopus subject areas

  • Genetics

