Bayesian semi-nonnegative matrix tri-factorization to identify pathways associated with cancer phenotypes

Sunho Park, Nabhonil Kar, Jae Ho Cheong, Tae Hyun Hwang

Research output: Contribution to journal, Conference article, peer-review


Accurate identification of pathways associated with cancer phenotypes (e.g., cancer subtypes and treatment outcomes) could lead to the discovery of reliable prognostic and/or predictive biomarkers for better patient stratification and treatment guidance. In our previous work, we have shown that non-negative matrix tri-factorization (NMTF) can be successfully applied to identify pathways associated with specific cancer types or disease classes as prognostic and predictive biomarkers. However, one key limitation of non-negative factorization methods, including various non-negative bi-factorization methods, is their limited ability to handle negative input data. For example, many types of molecular data consist of real values that can be either positive or negative (e.g., normalized or log-transformed gene expression data, where negative values represent down-regulated expression of genes), and such data are not suitable input for these algorithms. In addition, most previous methods provide only a single point estimate and hence cannot deal with uncertainty effectively. To address these limitations, we propose a Bayesian semi-nonnegative matrix tri-factorization method to identify pathways associated with cancer phenotypes from a real-valued input matrix, e.g., gene expression values. Motivated by semi-nonnegative factorization, we allow one of the factor matrices, the centroid matrix, to be real-valued so that each centroid can express either the up- or down-regulation of the member genes in a pathway. In addition, we place structured spike-and-slab priors (encoded with the pathways and a gene-gene interaction (GGI) network) on the centroid matrix so that even genes not initially contained in the pathways (due to the incompleteness of the current pathway database) can be involved in the factorization in a stochastic way, specifically if those genes are connected to the member genes of the pathways on the GGI network.
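To make the semi-nonnegative idea concrete, the sketch below implements the classical, non-Bayesian semi-NMF bi-factorization of Ding, Li and Jordan, X ≈ F Gᵀ, in which F (the centroids) is real-valued and only G (the memberships) is constrained to be nonnegative. It is not the tri-factorization proposed here, it returns a point estimate rather than a posterior, and the function name `semi_nmf`, the random initialization and the toy data are assumptions for illustration only.

```python
import numpy as np


def _pos(A):
    """Elementwise positive part, (|A| + A) / 2."""
    return (np.abs(A) + A) / 2


def _neg(A):
    """Elementwise negative part as a nonnegative matrix, (|A| - A) / 2."""
    return (np.abs(A) - A) / 2


def semi_nmf(X, k, n_iter=300, seed=0, eps=1e-12):
    """Hypothetical helper: semi-NMF X ~= F @ G.T, F real-valued, G >= 0.

    Returns F, G and the Frobenius reconstruction error recorded each iteration.
    """
    rng = np.random.default_rng(seed)
    G = rng.random((X.shape[1], k)) + 0.1   # strictly positive starting memberships
    history = []
    for _ in range(n_iter):
        # Closed-form least-squares update for the real-valued centroids F.
        F = X @ G @ np.linalg.pinv(G.T @ G)
        history.append(np.linalg.norm(X - F @ G.T))
        # Multiplicative update for G: X^T F and F^T F are split into positive
        # and negative parts so every term in the ratio is nonnegative,
        # which keeps G nonnegative.
        XtF, FtF = X.T @ F, F.T @ F
        G = G * np.sqrt((_pos(XtF) + G @ _neg(FtF)) /
                        (_neg(XtF) + G @ _pos(FtF) + eps))
    F = X @ G @ np.linalg.pinv(G.T @ G)
    return F, G, history


# Toy data: real-valued centroids (positive = up, negative = down) mixed
# by nonnegative memberships, plus a little Gaussian noise.
rng = np.random.default_rng(1)
F_true = rng.normal(size=(50, 3))
G_true = rng.random((40, 3))
X = F_true @ G_true.T + 0.01 * rng.normal(size=(50, 40))

F, G, history = semi_nmf(X, k=3)
print(f"reconstruction error: {history[0]:.3f} -> {history[-1]:.3f}")
```

Because F is solved in closed form for each G and the G update does not increase the error for fixed F, the recorded error should be non-increasing; the signs of the recovered columns of F play the role of up- versus down-regulation.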
We also present update rules for the posterior distributions in the framework of variational inference. As a fully Bayesian method, our proposed method has several advantages over current NMTF methods, which are demonstrated in experiments on synthetic datasets. Using The Cancer Genome Atlas (TCGA) gastric cancer dataset and a metastatic gastric cancer immunotherapy clinical-trial dataset, we show that our method could identify biologically and clinically relevant pathways associated with the molecular subtypes and immunotherapy response, respectively. Finally, we show that the pathways identified by the proposed method could be used as prognostic biomarkers to stratify patients with distinct survival outcomes in two independent validation datasets. Additional information and code can be found at
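The paper's own variational update rules act on a tri-factorization and are not reproduced here. As a hedged illustration of the general mechanism only, the sketch below swaps in a much simpler model: mean-field coordinate-ascent variational inference for linear regression with a spike-and-slab prior, where each gene's prior inclusion probability is "structured" by a pathway and a GGI network. The helper names `structured_prior` and `cavi_spike_slab`, the three probability levels (`pi_in`, `pi_nb`, `pi_out`), the fixed noise and slab variances, and the toy graph and data are all hypothetical.

```python
import numpy as np


def structured_prior(n_genes, pathway, adjacency, pi_in=0.5, pi_nb=0.2, pi_out=0.01):
    """Hypothetical prior inclusion probabilities from one pathway and a GGI graph.

    Pathway members get pi_in, non-members adjacent to a member get pi_nb,
    and all remaining genes get pi_out.
    """
    member = np.zeros(n_genes, dtype=bool)
    member[list(pathway)] = True
    neighbor = (adjacency[:, member].sum(axis=1) > 0) & ~member
    return np.where(member, pi_in, np.where(neighbor, pi_nb, pi_out))


def cavi_spike_slab(X, y, prior_pi, sigma2=1.0, tau2=1.0, n_iter=50):
    """Mean-field CAVI for y = X b + N(0, sigma2) noise, with
    b_j ~ prior_pi[j] * N(0, tau2) + (1 - prior_pi[j]) * delta_0.

    q(b_j) = alpha_j * N(mu_j, s2_j) + (1 - alpha_j) * delta_0; sigma2 and tau2 are fixed.
    """
    p = X.shape[1]
    alpha = np.array(prior_pi, dtype=float)    # q(gene j is in the slab)
    mu = np.zeros(p)                           # slab means
    s2 = 1.0 / ((X ** 2).sum(axis=0) / sigma2 + 1.0 / tau2)  # slab variances
    prior_logit = np.log(prior_pi / (1.0 - prior_pi))
    fit = X @ (alpha * mu)                     # E_q[X b]
    for _ in range(n_iter):
        for j in range(p):
            fit -= X[:, j] * (alpha[j] * mu[j])  # drop gene j's current contribution
            mu[j] = s2[j] * (X[:, j] @ (y - fit)) / sigma2
            logit = (prior_logit[j] + 0.5 * np.log(s2[j] / tau2)
                     + mu[j] ** 2 / (2.0 * s2[j]))
            alpha[j] = 1.0 / (1.0 + np.exp(-logit))
            fit += X[:, j] * (alpha[j] * mu[j])  # add the updated contribution back
    return alpha, mu, s2


# Toy example: 30 genes, one 4-gene pathway, three GGI edges (all hypothetical).
rng = np.random.default_rng(0)
n_samples, n_genes = 200, 30
adjacency = np.zeros((n_genes, n_genes), dtype=int)
for a, b in [(0, 10), (1, 11), (10, 20)]:
    adjacency[a, b] = adjacency[b, a] = 1
prior_pi = structured_prior(n_genes, pathway=[0, 1, 2, 3], adjacency=adjacency)

X = rng.normal(size=(n_samples, n_genes))
beta = np.zeros(n_genes)
beta[[0, 1, 10]] = [2.0, -1.5, 1.0]   # gene 10 is outside the pathway, linked to gene 0
y = X @ beta + rng.normal(size=n_samples)

alpha, mu, _ = cavi_spike_slab(X, y, prior_pi)
print(np.round(alpha[[0, 1, 10, 20]], 3), np.round(mu[[0, 1, 10]], 2))
```

In this toy setup, gene 10 is not a pathway member, but its GGI edge to gene 0 raises its prior inclusion probability from `pi_out` to `pi_nb`, and the data can then push its posterior inclusion probability `alpha[10]` toward 1. This loosely mirrors, in a far simpler model, how a network-structured prior lets genes missing from an incomplete pathway database enter the fit.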

Original language: English (US)
Pages (from-to): 427-438
Number of pages: 12
Journal: Pacific Symposium on Biocomputing
Issue number: 2020
State: Published - 2020
Event: 25th Pacific Symposium on Biocomputing, PSB 2020 - Big Island, United States
Duration: Jan 3 2020 to Jan 7 2020


Keywords

  • Bayesian learning
  • Phenotype-pathway association
  • Semi-non-negative tri-matrix factorization
  • Structured slab-and-spike distribution
  • Variational inference

ASJC Scopus subject areas

  • Biomedical Engineering
  • Computational Theory and Mathematics

