Structure-constrained sparse canonical correlation analysis with an application to microbiome data analysis

Jun Chen, Frederic D. Bushman, James D. Lewis, Gary D. Wu, Hongzhe Li

Research output: Contribution to journal, Article, peer-review

88 Scopus citations


Motivated by the study of the association between nutrient intake and human gut microbiome composition, we developed a method for structure-constrained sparse canonical correlation analysis (ssCCA) in a high-dimensional setting. ssCCA takes into account the phylogenetic relationships among bacteria, which provide important prior knowledge about the evolutionary relationships among bacterial taxa. Our ssCCA formulation uses a phylogenetic structure-constrained penalty function to impose smoothness on the linear coefficients according to the phylogenetic relationships among the taxa. An efficient coordinate descent algorithm is developed for the optimization. A human gut microbiome data set is used to illustrate the method. Both simulations and real data applications show that ssCCA performs better than standard sparse CCA in identifying meaningful variables when the data are structured.
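
One way to picture a penalty that imposes smoothness according to phylogenetic relationships is a lasso term plus a graph-Laplacian quadratic form. The sketch below illustrates that general idea only and is not the paper's exact formulation: the exponential weighting of distances, the function names, the tuning parameters `lam1` and `lam2`, and the toy distance matrix are all assumptions made for illustration.

```python
import numpy as np

def laplacian_from_distances(dist, scale=1.0):
    """Build a graph Laplacian from pairwise distances between taxa.

    The exponential weighting is a hypothetical choice for this sketch.
    """
    dist = np.asarray(dist, dtype=float)
    weights = np.exp(-scale * dist)   # closer taxa get larger edge weights
    np.fill_diagonal(weights, 0.0)    # no self-loops
    degree = np.diag(weights.sum(axis=1))
    return degree - weights

def structured_penalty(beta, laplacian, lam1, lam2):
    """Sparsity term plus smoothness term: lam1*||beta||_1 + lam2*beta' L beta."""
    beta = np.asarray(beta, dtype=float)
    return lam1 * np.abs(beta).sum() + lam2 * float(beta @ laplacian @ beta)

# Toy example (made-up distances): taxa 0 and 1 are close, taxon 2 is distant.
dist = np.array([[0.0, 0.1, 2.0],
                 [0.1, 0.0, 2.0],
                 [2.0, 2.0, 0.0]])
L = laplacian_from_distances(dist)

# Same number of nonzero coefficients, placed on close vs. distant taxa.
smooth = structured_penalty([1.0, 1.0, 0.0], L, lam1=0.1, lam2=1.0)
rough = structured_penalty([1.0, 0.0, 1.0], L, lam1=0.1, lam2=1.0)
```

Under these assumptions, `smooth` is smaller than `rough`: both coefficient vectors have the same L1 norm, but the quadratic term charges less when closely related taxa share similar coefficients.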

Original language: English (US)
Pages (from-to): 244-258
Number of pages: 15
Issue number: 2
State: Published - Apr 2013


  • Dimension reduction
  • Graph
  • Phylogenetic tree
  • Regularization
  • Variable selection

ASJC Scopus subject areas

  • Statistics and Probability
  • Statistics, Probability and Uncertainty

