A logistic normal multinomial regression model for microbiome compositional data analysis

Fan Xia, Jun Chen, Wing Kam Fung, Hongzhe Li

Research output: Contribution to journal › Article › peer-review

43 Scopus citations


Summary: Changes in the human microbiome are associated with many human diseases. Next-generation sequencing technologies make it possible to quantify microbial composition without laboratory cultivation. One important problem in microbiome data analysis is to identify the environmental/biological covariates that are associated with different bacterial taxa. Taxa count data in microbiome studies are often over-dispersed and include many zeros. To account for this over-dispersion, we propose an additive logistic normal multinomial regression model to associate the covariates with bacterial composition. The model naturally accounts for sampling variability and zero observations and also allows for a flexible covariance structure among the bacterial taxa. To select the relevant covariates and estimate the corresponding regression coefficients, we propose a group ℓ1 penalized likelihood estimation method. We develop a Monte Carlo expectation-maximization algorithm to implement the penalized likelihood estimation. Our simulation results show that the proposed method outperforms the group ℓ1 penalized multinomial logistic regression and the Dirichlet multinomial regression models in variable selection. We demonstrate the methods using a data set that links the human gut microbiome to micro-nutrients in order to identify the nutrients that are associated with the human gut microbiome enterotype.
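
To make the model structure in the summary concrete, the following is a minimal sketch of how count data could be simulated from a logistic normal multinomial regression, together with a group ℓ1 penalty on the coefficient matrix. It is not the authors' implementation: it assumes the standard formulation in which a latent Gaussian vector (mean linear in the covariates, covariance Sigma) is mapped to taxon proportions by the additive logistic transform with the last taxon as reference, and it assumes that each penalty group is one covariate's coefficients across taxa. All names (`additive_logistic`, `simulate_lnm`, `group_l1_penalty`, `B`, `b0`, `Sigma`, `depth`) and all numeric values are hypothetical, chosen for illustration only.

```python
import numpy as np


def additive_logistic(z):
    """Map rows of z (n x (K-1)) to compositions on the K-part simplex.

    The last taxon is the reference category (assumption of this sketch).
    """
    # Append a zero column for the reference taxon, then apply a stable softmax.
    z_full = np.hstack([z, np.zeros((z.shape[0], 1))])
    z_full = z_full - z_full.max(axis=1, keepdims=True)
    expz = np.exp(z_full)
    return expz / expz.sum(axis=1, keepdims=True)


def simulate_lnm(X, B, b0, Sigma, depth, rng):
    """Draw taxa counts: latent normal -> additive logistic -> multinomial."""
    n = X.shape[0]
    mean = b0 + X @ B  # n x (K-1) latent means, linear in the covariates
    noise = rng.multivariate_normal(np.zeros(Sigma.shape[0]), Sigma, size=n)
    P = additive_logistic(mean + noise)  # sample-specific compositions
    counts = np.vstack([rng.multinomial(depth[i], P[i]) for i in range(n)])
    return counts, P


def group_l1_penalty(B, lam):
    """Group l1 penalty: lam times the sum of Euclidean norms of the rows of B.

    Each row holds one covariate's coefficients across the K-1 taxa, so a
    whole covariate is kept or dropped together (assumed grouping).
    """
    return lam * np.linalg.norm(B, axis=1).sum()


# Illustrative setup: 50 samples, 5 covariates, 4 taxa (all values made up).
rng = np.random.default_rng(0)
n, p, K = 50, 5, 4
X = rng.normal(size=(n, p))
B = np.zeros((p, K - 1))
B[0] = [1.0, -1.0, 0.5]  # only the first covariate affects composition
b0 = np.zeros(K - 1)
Sigma = 0.5 * np.eye(K - 1) + 0.5  # correlated latent taxa, positive definite
depth = rng.integers(1000, 5000, size=n)  # per-sample sequencing depth

counts, P = simulate_lnm(X, B, b0, Sigma, depth, rng)
penalty = group_l1_penalty(B, lam=2.0)
```

In this sketch the extra Gaussian noise on the latent scale is what produces over-dispersion relative to a plain multinomial, and the off-diagonal entries of `Sigma` allow correlation among taxa. Fitting the model (the Monte Carlo expectation-maximization step described in the summary) is not shown here.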

Original language: English (US)
Pages (from-to): 1053-1063
Number of pages: 11
Issue number: 4
State: Published - Dec 2013


Keywords

  • Hierarchical model
  • Markov chain Monte Carlo
  • Over-dispersion
  • Regularization
  • Variable selection

ASJC Scopus subject areas

  • Statistics and Probability
  • General Biochemistry, Genetics and Molecular Biology
  • General Immunology and Microbiology
  • General Agricultural and Biological Sciences
  • Applied Mathematics

