TY - JOUR
T1 - Predictive modeling of microbiome data using a phylogeny-regularized generalized linear mixed model
AU - Xiao, Jian
AU - Chen, Li
AU - Johnson, Stephen
AU - Yu, Yue
AU - Zhang, Xianyang
AU - Chen, Jun
N1 - Funding Information:
This work was supported by Mayo Clinic Gerstner Family Career Development Awards, Mayo Clinic Center for Individualized Medicine, U01 FD005875, Food and Drug Administration and the National Natural Science Foundation of China (no. 61773401).
Publisher Copyright:
© 2018 Xiao, Chen, Johnson, Yu, Zhang and Chen.
PY - 2018/6/27
Y1 - 2018/6/27
AB - Recent human microbiome studies have revealed an essential role of the human microbiome in health and disease, opening up the possibility of building microbiome-based predictive models for individualized medicine. One unique characteristic of microbiome data is the existence of a phylogenetic tree that relates all the microbial species. It has frequently been observed that a cluster or clusters of bacteria at varying phylogenetic depths are associated with some clinical or biological outcome due to shared biological function (clustered signal). Moreover, in many cases, we observe a community-level change, where a large number of functionally interdependent species are associated with the outcome (dense signal). We thus develop "glmmTree," a prediction method based on a generalized linear mixed model framework, for capturing clustered and dense microbiome signals. glmmTree uses the similarity between microbiomes, which is defined based on the microbiome composition and the phylogenetic tree, to predict the outcome. The effects of other predictive variables (e.g., age, sex) can be incorporated readily in the regression framework. Additional tuning parameters enable a data-adaptive approach to capture signals at different phylogenetic depths and abundance levels. Simulation studies and real data applications demonstrated that "glmmTree" outperformed existing methods in the dense and clustered signal scenarios.
KW - Generalized mixed model
KW - Kernel method
KW - Microbiome
KW - Phylogenetic tree
KW - Predictive model
UR - http://www.scopus.com/inward/record.url?scp=85049108230&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85049108230&partnerID=8YFLogxK
U2 - 10.3389/fmicb.2018.01391
DO - 10.3389/fmicb.2018.01391
M3 - Article
AN - SCOPUS:85049108230
SN - 1664-302X
VL - 9
JO - Frontiers in Microbiology
JF - Frontiers in Microbiology
IS - JUN
M1 - 1391
ER -