TY - JOUR
T1 - Considerations for automated machine learning in clinical metabolic profiling
T2 - 23rd Pacific Symposium on Biocomputing, PSB 2018
AU - Orlenko, Alena
AU - Moore, Jason H.
AU - Orzechowski, Patryk
AU - Olson, Randal S.
AU - Cairns, Junmei
AU - Caraballo, Pedro J.
AU - Weinshilboum, Richard M.
AU - Wang, Liewei
AU - Breitenstein, Matthew K.
N1 - Funding Information:
Acknowledgments. This research was made possible with generous support from the Mayo Clinic, Center for Individualized Medicine. The Institute for Biomedical Informatics, University of Pennsylvania and Mayo Clinic Cancer Genetic Epidemiology Training Program (R25 CA092049) further supported this research.
Publisher Copyright:
© 2017 The Authors.
PY - 2018
Y1 - 2018
N2 - With the maturation of metabolomics science and proliferation of biobanks, clinical metabolic profiling is an increasingly opportunistic frontier for advancing translational clinical research. Automated Machine Learning (AutoML) approaches provide an exciting opportunity to guide feature selection in agnostic metabolic profiling endeavors, where potentially thousands of independent data points must be evaluated. In previous research, AutoML using high-dimensional data of varying types has been demonstrably robust, outperforming traditional approaches. However, considerations for application in clinical metabolic profiling remain to be evaluated, particularly regarding the robustness of AutoML to identify and adjust for common clinical confounders. In this study, we present a focused case study regarding AutoML considerations for using the Tree-based Pipeline Optimization Tool (TPOT) in metabolic profiling of exposure to metformin in a biobank cohort. First, we propose a tandem rank-accuracy measure to guide agnostic feature selection and corresponding threshold determination in clinical metabolic profiling endeavors. Second, while AutoML, using default parameters, demonstrated potential to lack sensitivity to low-effect confounding clinical covariates, we demonstrated residual training and adjustment of metabolite features as an easily applicable approach to ensure AutoML adjustment for potential confounding characteristics. Finally, we present increased homocysteine with long-term exposure to metformin as a potentially novel, non-replicated metabolite association suggested by TPOT; an association not identified in parallel clinical metabolic profiling endeavors. While warranting independent replication, our tandem rank-accuracy measure suggests homocysteine to be the metabolite feature with the largest effect, and corresponding priority for further translational clinical research. Residual training and adjustment for a potential confounding effect by BMI only slightly modified the suggested association. Increased homocysteine is thought to be associated with vitamin B12 deficiency; evaluation for potential clinical relevance is suggested. While considerations for clinical metabolic profiling are recommended, including adjustment approaches for clinical confounders, AutoML presents an exciting tool to enhance clinical metabolic profiling and advance translational research endeavors.
KW - Automated machine learning
KW - Biobank
KW - Clinical metabolic profiling
KW - Confounding
KW - Homocysteine
KW - Metabolomics
KW - Metformin
KW - Pharmacometabolomics
KW - Precision medicine
UR - http://www.scopus.com/inward/record.url?scp=85048503369&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85048503369&partnerID=8YFLogxK
U2 - 10.1142/9789813235533_0042
DO - 10.1142/9789813235533_0042
M3 - Conference article
C2 - 29218905
AN - SCOPUS:85048503369
SN - 2335-6928
VL - 0
SP - 460
EP - 471
JO - Pacific Symposium on Biocomputing
JF - Pacific Symposium on Biocomputing
IS - 212669
Y2 - 3 January 2018 through 7 January 2018
ER -