TY - GEN
T1 - Bayesian Change Point Detection for Mixed Data with Missing Values
AU - Murph, Alexander C.
AU - Storlie, Curtis B.
N1 - Funding Information:
Research was supported by the National Institute of Biomedical Imaging and Bioengineering of the National Institutes of Health under award number 2R01EB019403-05.
Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
AB - When a predictive model is in production, it must be monitored over time to ensure that its performance does not suffer from drift or abrupt changes to data. Typically, this is done by comparing the algorithm's predictions with outcome data and ensuring that the algorithm maintains an acceptable level of accuracy over time. However, it is far preferable to learn in real time about major changes in the input data that could affect the model's performance, long before learning from outcome data that the performance of the model itself has dropped. Thus, there is a great need for robust, real-time monitoring of high-dimensional input data over time. Here we consider the problem of change point detection on high-dimensional longitudinal data with mixed variable types and missing values. We do this by fitting an array of Mixture Gaussian Graphical Models to groupings of homogeneous data in time, called regimes, which we model as the observed states of a Markov process with unknown transition probabilities. The primary goal of this model is to identify when there is a regime change, as this indicates a significant change in the input data distribution. To handle the messy nature of real-world data, which has mixed continuous/discrete variable types, missing data, etc., we take a Bayesian latent variable approach. This affords us the flexibility to handle missing values in a principled manner, while simultaneously providing a way to encode discrete and censored values into a continuous framework. We take this approach a step further by encoding the missingness structure, which allows our model to detect major changes in the patterns of missingness, in addition to changes in the structure of the data distributions themselves.
KW - Bayesian Latent Variables
KW - Change Points
KW - Gaussian Graphical Models
KW - Gaussian Mixture Models
UR - http://www.scopus.com/inward/record.url?scp=85139026725&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85139026725&partnerID=8YFLogxK
U2 - 10.1109/ICHI54592.2022.00087
DO - 10.1109/ICHI54592.2022.00087
M3 - Conference contribution
AN - SCOPUS:85139026725
T3 - Proceedings - 2022 IEEE 10th International Conference on Healthcare Informatics, ICHI 2022
SP - 499
EP - 501
BT - Proceedings - 2022 IEEE 10th International Conference on Healthcare Informatics, ICHI 2022
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 10th IEEE International Conference on Healthcare Informatics, ICHI 2022
Y2 - 11 June 2022 through 14 June 2022
ER -