Prediction and Inference With Missing Data in Patient Alert Systems

Curtis B. Storlie; Terry M. Therneau; Rickey E. Carter; Nicholas Chia; John R. Bergquist; Jeanne M. Huddleston; Santiago Romero-Brufau

doi:10.1080/01621459.2019.1604359

Research output: Contribution to journal › Article › peer-review

3 Scopus citations

Abstract

We describe the Bedside Patient Rescue (BPR) project, the goal of which is risk prediction of adverse events for non-intensive care unit patients using ∼100 variables (vitals, lab results, assessments, etc.). Most patients have several missing predictor values, which in the health sciences is the norm rather than the exception. A Bayesian approach is presented that addresses many of the shortcomings of standard approaches to missing predictors: (i) treatment of the uncertainty due to imputation is straightforward in the Bayesian paradigm, (ii) the predictor distribution is flexibly modeled as an infinite normal mixture with latent variables to explicitly account for discrete predictors (i.e., as in multivariate probit regression models), and (iii) certain missing-not-at-random situations can be handled effectively by allowing the indicator of missingness into the predictor distribution only to inform the distribution of the missing variables. The proposed approach also has the benefit of providing a distribution for the prediction, including the uncertainty inherent in the imputation. Therefore, we can ask questions such as: is it possible this individual is at high risk but we are missing too much information to know for sure? How much would we reduce the uncertainty in our risk prediction by obtaining a particular missing value? This approach is applied to the BPR problem, resulting in excellent predictive capability to identify deteriorating patients. Supplementary materials for this article, including a standardized description of the materials available for reproducing the work, are available as an online supplement.
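The idea of a distribution for the prediction can be illustrated with a small toy sketch. It is not the authors' model. It assumes a single bivariate normal predictor distribution in place of the infinite normal mixture, a logistic risk model, and fixed, made-up parameter values (`mean`, `cov`, `beta0`, `beta`) in place of posterior draws. The function names are hypothetical. One predictor is observed, the other is missing, and the missing one is drawn repeatedly from its conditional distribution so that the imputation uncertainty carries through to the predicted risk.

```python
import numpy as np


def conditional_normal_draws(x_obs, mean, cov, n_draws, rng):
    """Draw x2 given x1 = x_obs under a bivariate normal (toy assumption)."""
    cond_mean = mean[1] + cov[0, 1] / cov[0, 0] * (x_obs - mean[0])
    cond_var = cov[1, 1] - cov[0, 1] ** 2 / cov[0, 0]
    return rng.normal(cond_mean, np.sqrt(cond_var), size=n_draws)


def risk_draws(x_obs, x_missing_draws, beta0, beta):
    """Predicted risk for each imputed value (logistic link, an assumption)."""
    eta = beta0 + beta[0] * x_obs + beta[1] * x_missing_draws
    return 1.0 / (1.0 + np.exp(-eta))


rng = np.random.default_rng(0)

# Illustrative values only; none of these come from the paper.
mean = np.array([0.0, 0.0])
cov = np.array([[1.0, 0.6], [0.6, 1.0]])
beta0, beta = -2.0, np.array([0.8, 1.5])

x1 = 1.0  # observed predictor
x2_draws = conditional_normal_draws(x1, mean, cov, 5000, rng)  # imputed draws
risks = risk_draws(x1, x2_draws, beta0, beta)

# Summaries of the risk distribution induced by the missing value.
lower, upper = np.quantile(risks, [0.025, 0.975])
p_high = np.mean(risks > 0.5)  # share of draws above a hypothetical alert threshold

print(f"95% interval for risk: ({lower:.3f}, {upper:.3f})")
print(f"Fraction of draws with risk > 0.5: {p_high:.3f}")
```

A wide interval with a non-negligible `p_high` corresponds to the abstract's first question: the patient may be at high risk, but too much is missing to say so with confidence. Once the missing value is observed, the draws collapse to a single number, which is one way to think about how much uncertainty a given measurement would remove.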

Original language	English (US)
Pages (from-to)	32-46
Number of pages	15
Journal	Journal of the American Statistical Association
Volume	115
Issue number	529
DOIs	https://doi.org/10.1080/01621459.2019.1604359
State	Published - Jan 2 2020

Keywords

Continuous and categorical
Dirichlet process
Hierarchical Bayesian model
Latent variable
Missing data
Multiple imputation

ASJC Scopus subject areas

Statistics and Probability
Statistics, Probability and Uncertainty

Access to Document

10.1080/01621459.2019.1604359

Cite this

@article{1c26293d9788431a8c34269b1b8549fd,

title = "Prediction and Inference With Missing Data in Patient Alert Systems",

keywords = "Continuous and categorical, Dirichlet process, Hierarchical Bayesian model, Latent variable, Missing data, Multiple imputation",

author = "Storlie, {Curtis B.} and Therneau, {Terry M.} and Carter, {Rickey E.} and Nicholas Chia and Bergquist, {John R.} and Huddleston, {Jeanne M.} and Santiago Romero-Brufau",

note = "Publisher Copyright: {\textcopyright} 2019 American Statistical Association.",

year = "2020",

month = jan,

day = "2",

doi = "10.1080/01621459.2019.1604359",

language = "English (US)",

volume = "115",

pages = "32--46",

journal = "Journal of the American Statistical Association",

issn = "0162-1459",

publisher = "Taylor and Francis Ltd.",

number = "529",

}

TY - JOUR

T1 - Prediction and Inference With Missing Data in Patient Alert Systems

AU - Storlie, Curtis B.

AU - Therneau, Terry M.

AU - Carter, Rickey E.

AU - Chia, Nicholas

AU - Bergquist, John R.

AU - Huddleston, Jeanne M.

AU - Romero-Brufau, Santiago

PY - 2020/1/2

Y1 - 2020/1/2

KW - Continuous and categorical

KW - Dirichlet process

KW - Hierarchical Bayesian model

KW - Latent variable

KW - Missing data

KW - Multiple imputation

UR - http://www.scopus.com/inward/record.url?scp=85067881880&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85067881880&partnerID=8YFLogxK

U2 - 10.1080/01621459.2019.1604359

DO - 10.1080/01621459.2019.1604359

M3 - Article

AN - SCOPUS:85067881880

SN - 0162-1459

VL - 115

SP - 32

EP - 46

JO - Journal of the American Statistical Association

JF - Journal of the American Statistical Association

IS - 529

ER -
