TY - JOUR
T1 - The effect of retrospective sampling on estimates of prediction error for multifactor dimensionality reduction
AU - Winham, Stacey J.
AU - Motsinger-Reif, Alison A.
PY - 2011/1
Y1 - 2011/1
N2 - The standard in genetic association studies of complex diseases is replication and validation of positive results, with an emphasis on assessing the predictive value of associations. In response to this need, a number of analytical approaches have been developed to identify predictive models that account for complex genetic etiologies. Multifactor Dimensionality Reduction (MDR) is a commonly used, highly successful method designed to evaluate potential gene-gene interactions. MDR relies on classification error in a cross-validation framework to rank and evaluate potentially predictive models. Previous work has demonstrated the high power of MDR, but has not considered the accuracy and variance of the MDR prediction error estimate. Currently, we evaluate the bias and variance of the MDR error estimate as both a retrospective and prospective estimator and show that MDR can both underestimate and overestimate error. We argue that a prospective error estimate is necessary if MDR models are used for prediction, and propose a bootstrap resampling estimate, integrating population prevalence, to accurately estimate prospective error. We demonstrate that this bootstrap estimate is preferable for prediction to the error estimate currently produced by MDR. While demonstrated with MDR, the proposed estimation is applicable to all data-mining methods that use similar estimates.
KW - Bias
KW - Epistasis
KW - Gene-gene interaction
KW - Prediction error
KW - Retrospective and prospective sampling
KW - Variance
UR - http://www.scopus.com/inward/record.url?scp=78650150426&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=78650150426&partnerID=8YFLogxK
U2 - 10.1111/j.1469-1809.2010.00587.x
DO - 10.1111/j.1469-1809.2010.00587.x
M3 - Article
C2 - 20560921
AN - SCOPUS:78650150426
SN - 0003-4800
VL - 75
SP - 46
EP - 61
JO - Annals of Human Genetics
JF - Annals of Human Genetics
IS - 1
ER -