The effect of retrospective sampling on estimates of prediction error for multifactor dimensionality reduction

Stacey J. Winham, Alison A. Motsinger-Reif

Research output: Contribution to journal, Article, peer-review

3 Scopus citations

Abstract

The standard in genetic association studies of complex diseases is replication and validation of positive results, with an emphasis on assessing the predictive value of associations. In response to this need, a number of analytical approaches have been developed to identify predictive models that account for complex genetic etiologies. Multifactor Dimensionality Reduction (MDR) is a commonly used, highly successful method designed to evaluate potential gene-gene interactions. MDR relies on classification error in a cross-validation framework to rank and evaluate potentially predictive models. Previous work has demonstrated the high power of MDR but has not considered the accuracy and variance of the MDR prediction error estimate. Here, we evaluate the bias and variance of the MDR error estimate as both a retrospective and a prospective estimator and show that MDR can both underestimate and overestimate error. We argue that a prospective error estimate is necessary if MDR models are used for prediction, and we propose a bootstrap resampling estimate that integrates population prevalence to accurately estimate prospective error. We demonstrate that this bootstrap estimate is preferable for prediction to the error estimate currently produced by MDR. Although demonstrated with MDR, the proposed estimator is applicable to all data-mining methods that use similar estimates.
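The distinction between retrospective and prospective error can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes binary labels (1 = case, 0 = control), a fixed set of predictions rather than a refitted MDR model, and an externally supplied prevalence. All function names are hypothetical. The prospective error weights the false-negative and false-positive rates by the population prevalence K, i.e. K x FNR + (1 - K) x FPR, and a class-stratified bootstrap gives a rough mean and variance for it.

```python
import random


def retrospective_error(y_true, y_pred):
    """Misclassification rate in the sample as collected (e.g. 1:1 case-control)."""
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


def prospective_error(y_true, y_pred, prevalence):
    """Weight class-specific error rates by an assumed population prevalence."""
    case_preds = [p for t, p in zip(y_true, y_pred) if t == 1]
    control_preds = [p for t, p in zip(y_true, y_pred) if t == 0]
    fnr = sum(1 for p in case_preds if p == 0) / len(case_preds)
    fpr = sum(1 for p in control_preds if p == 1) / len(control_preds)
    return prevalence * fnr + (1 - prevalence) * fpr


def bootstrap_prospective_error(y_true, y_pred, prevalence, n_boot=1000, seed=0):
    """Stratified bootstrap of the prevalence-weighted error (illustrative only).

    Cases and controls are resampled separately so both classes are always
    present; predictions are held fixed, which is a simplifying assumption.
    """
    rng = random.Random(seed)
    pairs = list(zip(y_true, y_pred))
    cases = [pair for pair in pairs if pair[0] == 1]
    controls = [pair for pair in pairs if pair[0] == 0]
    estimates = []
    for _ in range(n_boot):
        sample = [rng.choice(cases) for _ in cases]
        sample += [rng.choice(controls) for _ in controls]
        t, p = zip(*sample)
        estimates.append(prospective_error(t, p, prevalence))
    mean = sum(estimates) / n_boot
    variance = sum((e - mean) ** 2 for e in estimates) / (n_boot - 1)
    return mean, variance
```

As a toy example, take 10 cases with 2 misclassified and 10 controls with 5 misclassified. The retrospective error is 7/20 = 0.35, whereas the prospective error at an assumed prevalence of 0.1 is 0.1 x 0.2 + 0.9 x 0.5, about 0.47, so the sample-based figure understates the error expected in the population.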

Original language: English (US)
Pages (from-to): 46-61
Number of pages: 16
Journal: Annals of Human Genetics
Volume: 75
Issue number: 1
State: Published - Jan 2011

Keywords

  • Bias
  • Epistasis
  • Gene-gene interaction
  • Prediction error
  • Retrospective and prospective sampling
  • Variance

ASJC Scopus subject areas

  • Genetics
  • Genetics (clinical)
