Super learner

Mark J. Van Der Laan, Eric C. Polley, Alan E. Hubbard

Research output: Contribution to journal › Article › peer-review

574 Scopus citations

Abstract

When trying to learn a model for the prediction of an outcome given a set of covariates, a statistician has many estimation procedures in their toolbox. A few examples of these candidate learners are: least squares, least angle regression, random forests, and spline regression. Previous articles (van der Laan and Dudoit (2003); van der Laan et al. (2006); Sinisi et al. (2007)) theoretically validated the use of cross-validation to select an optimal learner among many candidate learners. Motivated by this use of cross-validation, we propose a new prediction method that creates a weighted combination of many candidate learners to build the super learner. This article proposes a fast algorithm for constructing a super learner for prediction that uses V-fold cross-validation to select weights for combining an initial set of candidate learners. In addition, this paper contains a practical demonstration of the adaptivity of this so-called super learner to various true data-generating distributions. This approach to constructing a super learner generalizes to any parameter that can be defined as a minimizer of a loss function.
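The following is a minimal sketch of the super learner idea summarized above: V-fold cross-validation produces out-of-fold predictions for each candidate learner, non-negative weights that minimize the cross-validated squared error combine those predictions, and the candidates refit on the full data are then combined with the selected weights. The particular candidate learners, the non-negative-least-squares weight step, and the simulated data are illustrative assumptions for this sketch, not the authors' exact algorithm or software.

# Sketch of a super learner for a continuous outcome (assumed setup, not the
# authors' implementation).
import numpy as np
from scipy.optimize import nnls
from sklearn.model_selection import KFold
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n, p = 500, 5
X = rng.normal(size=(n, p))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.3, size=n)

# Illustrative candidate learners (any regression procedures could be used).
candidates = [LinearRegression(),
              RandomForestRegressor(n_estimators=100, random_state=0)]

# Step 1: V-fold cross-validated (out-of-fold) predictions for each candidate.
V = 10
Z = np.zeros((n, len(candidates)))  # one column per candidate learner
for train_idx, test_idx in KFold(n_splits=V, shuffle=True, random_state=0).split(X):
    for j, learner in enumerate(candidates):
        learner.fit(X[train_idx], y[train_idx])
        Z[test_idx, j] = learner.predict(X[test_idx])

# Step 2: choose combination weights by minimizing cross-validated squared
# error; non-negative least squares, rescaled to sum to one, is one common
# choice of meta-learning step (an assumption here).
w, _ = nnls(Z, y)
w = w / w.sum()

# Step 3: refit each candidate on the full data; the super learner prediction
# is the weighted combination of the refit candidates.
full_fits = [learner.fit(X, y) for learner in candidates]

def super_learner_predict(X_new):
    return sum(wj * fit.predict(X_new) for wj, fit in zip(w, full_fits))

print("weights:", np.round(w, 3))
print("cross-validated risk of the combination:", np.mean((Z @ w - y) ** 2))

The same template extends beyond squared-error prediction: replacing the squared-error objective in the weight-selection step with another loss function corresponds to the paper's point that the construction applies to any parameter defined as a minimizer of a loss function.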

Original language: English (US)
Article number: 25
Journal: Statistical Applications in Genetics and Molecular Biology
Volume: 6
Issue number: 1
DOIs
State: Published - Sep 16 2007

Keywords

  • Cross-validation
  • Loss-based estimation
  • Machine learning
  • Prediction

ASJC Scopus subject areas

  • Statistics and Probability
  • Molecular Biology
  • Genetics
  • Computational Mathematics

