Predicting patient survival from microarray data by accelerated failure time modeling using partial least squares and LASSO

Susmita Datta, Jennifer Le-Rademacher, Somnath Datta

Research output: Contribution to journal, Review article, peer-review

54 Scopus citations


We consider the problem of predicting survival times of cancer patients from the gene expression profiles of their tumor samples via linear regression modeling of log-transformed failure times. The partial least squares (PLS) and least absolute shrinkage and selection operator (LASSO) methodologies are used for this purpose, after the data are first modified to account for censoring. Three approaches to handling right-censored data (reweighting, mean imputation, and multiple imputation) are considered. Their performances are examined in a detailed simulation study and compared with those of PLS and LASSO applied to the full data, as if there had been no censoring. A major objective of this article is to investigate the performances of PLS and LASSO in the context of microarray data, where the number of covariates is very large and there are extremely few samples. We demonstrate that LASSO outperforms PLS in terms of prediction error when the list of covariates includes a moderate to large percentage of useless or noise variables; otherwise, PLS may outperform LASSO. For a moderate sample size (100 with 10,000 covariates), LASSO performed better than a no-covariate model (or noise-based prediction). The mean imputation method appears to best track the performance of the full data PLS or LASSO. The mean imputation scheme is then applied to an existing lung cancer data set. This reanalysis using the mean imputed PLS and LASSO identifies a number of genes that were known from previous studies to be related to cancer or tumor activities.
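To make the mean imputation idea concrete, the following is a minimal sketch of one common Kaplan-Meier-based form of it, not necessarily the exact scheme used in the article. Each censored log failure time is replaced by an estimate of the conditional mean of log T given that T exceeds the censoring time. The function names and the toy data are hypothetical, and treating the largest observation as an event (so that the Kaplan-Meier jumps sum to 1) is an assumption made here for illustration.

```python
import math

def kaplan_meier_jumps(times, events):
    """Return [(time, jump)] pairs: the mass the Kaplan-Meier estimate
    places at each event time.

    Assumption (illustrative, not from the abstract): the largest
    observation is treated as an event so that the jumps sum to 1.
    """
    n = len(times)
    # Sort by time; at tied times, process events before censored cases.
    order = sorted(range(n), key=lambda i: (times[i], -events[i]))
    surv = 1.0
    jumps = []
    for rank, i in enumerate(order):
        at_risk = n - rank
        if events[i] == 1 or rank == n - 1:
            new_surv = surv * (1.0 - 1.0 / at_risk)
            jumps.append((times[i], surv - new_surv))
            surv = new_surv
    return jumps

def mean_impute_log_times(times, events):
    """Return log failure times, with each censored value replaced by the
    estimated mean of log T over Kaplan-Meier mass beyond the censoring time."""
    jumps = kaplan_meier_jumps(times, events)
    imputed = []
    for t, d in zip(times, events):
        if d == 1:
            imputed.append(math.log(t))  # observed failure: keep as is
            continue
        tail = [(u, w) for u, w in jumps if u > t]
        mass = sum(w for _, w in tail)
        if mass <= 0.0:
            imputed.append(math.log(t))  # no estimated mass beyond t
        else:
            imputed.append(sum(w * math.log(u) for u, w in tail) / mass)
    return imputed

# Hypothetical toy data: event indicator 1 = failure observed, 0 = right censored.
toy_times = [2.0, 3.0, 4.0, 5.0, 8.0]
toy_events = [1, 0, 1, 1, 0]
toy_response = mean_impute_log_times(toy_times, toy_events)
```

Under these assumptions, the imputed responses would take the place of the log failure times in a subsequent PLS or LASSO fit on the gene expression covariates. An imputed value can never fall below the log of its censoring time, because it averages only over estimated failure times beyond that point.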

Original language: English (US)
Pages (from-to): 259-271
Number of pages: 13
Issue number: 1
State: Published - Mar 2007


  • Cancer
  • Gene expression
  • Partial least squares
  • Right censoring
  • Survival

ASJC Scopus subject areas

  • Statistics and Probability
  • General Biochemistry, Genetics and Molecular Biology
  • General Immunology and Microbiology
  • General Agricultural and Biological Sciences
  • Applied Mathematics

