Simultaneous classification and relevant feature identification in high-dimensional spaces: Application to molecular profiling data

C. Bhattacharyya, L. R. Grate, A. Rizki, D. Radisky, F. J. Molina, M. I. Jordan, M. J. Bissell, I. S. Mian

Research output: Contribution to journal › Article › peer-review

50 Scopus citations


Molecular profiling technologies monitor many thousands of transcripts, proteins, metabolites or other species concurrently in a biological sample of interest. Given such high-dimensional data for different types of samples, classification methods aim to assign specimens to known categories. Relevant feature identification methods seek to define a subset of molecules that differentiate the samples. This work describes LIKNON, a specific implementation of a statistical approach for creating a classifier and identifying a small number of relevant features simultaneously. Given two-class data, LIKNON estimates a sparse linear classifier by exploiting the simple and well-known property that minimising an L1 norm (via linear programming) yields a sparse hyperplane. It performs well when used for retrospective analysis of three cancer biology profiling data sets: (i) small, round, blue cell tumour transcript profiles from tumour biopsies and cell lines; (ii) sporadic breast carcinoma transcript profiles from patients with distant metastases <5 years and those with no distant metastases ≥5 years; and (iii) serum sample protein profiles from unaffected and ovarian cancer patients. Computationally, LIKNON is less demanding than the prevailing filter-wrapper strategy, which generates many feature subsets and equates relevant features with the subset yielding a classifier with the lowest generalisation error. Biologically, the results suggest a role for the cellular microenvironment in influencing disease outcome and its importance in developing clinical decision support systems.
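The sparsity property mentioned in the abstract can be illustrated with a small sketch. The code below is a generic L1-norm soft-margin linear program, not necessarily the exact formulation used by LIKNON: the function name `l1_linear_classifier`, the trade-off parameter `C`, and the synthetic data are all assumptions made for illustration. It minimises the L1 norm of the weight vector plus `C` times the total slack, subject to each sample lying on the correct side of the hyperplane up to its slack.

```python
import numpy as np
from scipy.optimize import linprog


def l1_linear_classifier(X, y, C=1.0):
    """Fit a sparse hyperplane (w, b) by linear programming.

    Illustrative sketch only (hypothetical helper, not the paper's code).
    Solves: min ||w||_1 + C * sum(xi)
            s.t. y_i * (w . x_i + b) >= 1 - xi_i,  xi_i >= 0
    X has shape (n_samples, n_features); y holds labels +1 / -1.
    """
    n, d = X.shape
    # Split free variables into non-negative parts so the problem is linear:
    # w = u - v, b = b_pos - b_neg. Variable order: u, v, b_pos, b_neg, xi.
    cost = np.concatenate([np.ones(2 * d), [0.0, 0.0], C * np.ones(n)])
    yx = y[:, None] * X
    # Rewrite y_i (w . x_i + b) + xi_i >= 1 as A_ub @ z <= -1.
    A_ub = np.hstack([-yx, yx, -y[:, None], y[:, None], -np.eye(n)])
    b_ub = -np.ones(n)
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    z = res.x
    w = z[:d] - z[d:2 * d]
    b = z[2 * d] - z[2 * d + 1]
    return w, b


# Synthetic two-class data (assumed, for illustration): 40 samples,
# 200 features, of which only feature 0 carries class information.
rng = np.random.default_rng(0)
n, d = 40, 200
y = np.where(np.arange(n) % 2 == 0, 1.0, -1.0)
X = rng.standard_normal((n, d))
X[:, 0] += 3.0 * y

w, b = l1_linear_classifier(X, y)
selected = np.flatnonzero(np.abs(w) > 1e-6)  # features with non-zero weight
print(len(selected), "of", d, "features have non-zero weight")
```

Features whose weight is driven to zero are dropped from the classifier, so the non-zero entries of `w` serve as the candidate relevant features and the fit and the feature selection happen in a single optimisation. Larger values of `C` penalise misclassification more heavily and typically retain more features.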

Original language: English (US)
Pages (from-to): 729-743
Number of pages: 15
Journal: Signal Processing
Issue number: 4
State: Published - Apr 2003


  • Cancer biology
  • Classification
  • Feature selection
  • L1 norm minimisation
  • Minimax probability machine
  • Molecular profiling data

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Software
  • Signal Processing
  • Computer Vision and Pattern Recognition
  • Electrical and Electronic Engineering

