Accounting for EGFR Mutations in Epidemiologic Analyses of Non–Small Cell Lung Cancers: Examples Based on the International Lung Cancer Consortium Data

Sabine Schmid, Mei Jiang, M. Catherine Brown, Aline Fares, Miguel Garcia, Joelle Soriano, Mei Dong, Sera Thomas, Takashi Kohno, Leticia Ferro Leal, Nancy Diao, Juntao Xie, Zhichao Wang, David Zaridze, Ivana Holcatova, Jolanta Lissowska, Beata Świątkowska, Dana Mates, Milan Savic, Angela S. Wenzlaff, Curtis C. Harris, Neil E. Caporaso, Hongxia Ma, Guillermo Fernandez-Tardon, Matthew J. Barnett, Gary Goodman, Michael P.A. Davies, Mónica Pérez-Ríos, Fiona Taylor, Eric J. Duell, Ben Schoettker, Hermann Brenner, Angeline Andrew, Angela Cox, Alberto Ruano-Ravina, John K. Field, Loic Le Marchand, Ying Wang, Chu Chen, Adonina Tardon, Sanjay Shete, Matthew B. Schabath, Hongbing Shen, Maria Teresa Landi, Brid M. Ryan, Ann G. Schwartz, Lihong Qi, Lori C. Sakoda, Paul Brennan, Ping Yang, Jie Zhang, David C. Christiani, Rui Manuel Reis, Kouya Shiraishi, Rayjean J. Hung, Wei Xu, Geoffrey Liu

Research output: Contribution to journal › Article › peer-review


Background: Somatic EGFR mutations define a subset of non–small cell lung cancers (NSCLC), and this distinction has clinical impact on NSCLC risk and outcome. However, EGFR mutation status is often missing in epidemiologic datasets. We developed and tested pragmatic approaches to account for EGFR mutation status based on variables commonly included in epidemiologic datasets, and evaluated the clinical utility of these approaches.

Methods: Through analysis of the International Lung Cancer Consortium (ILCCO) epidemiologic datasets, we developed a regression model for EGFR status. We then applied two approaches to ILCCO survival analyses that did and did not account for EGFR status: a clinical restriction approach using the optimal cut-point, and an epidemiologic multiple-imputation approach.

Results: Of 35,356 ILCCO patients with NSCLC, EGFR mutation status was available in 4,231 patients. A model regressing known EGFR mutation status on clinical and demographic variables achieved a concordance index of 0.75 (95% CI, 0.74–0.77) in the training dataset and 0.77 (95% CI, 0.74–0.79) in the testing dataset. At the optimal cut-point, a probability score of 0.335, sensitivity was 69% and specificity was 72.5% for determining EGFR-wildtype status. In both restriction-based and imputation-based regression analyses of the role of BMI in overall survival of patients with NSCLC, results were similar between the overall and EGFR-mutation-negative cohort analyses of patients of all ancestries. However, our approach identified some differences: Asian patients with EGFR-mutated tumors did not show the survival benefit from obesity that was observed in EGFR-wildtype Asian patients.

Conclusions: We introduce a pragmatic method to evaluate the potential impact of EGFR status on epidemiological analyses of NSCLC.

Impact: The proposed method is generalizable to the common situation in which EGFR status data are missing.
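The cut-point step in the abstract can be illustrated with a minimal sketch. The abstract does not say how the optimal cut-point was selected, so the use of Youden's J statistic here is an assumption, as are the function names, the synthetic scores, and the choice to treat EGFR-mutated as the positive class. None of this reproduces the ILCCO model or data.

```python
import random

def sens_spec(scores, labels, cut):
    """Sensitivity and specificity when score >= cut is called EGFR-mutated (label 1).

    Assumption: mutated is the positive class; the abstract's own convention may differ.
    """
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cut)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < cut)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cut)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cut)
    return tp / (tp + fn), tn / (tn + fp)

def youden_cut_point(scores, labels):
    """Return the cut-point maximizing J = sensitivity + specificity - 1.

    Youden's J is one common criterion; it is assumed here, not taken from the source.
    """
    best_cut, best_j = None, -1.0
    for cut in sorted(set(scores)):
        sens, spec = sens_spec(scores, labels, cut)
        j = sens + spec - 1
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j

# Synthetic predicted probabilities (hypothetical, for illustration only).
rng = random.Random(0)
labels = [1 if rng.random() < 0.3 else 0 for _ in range(500)]
scores = [min(1.0, max(0.0, rng.gauss(0.5 if y else 0.3, 0.15))) for y in labels]

cut, j = youden_cut_point(scores, labels)
sens, spec = sens_spec(scores, labels, cut)

# Restriction step: keep only patients predicted to be EGFR-wildtype (score below the cut).
likely_wildtype = [i for i, s in enumerate(scores) if s < cut]
print(f"cut={cut:.3f} sensitivity={sens:.2f} specificity={spec:.2f} kept={len(likely_wildtype)}")
```

In a restriction-style analysis, the survival model would then be refit on the `likely_wildtype` subset and compared with the fit on the full cohort. A multiple-imputation analysis would instead draw EGFR status repeatedly from the predicted probabilities and pool the estimates across draws.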

Original language: English (US)
Pages (from-to): 679-687
Number of pages: 9
Journal: Cancer Epidemiology Biomarkers and Prevention
Issue number: 3
State: Published - Mar 2022

ASJC Scopus subject areas

  • General Medicine

