The effect of sample size on polygenic hazard models for prostate cancer

Australian Prostate Cancer BioResource (APCB); The PRACTICAL Consortium

doi:10.1038/s41431-020-0664-2

Research output: Contribution to journal › Article › peer-review

4 Scopus citations

Abstract

We determined the effect of sample size on the performance of polygenic hazard score (PHS) models in prostate cancer. Age and genotypes were obtained for 40,861 men from the PRACTICAL consortium. The dataset included 201,590 SNPs per subject and was split into training and testing sets. Established-SNP models considered 65 SNPs that had previously been associated with prostate cancer; Discovery-SNP models used stepwise selection to identify new SNPs. The performance of each PHS model was calculated for randomly drawn training sets of varying size, and the performance of a representative Established-SNP model was estimated for randomly drawn testing sets of varying size. The mean HR_98/50 (hazard ratio of the top 2% to the average in the testing set) of the Established-SNP model increased from 1.73 [95% CI: 1.69–1.77] to 2.41 [2.40–2.43] when the number of training samples was increased from 1,000 to 30,000. The corresponding HR_98/50 of the Discovery-SNP model increased from 1.05 [0.93–1.18] to 2.19 [2.16–2.23]. The HR_98/50 values of a representative Established-SNP model with testing-set sizes of 600 and 6,000 observations were 1.78 [1.70–1.85] and 1.73 [1.71–1.76], respectively. We estimate that a study population of 20,000 men is required to develop Discovery-SNP PHS models, whereas 10,000 men should be sufficient for Established-SNP models.
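The abstract defines HR_98/50 only informally, as the hazard ratio of the top 2% to the average in the testing set. The following is a minimal Python sketch of one way such a ratio could be computed. It rests on two assumptions that the abstract does not state. First, each man's PHS is taken to be the linear predictor (log relative hazard) of a Cox proportional hazards model. Second, "average" is taken to mean the 30th to 70th percentile band of PHS. The function name `hr_98_50` is hypothetical, and this is not the authors' code.

```python
import math
import statistics


def hr_98_50(phs: list[float]) -> float:
    """Hypothetical helper: approximate HR_98/50 from per-subject PHS values.

    Assumption: each PHS is the log relative hazard from a Cox model, so the
    hazard ratio between two groups is exp(mean PHS of A - mean PHS of B).
    """
    n = len(phs)
    if n < 50:
        raise ValueError("need at least 50 subjects to form percentile groups")
    ranked = sorted(phs)
    # Top 2%: subjects at or above the 98th percentile of PHS
    # (integer arithmetic avoids floating-point rounding of the cut points).
    top = ranked[n * 98 // 100:]
    # "Average" group: assumed here to be the 30th-70th percentile band.
    middle = ranked[n * 30 // 100:n * 70 // 100]
    return math.exp(statistics.fmean(top) - statistics.fmean(middle))


# Toy scores only, not study data: evenly spaced PHS values 0.00 .. 9.99.
toy_phs = [i / 100 for i in range(1000)]
print(round(hr_98_50(toy_phs), 2))
```

With real data, `phs` would hold the score of each man in the testing set. Repeating the calculation over many randomly drawn training or testing sets of a given size gives the kind of mean HR_98/50 reported above. How the authors formed their confidence intervals is not described here, so this sketch does not attempt one.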

Original language	English (US)
Pages (from-to)	1467-1475
Number of pages	9
Journal	European Journal of Human Genetics
Volume	28
Issue number	10
DOIs	https://doi.org/10.1038/s41431-020-0664-2
State	Published - Oct 1 2020

ASJC Scopus subject areas

Genetics
Genetics (clinical)

Cite this

@article{9158fd743393499686dc055334ed9d22,

title = "The effect of sample size on polygenic hazard models for prostate cancer",

abstract = "We determined the effect of sample size on performance of polygenic hazard score (PHS) models in prostate cancer. Age and genotypes were obtained for 40,861 men from the PRACTICAL consortium. The dataset included 201,590 SNPs per subject, and was split into training and testing sets. Established-SNP models considered 65 SNPs that had been previously associated with prostate cancer. Discovery-SNP models used stepwise selection to identify new SNPs. The performance of each PHS model was calculated for random sizes of the training set. The performance of a representative Established-SNP model was estimated for random sizes of the testing set. Mean HR98/50 (hazard ratio of top 2% to average in test set) of the Established-SNP model increased from 1.73 [95% CI: 1.69–1.77] to 2.41 [2.40–2.43] when the number of training samples was increased from 1 thousand to 30 thousand. Corresponding HR98/50 of the Discovery-SNP model increased from 1.05 [0.93–1.18] to 2.19 [2.16–2.23]. HR98/50 of a representative Established-SNP model using testing set sample sizes of 0.6 thousand and 6 thousand observations were 1.78 [1.70–1.85] and 1.73 [1.71–1.76], respectively. We estimate that a study population of 20 thousand men is required to develop Discovery-SNP PHS models while 10 thousand men should be sufficient for Established-SNP models.",

author = "{Australian Prostate Cancer BioResource (APCB)} and {The PRACTICAL Consortium} and Karunamuni, {Roshan A.} and Huynh-Le, {Minh Phuong} and Fan, {Chun C.} and Eeles, {Rosalind A.} and Easton, {Douglas F.} and Kote-Jarai, {Zsofia S.} and {Amin Al Olama}, Ali and {Benlloch Garcia}, Sara and Kenneth Muir and Henrik Gronberg and Fredrik Wiklund and Markus Aly and Johanna Schleutker and Csilla Sipeky and Tammela, {Teuvo L.J.} and Nordestgaard, {B{\o}rge G.} and Key, {Tim J.} and Travis, {Ruth C.} and Neal, {David E.} and Donovan, {Jenny L.} and Hamdy, {Freddie C.} and Paul Pharoah and Nora Pashayan and Khaw, {Kay Tee} and Thibodeau, {Stephen N.} and McDonnell, {Shannon K.} and Schaid, {Daniel J.} and Christiane Maier and Walther Vogel and Manuel Luedeke and Kathleen Herkommer and Kibel, {Adam S.} and Cezary Cybulski and Dominika Wokolorczyk and Wojciech Kluzniak and Lisa Cannon-Albright and Hermann Brenner and Ben Sch{\"o}ttker and Bernd Holleczek and Park, {Jong Y.} and Sellers, {Thomas A.} and Lin, {Hui Yi} and Chavdar Slavov and Radka Kaneva and Vanio Mitev and Jyotsna Batra and Clements, {Judith A.} and Amanda Spurdle and Teixeira, {Manuel R.} and Paula Paulo",

note = "Publisher Copyright: {\textcopyright} 2020, The Author(s), under exclusive licence to European Society of Human Genetics.",

year = "2020",

month = oct,

day = "1",

doi = "10.1038/s41431-020-0664-2",

language = "English (US)",

volume = "28",

pages = "1467--1475",

journal = "European Journal of Human Genetics",

issn = "1018-4813",

publisher = "Nature Publishing Group",

number = "10",

}

TY - JOUR

T1 - The effect of sample size on polygenic hazard models for prostate cancer

AU - Australian Prostate Cancer BioResource (APCB)

AU - The PRACTICAL Consortium

AU - Karunamuni, Roshan A.

AU - Huynh-Le, Minh Phuong

AU - Fan, Chun C.

AU - Eeles, Rosalind A.

AU - Easton, Douglas F.

AU - Kote-Jarai, Zsofia S.

AU - Amin Al Olama, Ali

AU - Benlloch Garcia, Sara

AU - Muir, Kenneth

AU - Gronberg, Henrik

AU - Wiklund, Fredrik

AU - Aly, Markus

AU - Schleutker, Johanna

AU - Sipeky, Csilla

AU - Tammela, Teuvo L.J.

AU - Nordestgaard, Børge G.

AU - Key, Tim J.

AU - Travis, Ruth C.

AU - Neal, David E.

AU - Donovan, Jenny L.

AU - Hamdy, Freddie C.

AU - Pharoah, Paul

AU - Pashayan, Nora

AU - Khaw, Kay Tee

AU - Thibodeau, Stephen N.

AU - McDonnell, Shannon K.

AU - Schaid, Daniel J.

AU - Maier, Christiane

AU - Vogel, Walther

AU - Luedeke, Manuel

AU - Herkommer, Kathleen

AU - Kibel, Adam S.

AU - Cybulski, Cezary

AU - Wokolorczyk, Dominika

AU - Kluzniak, Wojciech

AU - Cannon-Albright, Lisa

AU - Brenner, Hermann

AU - Schöttker, Ben

AU - Holleczek, Bernd

AU - Park, Jong Y.

AU - Sellers, Thomas A.

AU - Lin, Hui Yi

AU - Slavov, Chavdar

AU - Kaneva, Radka

AU - Mitev, Vanio

AU - Batra, Jyotsna

AU - Clements, Judith A.

AU - Spurdle, Amanda

AU - Teixeira, Manuel R.

AU - Paulo, Paula

PY - 2020/10/1

Y1 - 2020/10/1

N2 - We determined the effect of sample size on performance of polygenic hazard score (PHS) models in prostate cancer. Age and genotypes were obtained for 40,861 men from the PRACTICAL consortium. The dataset included 201,590 SNPs per subject, and was split into training and testing sets. Established-SNP models considered 65 SNPs that had been previously associated with prostate cancer. Discovery-SNP models used stepwise selection to identify new SNPs. The performance of each PHS model was calculated for random sizes of the training set. The performance of a representative Established-SNP model was estimated for random sizes of the testing set. Mean HR98/50 (hazard ratio of top 2% to average in test set) of the Established-SNP model increased from 1.73 [95% CI: 1.69–1.77] to 2.41 [2.40–2.43] when the number of training samples was increased from 1 thousand to 30 thousand. Corresponding HR98/50 of the Discovery-SNP model increased from 1.05 [0.93–1.18] to 2.19 [2.16–2.23]. HR98/50 of a representative Established-SNP model using testing set sample sizes of 0.6 thousand and 6 thousand observations were 1.78 [1.70–1.85] and 1.73 [1.71–1.76], respectively. We estimate that a study population of 20 thousand men is required to develop Discovery-SNP PHS models while 10 thousand men should be sufficient for Established-SNP models.

AB - We determined the effect of sample size on performance of polygenic hazard score (PHS) models in prostate cancer. Age and genotypes were obtained for 40,861 men from the PRACTICAL consortium. The dataset included 201,590 SNPs per subject, and was split into training and testing sets. Established-SNP models considered 65 SNPs that had been previously associated with prostate cancer. Discovery-SNP models used stepwise selection to identify new SNPs. The performance of each PHS model was calculated for random sizes of the training set. The performance of a representative Established-SNP model was estimated for random sizes of the testing set. Mean HR98/50 (hazard ratio of top 2% to average in test set) of the Established-SNP model increased from 1.73 [95% CI: 1.69–1.77] to 2.41 [2.40–2.43] when the number of training samples was increased from 1 thousand to 30 thousand. Corresponding HR98/50 of the Discovery-SNP model increased from 1.05 [0.93–1.18] to 2.19 [2.16–2.23]. HR98/50 of a representative Established-SNP model using testing set sample sizes of 0.6 thousand and 6 thousand observations were 1.78 [1.70–1.85] and 1.73 [1.71–1.76], respectively. We estimate that a study population of 20 thousand men is required to develop Discovery-SNP PHS models while 10 thousand men should be sufficient for Established-SNP models.

UR - http://www.scopus.com/inward/record.url?scp=85086170171&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85086170171&partnerID=8YFLogxK

U2 - 10.1038/s41431-020-0664-2

DO - 10.1038/s41431-020-0664-2

M3 - Article

C2 - 32514134

AN - SCOPUS:85086170171

SN - 1018-4813

VL - 28

SP - 1467

EP - 1475

JO - European Journal of Human Genetics

JF - European Journal of Human Genetics

IS - 10

ER -
