TY - JOUR
T1 - Covariate Adaptive False Discovery Rate Control With Applications to Omics-Wide Multiple Testing
AU - Zhang, Xianyang
AU - Chen, Jun
N1 - Publisher Copyright: © 2020 American Statistical Association.
PY - 2022
Y1 - 2022
N2 - Conventional multiple testing procedures often assume that hypotheses for different features are exchangeable. However, in many scientific applications, additional covariate information regarding the patterns of signals and nulls is available. In this article, we introduce an FDR control procedure for large-scale inference problems that can incorporate covariate information. We develop a fast algorithm to implement the proposed procedure and prove its asymptotic validity even when the underlying likelihood ratio model is misspecified and the p-values are weakly dependent (e.g., strong mixing). Extensive simulations are conducted to study the finite-sample performance of the proposed method, and we demonstrate that the new approach improves over state-of-the-art approaches by being flexible, robust, powerful, and computationally efficient. Finally, we apply the method to several omics datasets arising from genomics studies with the aim of identifying omics features associated with clinical and biological phenotypes. We show that the method is overall the most powerful among competing methods, especially when the signal is sparse. The proposed covariate adaptive multiple testing procedure is implemented in the R package CAMT. Supplementary materials for this article are available online.
KW - Covariates
KW - EM-algorithm
KW - False discovery rate
KW - Multiple testing
UR - http://www.scopus.com/inward/record.url?scp=85089496722&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85089496722&partnerID=8YFLogxK
U2 - 10.1080/01621459.2020.1783273
DO - 10.1080/01621459.2020.1783273
M3 - Article
AN - SCOPUS:85089496722
SN - 0162-1459
VL - 117
SP - 411
EP - 427
JO - Journal of the American Statistical Association
JF - Journal of the American Statistical Association
IS - 537
ER -