A Locally Adaptive Algorithm for Multiple Testing with Network Structure

Ziyi Liang, T. Tony Cai, Wenguang Sun and Yin Xia

doi:10.5705/ss.202024.0002

Abstract

Incorporating auxiliary information alongside primary data can significantly enhance the accuracy of simultaneous inference. However, existing multiple testing methods face challenges in efficiently incorporating complex side information, especially when it differs in dimension or structure from the primary data, such as network side information. This paper introduces a locally adaptive structure learning algorithm (LASLA), a flexible framework designed to integrate a broad range of auxiliary information into the inference process. Although LASLA is specifically motivated by the challenges posed by network-structured data, it also proves highly effective with other types of side information, such as spatial locations and multiple auxiliary sequences. LASLA employs a p-value weighting approach, leveraging structural insights to derive data-driven weights that prioritize the importance of different hypotheses. Our theoretical analysis demonstrates that LASLA asymptotically controls the false discovery rate (FDR) under independent or weakly dependent p-values, and achieves enhanced power in scenarios where the auxiliary data provides valuable side information. Simulation studies are conducted to evaluate LASLA’s numerical performance, and its efficacy is further illustrated through two real-world applications.

Key words and phrases: Covariate-assisted inference, Distance matrix, False discovery rate, p-value weighting, Structure learning
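To make the idea of p-value weighting concrete, the following is a minimal sketch of the generic weighted Benjamini–Hochberg procedure (in the spirit of Genovese, Roeder and Wasserman, 2006), not of LASLA itself. The function name `weighted_bh` and the example weights are hypothetical; how LASLA derives its data-driven weights from the network structure, and how it sets its threshold, is specified in the body of the paper and is not reproduced here.

```python
import numpy as np


def weighted_bh(pvalues, weights, alpha=0.05):
    """Generic weighted BH: run the BH step-up rule on p_i / w_i.

    Illustrative only. The weights are supplied by the caller (assumed
    non-negative) and are rescaled here to average one.
    Returns a boolean array marking the rejected hypotheses.
    """
    p = np.asarray(pvalues, dtype=float)
    w = np.asarray(weights, dtype=float)
    if p.shape != w.shape:
        raise ValueError("pvalues and weights must have the same shape")
    if np.any(w < 0) or w.sum() <= 0:
        raise ValueError("weights must be non-negative with a positive sum")

    m = p.size
    w = w * (m / w.sum())  # rescale so that the weights have mean one

    # Weighted p-values; a zero weight means the hypothesis is never rejected.
    q = np.full_like(p, np.inf)
    positive = w > 0
    q[positive] = p[positive] / w[positive]

    # BH step-up: largest k with q_(k) <= alpha * k / m.
    order = np.argsort(q)
    passed = q[order] <= alpha * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.nonzero(passed)[0].max()
        reject[order[: k + 1]] = True
    return reject


# Hypothetical inputs: larger weights encode stronger prior support.
p = [0.001, 0.02, 0.03, 0.04, 0.5, 0.8]
print(weighted_bh(p, [1, 1, 1, 1, 1, 1]))              # rejects only the first
print(weighted_bh(p, [2.0, 2.0, 1.0, 0.5, 0.25, 0.25]))  # rejects the first two
```

With uniform weights the rule reduces to ordinary BH; up-weighting the second hypothesis lets it cross the step-up threshold, which is the mechanism by which informative weights can increase power.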

Information

Preprint No.	SS-2024-0002
Manuscript ID	SS-2024-0002
Complete Authors	Ziyi Liang, T. Tony Cai, Wenguang Sun, Yin Xia
Corresponding Authors	Yin Xia
Emails	xiayin@fudan.edu.cn

References

Basu, P., T. T. Cai, K. Das, and W. Sun (2018). Weighted false discovery rate control in large-scale multiple testing. J. Am. Statist. Assoc. 113(523), 1172–1183.
Benjamini, Y. and Y. Hochberg (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. Roy. Statist. Soc. B 57(1), 289–300.
Brenner, L. N. et al. (2020). Analysis of glucocorticoid-related genes reveal CCHCR1 as a new candidate gene for type 2 diabetes. J. Endocr. Soc. 4(11), bvaa121.
Cai, T. T., W. Sun, and W. Wang (2019). CARS: Covariate assisted ranking and screening for large-scale two-sample inference (with discussion). J. Roy. Statist. Soc. B 81, 187–234.
Cai, T. T., W. Sun, and Y. Xia (2022). LAWS: A Locally Adaptive Weighting and Screening Approach to Spatial Multiple Testing. J. Am. Statist. Assoc. 117, 1370–1383.
Castillo, I. and É. Roquain (2020). On spike and slab empirical Bayes multiple testing. Ann. Statist. 48(5), 2548–2574.
Efron, B., R. Tibshirani, J. D. Storey, and V. Tusher (2001). Empirical Bayes analysis of a microarray experiment. J. Amer. Statist. Assoc. 96, 1151–1160.
Foster, D. P. and R. A. Stine (2008). α-investing: a procedure for sequential control of expected false discoveries. J. R. Stat. Soc. B 70(2), 429–444.
Fu, L., B. Gang, G. M. James, and W. Sun (2022). Heteroscedasticity-adjusted ranking and thresholding for large-scale multiple testing. J. Am. Statist. Assoc. 117(538), 1028–1040.
Genovese, C. and L. Wasserman (2002). Operating characteristics and extensions of the false discovery rate procedure. J. R. Stat. Soc. B 64, 499–517.
Genovese, C. R., K. Roeder, and L. Wasserman (2006). False discovery control with p-value weighting. Biometrika 93(3), 509–524.
Goodfellow, I., Y. Bengio, and A. Courville (2016). Deep Learning. MIT Press. http://www.deeplearningbook.org.
Heller, R. and S. Rosset (2021). Optimal control of false discovery criteria in the two-group model. J. Roy. Statist. Soc. B 83(1), 133–155.
Hu, J. X., H. Zhao, and H. H. Zhou (2010). False discovery rate control with groups. J. Am. Statist. Assoc. 105, 1215–1227.
Ignatiadis, N. and W. Huber (2021). Covariate powered cross-weighted multiple testing. J. Roy. Statist. Soc. B 83(4), 720–751.
Ignatiadis, N., B. Klaus, J. B. Zaugg, and W. Huber (2016). Data-driven hypothesis weighting increases detection power in genome-scale multiple testing. Nat. Methods 13(7), 577.
Joiret, M., J. M. Mahachie John, E. S. Gusareva, and K. Van Steen (2019). Confounding of linkage disequilibrium patterns in large scale DNA based gene-gene interaction studies. BioData Min. 12(1), 11–33.
Lei, L. and W. Fithian (2018). AdaPT: an interactive procedure for multiple testing with side information. J. R. Stat. Soc. B 80(4), 649–679.
Lei, L., A. Ramdas, and W. Fithian (2020). A general interactive framework for false discovery rate control under structural constraints. Biometrika 108(2), 253–267.
Li, A. and R. F. Barber (2019). Multiple testing with the structure-adaptive Benjamini–Hochberg algorithm. J. R. Stat. Soc. B 81(1), 45–74.
Li, L. and X. Zhang (2017). Parsimonious tensor response regression. J. Am. Statist. Assoc. 112(519), 1131–1146.
Lynch, G., W. Guo, S. K. Sarkar, H. Finner, et al. (2017). The control of the false discovery rate in fixed sequence multiple testing. Electron. J. Stat. 11(2), 4649–4673.
Peña, E. A., J. D. Habiger, and W. Wu (2011). Power-enhanced multiple decision functions controlling family-wise error and false discovery rates. Ann. Statist. 39(1), 556.
Ramdas, A. K., R. F. Barber, M. J. Wainwright, and M. I. Jordan (2019). A unified treatment of multiple testing with prior knowledge using the p-filter. Ann. Statist. 47(5), 2790–2821.
Ren, Z. and E. Candès (2023). Knockoffs with side information. Ann. Appl. Stat. 17(2), 1152–1174.
Roeder, K. and L. Wasserman (2009). Genome-wide significance levels and weighted hypothesis testing. Statist. Sci. 24(4), 398–413.
Roquain, E. and M. A. Van De Wiel (2009). Optimal weighting for false discovery rate control. Electron. J. Stat. 3, 678–711.
Schaub, M. A., A. P. Boyle, A. Kundaje, S. Batzoglou, and M. Snyder (2012). Linking disease associations with regulatory information in the human genome. Genome Res. 22(9), 1748–1759.
Spracklen, C., M. Horikoshi, Y. Kim, et al. (2020). Identification of type 2 diabetes loci in 433,540 East Asian individuals. Nature 582, 240–245.
Stein, M. L. (1995). Fixed-domain asymptotics for spatial periodograms. J. Am. Statist. Assoc. 90(432), 1277–1288.
Storey, J. D. (2003). The positive false discovery rate: a Bayesian interpretation and the q-value. Ann. Statist. 31, 2013–2035.
Sun, W. and T. T. Cai (2007). Oracle and adaptive compound decision rules for false discovery rate control. J. Amer. Statist. Assoc. 102, 901–912.
Sun, W., B. J. Reich, T. T. Cai, M. Guindani, and A. Schwartzman (2015). False discovery control in large-scale spatial multiple testing. J. R. Stat. Soc. B 77(1), 59–83.
Xia, Y., T. T. Cai, and W. Sun (2020). GAP: A General Framework for Information Pooling in Two-Sample Sparse Inference. J. Am. Statist. Assoc. 115, 1236–1250.
Yurko, R., M. G’Sell, K. Roeder, and B. Devlin (2020). A selective inference approach for false discovery rate control using multiomics covariates yields insights into disease risk. Proceedings of the National Academy of Sciences 117(26), 15028–15035.

Acknowledgments

The research of Yin Xia was supported in part by the National Natural Science Foundation of China (Grant No. 12331009). The research of Tony Cai was supported in part by the National Science Foundation (Grant DMS-2413106) and the National Institutes of Health (Grants R01-GM129781 and R01-GM123056).

Supplementary Materials

The Supplementary Material includes numerical implementation details, additional simulations and applications of LASLA, theoretical results for the dependent case, and proofs of all theoretical results.

Supplementary materials are available for download.
