Abstract
Incorporating auxiliary information alongside primary data can significantly
enhance the accuracy of simultaneous inference. However, existing multiple
testing methods face challenges in efficiently incorporating complex side information, especially when it differs in dimension or structure from the
primary data, such as network side information. This paper introduces a locally
adaptive structure learning algorithm (LASLA), a flexible framework designed
to integrate a broad range of auxiliary information into the inference process. Although LASLA is specifically motivated by the challenges posed by
network-structured data, it also proves highly effective with other types of
side information, such as spatial locations and multiple auxiliary sequences.
LASLA employs a p-value weighting approach, leveraging structural insights
to derive data-driven weights that reflect the relative importance of different hypotheses. Our theoretical analysis demonstrates that LASLA asymptotically
controls the false discovery rate (FDR) under independent or weakly dependent p-values, and achieves enhanced power in scenarios where the auxiliary
data provides valuable side information. Simulation studies are conducted to
evaluate LASLA’s numerical performance, and its efficacy is further illustrated
through two real-world applications.
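
To make the idea of p-value weighting concrete, the following is a minimal sketch of the classical weighted Benjamini-Hochberg step-up procedure. It is not the LASLA algorithm: the weights here are simply supplied by the caller, whereas LASLA derives them from the auxiliary data. The function name `weighted_bh`, the normalization of the weights to average one, and the toy inputs are all assumptions made for illustration.

```python
def weighted_bh(pvalues, weights, alpha=0.05):
    """Generic weighted Benjamini-Hochberg procedure (illustrative, not LASLA).

    Each p-value is divided by its normalized weight, so hypotheses with
    larger weights are easier to reject. Returns a list of booleans that is
    True where the corresponding hypothesis is rejected.
    """
    m = len(pvalues)
    if m == 0:
        return []
    if len(weights) != m:
        raise ValueError("pvalues and weights must have the same length")
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be strictly positive")

    # Assumption: rescale the weights so that they average to one.
    total = sum(weights)
    normalized = [w * m / total for w in weights]

    # Weighted p-values, capped at one.
    adjusted = [min(p / w, 1.0) for p, w in zip(pvalues, normalized)]

    # Step-up rule: find the largest rank k with adjusted_(k) <= alpha * k / m.
    order = sorted(range(m), key=lambda i: adjusted[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if adjusted[i] <= alpha * rank / m:
            k = rank

    rejected = [False] * m
    for i in order[:k]:
        rejected[i] = True
    return rejected


# Toy inputs (made up): the first hypothesis has a moderate p-value.
pvals = [0.04, 0.6, 0.7, 0.8]
uniform = weighted_bh(pvals, [1.0, 1.0, 1.0, 1.0])
informed = weighted_bh(pvals, [3.7, 0.1, 0.1, 0.1])
```

With uniform weights nothing is rejected in this toy example, while up-weighting the first hypothesis lets it pass the threshold. This shows how informative weights can raise power, and it also shows why the weights must be constructed carefully if FDR control is to be preserved.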
Information
| Field | Value |
|---|---|
| Preprint No. | SS-2024-0002 |
| Manuscript ID | SS-2024-0002 |
| Complete Authors | Ziyi Liang, T. Tony Cai, Wenguang Sun, Yin Xia |
| Corresponding Authors | Yin Xia |
| Emails | xiayin@fudan.edu.cn |
Acknowledgments
The research of Yin Xia was supported in part by the National Natural Science
Foundation of China (Grant No. 12331009). The research of Tony Cai was supported
in part by the National Science Foundation (Grant DMS-2413106) and the National
Institutes of Health (Grants R01-GM129781 and R01-GM123056).
Supplementary Materials
The Supplementary Material includes numerical implementation details, additional
simulations and applications of LASLA, theoretical results for the dependent case, and proofs of all theoretical results.