Multiple Testing of One-Sided Hypotheses under Unknown Dependence

Seonghun Cho, Youngrae Kim, Johan Lim, Hyungwon Choi, DoHwan Park and Woncheol Jang

doi:10.5705/ss.202024.0022

Abstract

One-sided hypotheses in a multiple testing problem make the empirical null distribution (or the p-values) conservative, and this conservativeness causes a significant loss of power if it is not appropriately accounted for. We propose a multiple testing procedure, named discarding adaptively with bounding on principal factor approximation (DAB-PFA), to simultaneously test a number of one-sided hypotheses under general dependence among the test statistics. Specifically, we use the principal factor approximation (PFA) of Fan and Han (2017) to account for the dependence structure among the test statistics, and we adaptively discard small or large p-values when estimating the realized false discovery proportion (FDP). We derive the convergence rate of the proposed estimator and numerically compare the false discovery rate (FDR) and the true positive rate (TPR) of our method with those of several existing procedures, including those of Benjamini and Hochberg (1995), Efron (2004), and Wang and Fan (2017). We demonstrate our method through simulation studies and an analysis of protein phosphorylation levels for serous ovarian adenocarcinoma samples.
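
The conservativeness of one-sided p-values can be illustrated with a minimal sketch. It is not the DAB-PFA procedure; it assumes a single test statistic distributed as N(mu, 1) and the hypothesis pair H0: mu <= 0 versus H1: mu > 0, and the function names are hypothetical. At the boundary mu = 0 the p-value is uniform, whereas for mu < 0 (still a true null) the chance of a p-value below any threshold t falls below t.

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, standard deviation 1).
STD_NORMAL = NormalDist()


def one_sided_pvalue(z: float) -> float:
    """Upper-tail p-value for H0: mu <= 0 versus H1: mu > 0 (illustrative helper)."""
    return 1.0 - STD_NORMAL.cdf(z)


def null_pvalue_cdf(t: float, mu: float) -> float:
    """P(p <= t) when the test statistic is N(mu, 1) (illustrative helper).

    The p-value is at most t exactly when the statistic is at least the
    upper-t quantile of the standard normal distribution.
    """
    z_crit = STD_NORMAL.inv_cdf(1.0 - t)
    return 1.0 - STD_NORMAL.cdf(z_crit - mu)


if __name__ == "__main__":
    # Under mu = 0 the probability equals t; it shrinks as mu moves below 0.
    for mu in (0.0, -0.5, -1.0):
        print(f"mu = {mu:+.1f}: P(p <= 0.05) = {null_pvalue_cdf(0.05, mu):.4f}")
```

Because such nulls produce too few small p-values and too many large ones, procedures calibrated for uniform null p-values tend to be conservative, which is the loss of power the abstract refers to.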

Key words and phrases: Discarding adaptively with bounding (DAB), Principal Factor Approximation, Conservative null, False discovery rate, Multiple testing, One-sided hypothesis

Information

Preprint No.	SS-2024-0022
Manuscript ID	SS-2024-0022
Complete Authors	Seonghun Cho, Youngrae Kim, Johan Lim, Hyungwon Choi, DoHwan Park, Woncheol Jang
Corresponding Authors	Woncheol Jang
Emails	wcjang@snu.ac.kr

References

Ahn, S. C. and A. R. Horenstein (2013). Eigenvalue ratio test for the number of factors. Econometrica 81(3), 1203–1227.
Benjamini, Y. and Y. Hochberg (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B (Methodological) 57(1), 289–300.
Benjamini, Y. and D. Yekutieli (2001). The control of the false discovery rate in multiple testing under dependency. The Annals of Statistics 29(4), 1165–1188.
Bickel, P. J. and E. Levina (2008a). Covariance regularization by thresholding. The Annals of Statistics 36(6), 2577–2604.
Bickel, P. J. and E. Levina (2008b). Regularized estimation of large covariance matrices. The Annals of Statistics 36(1), 199–227.
Cai, T. T. and W. Liu (2011). Adaptive thresholding for sparse covariance matrix estimation. Journal of the American Statistical Association 106(494), 672–684.
Cohen, A. and H. B. Sackrowitz (2005). Decision theory results for one-sided multiple comparison procedures. The Annals of Statistics 33(1), 126–144.
Corwin, T., J. Woodsmith, F. Apelt, J.-F. Fontaine, D. Meierhofer, J. Helmuth, A. Grossmann, M. A. Andrade-Navarro, B. A. Ballif, and U. Stelzl (2017). Defining human tyrosine kinase phosphorylation networks using yeast as an in vivo model substrate. Cell Systems 5(2), 128–139.e4.
Dobriban, E. (2020). Permutation methods for factor analysis and PCA. The Annals of Statistics 48(5), 2824–2847.
Efron, B. (2004). Large-scale simultaneous hypothesis testing. Journal of the American Statistical Association 99(465), 96–104.
Efron, B. (2007). Correlation and large-scale simultaneous significance testing. Journal of the American Statistical Association 102(477), 93–103.
Fan, J. and X. Han (2017). Estimation of the false discovery proportion with unknown dependence. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 79(4), 1143–1164.
Fan, J., X. Han, and W. Gu (2012). Estimating false discovery proportion under arbitrary covariance dependence. Journal of the American Statistical Association 107(499), 1019–1035.
Fan, J., Y. Liao, and M. Mincheva (2013). Large covariance estimation by thresholding principal orthogonal complements. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 75(4), 603–680.
Genovese, C. and L. Wasserman (2002). Operating characteristics and extensions of the false discovery rate procedure. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 64(3), 499–517.
Hornbeck, P. V., B. Zhang, B. Murray, J. M. Kornhauser, V. Latham, and E. Skrzypek (2015). PhosphoSitePlus, 2014: mutations, PTMs and recalibrations. Nucleic Acids Research 43(D1), D512–D520.
Hu, J., H.-S. Rho, R. H. Newman, J. Zhang, H. Zhu, and J. Qian (2014). PhosphoNetworks: a database for human phosphorylation networks. Bioinformatics 30(1), 141–142.
Liu, J., C. Zhang, and D. Page (2016). Multiple testing under dependence via graphical models. The Annals of Applied Statistics 10(3), 1699–1724.
Owen, A. B. (2005). Variance of the number of false discoveries. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 67(3), 411–426.
Ramdas, A., T. Zrnic, M. J. Wainwright, and M. Jordan (2018). SAFFRON: an adaptive algorithm for online control of the false discovery rate. In J. Dy and A. Krause (Eds.), Proceedings of the 35th International Conference on Machine Learning, Volume 80 of Proceedings of Machine Learning Research, pp. 4286–4294. PMLR.
Romano, J. P., A. M. Shaikh, and M. Wolf (2008). Control of the false discovery rate under dependence using the bootstrap and subsampling. TEST 17(3), 417–442.
Sarkar, S. K. (2004). FDR-controlling stepwise procedures and their false negatives rates. Journal of Statistical Planning and Inference 125(1), 119–137.
Sarkar, S. K. (2006). False discovery and false nondiscovery rates in single-step multiple testing procedures. The Annals of Statistics 34(1), 394–415.
Storey, J. D. (2002). A direct approach to false discovery rates. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 64(3), 479–498.
Tian, J. and A. Ramdas (2019). ADDIS: An adaptive discarding algorithm for online FDR control with conservative nulls. In Proceedings of the 33rd Conference on Neural Information Processing Systems, pp. 9383–9391. NeurIPS.
Wang, W. and J. Fan (2017). Asymptotics of empirical eigenstructure for high dimensional spiked covariance. The Annals of Statistics 45(3), 1342–1374.
Wei, Z., W. Sun, K. Wang, and H. Hakonarson (2009). Multiple testing in genome-wide association studies via hidden Markov models. Bioinformatics 25(21), 2802–2808.
Wu, W. B. (2008). On false discovery control under dependence. The Annals of Statistics 36(1), 364–380.
Xiao, J., W. Zhu, and J. Guo (2013). Large-scale multiple testing in genome-wide association studies via region-specific hidden Markov models. BMC Bioinformatics 14(1), 282.
Zhang, H., T. Liu, Z. Zhang, S. H. Payne, B. Zhang, J. E. McDermott, J.-Y. Zhou, V. A. Petyuk, L. Chen, D. Ray, S. Sun, F. Yang, L. Chen, J. Wang, P. Shah, S. W. Cha, P. Aiyetan, S. Woo, Y. Tian, M. A. Gritsenko, T. R. Clauss, C. Choi, M. E. Monroe, S. Thomas, S. Nie, C. Wu, R. J. Moore, K.-H. Yu, D. L. Tabb, D. Fenyö, V. Bafna, Y. Wang, H. Rodriguez, E. S. Boja, T. Hiltke, R. C. Rivers, L. Sokoll, H. Zhu, I.-M. Shih, L. Cope, A. Pandey, B. Zhang, M. P. Snyder, D. A. Levine, R. D. Smith, D. W. Chan, K. D. Rodland, S. A. Carr, M. A. Gillette, K. R. Klauser, E. Kuhn, D. Mani, P. Mertins, K. A. Ketchum, R. Thangudu, S. Cai, M. Oberti, A. G. Paulovich, J. R. Whiteaker, N. J. Edwards, P. B. McGarvey, S. Madhavan, P. Wang, D. W. Chan, A. Pandey, I.-M. Shih, H. Zhang, Z. Zhang, H. Zhu, L. Cope, G. A. Whiteley, S. J. Skates, F. M. White, D. A. Levine, E. S. Boja, C. R. Kinsinger, T. Hiltke, M. Mesri, R. C. Rivers, H. Rodriguez, K. M. Shaw, S. E. Stein, D. Fenyo, T. Liu, J. E. McDermott, S. H. Payne, K. D. Rodland, R. D. Smith, P. Rudnick, M. Snyder, Y. Zhao, X. Chen, D. F. Ransohoff, A. N. Hoofnagle, D. C. Liebler, M. E. Sanders, Z. Shi, R. J. Slebos, D. L. Tabb, B. Zhang, L. J. Zimmerman, Y. Wang, S. R. Davies, L. Ding, M. J. Ellis, and R. R. Townsend (2016). Integrated proteogenomic characterization of human high-grade serous ovarian cancer. Cell 166(3), 755–765.
Zhao, Q., D. S. Small, and W. Su (2019). Multiple testing when many p-values are uniformly conservative, with application to testing qualitative interaction in educational interventions. Journal of the American Statistical Association 114(527), 1291–1304.

Seonghun Cho (Inha University)

Acknowledgments

The authors thank the co-editor, an associate editor, and three reviewers for their constructive suggestions and comments, which led to substantial improvements in the paper. Jang’s research is supported by a National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (No. 0769-20240034). Cho’s research is supported by an INHA University Research Grant. Lim’s research is supported by the National Research Foundation of Korea (No. NRF-2021R1A2C1010786) and the Brain Pool Program, which is also funded by the National Research Foundation of Korea and the Ministry of Education

Supplementary Materials

The online Supplementary Material contains proofs of the main theorems, details of simulation results and the heatmaps from the case study.

Supplementary materials are available for download.
