Abstract

Identifying outcome-related variables is a common goal in biomedical research. This task can be complicated by dynamic (or varying) variable effects, which often reflect meaningful scientific mechanisms. Appropriately accounting for possible dynamic effects is crucial to avoid undervaluing important variables. In this work, we propose a model-free testing and screening framework that adopts a global view based on the concept of interval quantile independence. The new framework not only permits robust identification of variables dynamically associated with an outcome, but also offers the flexibility to perform group testing that simultaneously evaluates multiple continuous or discrete covariates. We show that the key testing strategy naturally extends to unconditional and conditional screening procedures for ultra-high dimensional settings, and that these procedures enjoy the desirable sure screening property. We demonstrate the good practical utility of the proposed methods via extensive simulation studies and a real application to a microarray data set.
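The abstract does not spell out the test statistic, so the following is only a rough illustration of the general idea, not the authors' method. One common way to formalize interval quantile independence is to require the conditional τ-quantile of Y given X to equal the unconditional τ-quantile of Y for every τ in an interval [τ_L, τ_U]; equivalently, τ - 1{Y ≤ Q_τ(Y)} has conditional mean zero given X at each such τ. The sketch below assumes a Cramér–von Mises-type discrepancy averaged over a finite τ grid, a permutation p-value for calibration, componentwise ordering for vector-valued (group) covariates, and marginal top-d ranking for screening. All of these, and the function names `empirical_quantile`, `iqi_statistic`, `permutation_pvalue`, and `screen`, are hypothetical illustrative choices.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]  # a covariate value; use 1-tuples for scalars, longer tuples for groups


def empirical_quantile(values: Sequence[float], tau: float) -> float:
    """Lower empirical tau-quantile: smallest order statistic v with F_n(v) >= tau."""
    s = sorted(values)
    k = max(0, math.ceil(tau * len(s)) - 1)
    return s[k]


def iqi_statistic(y: Sequence[float], x: Sequence[Vector], tau_grid: Sequence[float]) -> float:
    """Illustrative Cramer-von Mises-type discrepancy from interval quantile independence.

    For each tau, psi_i = tau - 1{y_i <= q_tau} should be mean-zero given X under
    independence at that level. We square the marked empirical process
    n^{-1} sum_i psi_i 1{x_i <= x_j} (componentwise <=), then average over the
    observed x_j and over the tau grid. Hypothetical choice, not the paper's statistic.
    """
    n = len(y)
    total = 0.0
    for tau in tau_grid:
        q = empirical_quantile(y, tau)
        psi = [tau - (1.0 if yi <= q else 0.0) for yi in y]
        inner = 0.0
        for xj in x:
            r = sum(p for p, xi in zip(psi, x)
                    if all(a <= b for a, b in zip(xi, xj))) / n
            inner += r * r
        total += inner / n
    return total / len(tau_grid)


def permutation_pvalue(y, x, tau_grid, n_perm: int = 199, seed: int = 0) -> float:
    """Permutation p-value (assumed calibration): shuffling y breaks any y-x dependence."""
    rng = random.Random(seed)
    observed = iqi_statistic(y, x, tau_grid)
    y_perm = list(y)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if iqi_statistic(y_perm, x, tau_grid) >= observed:
            exceed += 1
    return (1 + exceed) / (1 + n_perm)


def screen(y, covariates: Sequence[Sequence[float]], tau_grid, d: int) -> List[int]:
    """Marginal screening: indices of the d scalar covariates with the largest statistic."""
    scores = [iqi_statistic(y, [(v,) for v in col], tau_grid) for col in covariates]
    return sorted(range(len(covariates)), key=lambda j: scores[j], reverse=True)[:d]


if __name__ == "__main__":
    rng = random.Random(2024)
    n, p = 60, 20
    cols = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]
    # Covariate 0 changes only the spread of Y: its effect on the conditional median
    # is zero, but it shifts the lower and upper conditional quantiles.
    y = [(1 + abs(c0)) * rng.gauss(0, 1) for c0 in cols[0]]
    grid = [0.1 * k for k in range(1, 10)]
    print("top-5 covariates:", screen(y, cols, grid, d=5))
```

Averaging over a τ grid rather than fixing one quantile level is what lets a scale-only effect like the one in the demo register, even though a median-only check would see nothing. Passing tuples of several covariates to `iqi_statistic` gives a crude group test in the same spirit. The conditional screening variant mentioned in the abstract is not attempted here.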

Information

Preprint No.: SS-2023-0285
Manuscript ID: SS-2023-0285
Complete Authors: Ying Cui, Limin Peng
Corresponding Authors: Limin Peng
Emails: lpeng@emory.edu

Emory University

Acknowledgments

This work was supported by NIH grant R01 HL113548.

Supplementary Materials

The Supplementary Materials, available online, include technical proofs and additional results.