
Statistica Sinica 32 (2022), 1381-1409

CONDITIONAL TEST FOR ULTRAHIGH DIMENSIONAL
LINEAR REGRESSION COEFFICIENTS

Wenwen Guo¹, Wei Zhong², Sunpeng Duan³ and Hengjian Cui¹

¹Capital Normal University, ²Xiamen University
and ³University of California at Santa Barbara

Abstract: This paper presents a conditional test for the overall significance of the regression coefficients in ultrahigh-dimensional linear models, conditional on a subset of predictors. We first propose a conditional U-statistic test (CUT) based on an estimated U-statistic for a moderately high-dimensional linear regression model, and derive its asymptotic distributions under some mild assumptions. However, the empirical power of the CUT is adversely affected by the dimensionality of the predictors. To address this, we further propose a two-stage CUT with screening (CUTS) procedure based on a random data-splitting strategy to enhance the empirical power. In the first stage, we divide the data randomly into two parts and apply conditional sure independence screening to the first part to reduce the dimensionality. In the second stage, we apply the CUT to the reduced model using the second part of the data. To eliminate the effect of data-splitting randomness and to further enhance the empirical power, we also develop a powerful ensemble CUTSM algorithm based on multiple data-splitting. We then prove that the family-wise error rate is asymptotically controlled at a given significance level. We demonstrate the excellent finite-sample performance of the proposed conditional tests using Monte Carlo simulations and two real-data analysis examples.

Key words and phrases: Hypothesis testing, linear regression coefficients, random data splitting, ultrahigh dimensionality, variable screening.
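
The two-stage split, screen, and test idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' procedure: the abstract does not define the CUT statistic, so a classical partial F-statistic stands in for it in the second stage, and the first stage ranks predictors by their marginal correlation with the response after projecting out the conditioning predictors, which is only assumed here to capture the spirit of conditional sure independence screening. The names `residualize`, `screen_then_test`, `cond_idx`, and `keep` are hypothetical, as is the simulated data.

```python
import numpy as np


def residualize(A, Z):
    """Residuals of A (vector or matrix) after least-squares projection on [1, Z]."""
    Z1 = np.column_stack([np.ones(Z.shape[0]), Z])
    coef, *_ = np.linalg.lstsq(Z1, A, rcond=None)
    return A - Z1 @ coef


def screen_then_test(X, y, cond_idx, keep, rng):
    """Split the sample, screen on one half, test the reduced model on the other.

    Assumption: a partial F-statistic replaces the CUT statistic, which the
    abstract does not specify.
    """
    n, p = X.shape
    perm = rng.permutation(n)
    first, second = perm[: n // 2], perm[n // 2:]
    rest = np.setdiff1d(np.arange(p), cond_idx)

    # Stage 1: rank the non-conditioning predictors on the first half by
    # absolute correlation with y, after removing the conditioning predictors.
    Z_first = X[first][:, cond_idx]
    ry = residualize(y[first], Z_first)
    rX = residualize(X[first][:, rest], Z_first)
    score = np.abs(rX.T @ ry) / np.linalg.norm(rX, axis=0)
    selected = rest[np.argsort(score)[::-1][:keep]]

    # Stage 2: on the second half, compare the model with only the
    # conditioning predictors against the model that adds the selected ones.
    Z_second = X[second][:, cond_idx]
    ry2 = residualize(y[second], Z_second)
    rX2 = residualize(X[second][:, selected], Z_second)
    rss0 = ry2 @ ry2
    beta, *_ = np.linalg.lstsq(rX2, ry2, rcond=None)
    resid = ry2 - rX2 @ beta
    rss1 = resid @ resid
    df1 = keep
    df2 = len(second) - len(cond_idx) - keep - 1
    f_stat = ((rss0 - rss1) / df1) / (rss1 / df2)
    return selected, f_stat, (df1, df2)


# Simulated example: columns 0 and 1 are the conditioning predictors,
# and column 10 carries a signal that the test should pick up.
rng = np.random.default_rng(0)
n, p = 200, 1000
X = rng.standard_normal((n, p))
y = X[:, 0] + X[:, 1] + 2.0 * X[:, 10] + rng.standard_normal(n)
selected, f_stat, dfs = screen_then_test(X, y, cond_idx=[0, 1], keep=5, rng=rng)
```

Because screening and testing use disjoint halves of the sample, the selection step does not contaminate the reference distribution of the second-stage statistic. Repeating the call over several random splits and combining the results is the idea behind the multiple-splitting ensemble, although the combination rule used by CUTSM is not given in the abstract.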
