
Statistica Sinica 33 (2023), 169-191

DATA INTEGRATION IN HIGH DIMENSION
WITH MULTIPLE QUANTILES

Guorong Dai¹, Ursula U. Müller² and Raymond J. Carroll²

¹Fudan University and ²Texas A&M University

Abstract: In this study, we focus on the analysis of high-dimensional data that come from multiple sources ("experiments"), and thus have different, possibly correlated responses, but share the same set of predictors. The measurements of the predictors may be different across experiments. We introduce a new regression approach, using multiple quantiles to select those predictors that affect any of the responses at any quantile level and to estimate the nonzero parameters. Our approach differs from established methods by being able to handle heterogeneity in data sets and heavy-tailed error distributions, two difficulties that are often encountered in complex data scenarios. Our estimator minimizes a penalized objective function that aggregates the data from the different experiments. We establish the model selection consistency and asymptotic normality of the estimator. In addition, we present an information criterion that can be used for consistent model selection. Simulations and two data applications illustrate the advantages of our method in recovering the underlying regression models. These advantages come from taking the group structure induced by the predictors across experiments and the quantile levels into account.

Key words and phrases: Data integration, high dimensional data, information criterion, penalized quantile regression.
