
Statistica Sinica 36 (2026), 977-998

INTEGRATIVE QUANTILE REGRESSION ANALYSIS
OF HETEROGENEOUS MULTISOURCE DATA
WITH PRIVACY PRESERVING

Senlin Yuan, Xuerong Chen*, Yu Wu and Jianguo Sun

Yunnan University, Southwestern University of Finance and Economics,
Nanjing Agricultural University and University of Missouri

Abstract: Integrative analysis of multisource data has been used and discussed in many fields. In this paper, we focus on quantile regression in this setting, which has not been investigated in the literature. Specifically, we consider quantile integrative analysis of multisource, high-dimensional data in which covariate effects may be both homogeneous and heterogeneous across data sets. We aim to detect the homogeneous and heterogeneous effects, estimate the corresponding parameters, and improve the statistical efficiency of the estimators of the potentially homogeneous covariate effects by integrating the information contained in the different data sources when the raw data are unavailable. To this end, we propose an objective function based on a composite penalty. The composite penalty term identifies the homogeneous and nonzero covariate effects when the dimension of the covariates is high, while the main term of the objective function aggregates the quantile regression estimators from the various data sources and hence improves the statistical efficiency for the potentially homogeneous covariate effects. The method relies only on summary statistics from each data source and thus protects privacy to a great extent. The proposed privacy-preserving estimators of the homogeneous effects achieve the same statistical efficiency as the benchmark estimators based on individual-level data. We establish the selection consistency and asymptotic normality of the proposed estimators for the homogeneous effects, and the numerical results suggest that the proposed estimators perform well. Finally, we apply the proposed method to the Chinese Annual Survey of Industrial Firms data set.

Key words and phrases: Distributed learning, heterogeneous data, high dimensional, integrative analysis, privacy preservation, quantile regression.

