
Statistica Sinica 36 (2026), 331-349

STATISTICAL INFERENCE FOR HIGH DIMENSIONAL
REGRESSION WITH PROXY DATA

Sai Li¹, T. Tony Cai² and Hongzhe Li*²

¹Renmin University of China and ²University of Pennsylvania

Abstract: Existing high-dimensional statistical methods are largely developed for analyzing individual-level data. In this work, we study estimation and inference for high-dimensional linear models when only "proxy data" are available. These proxies encompass marginal statistics and sample covariance matrices computed from distinct sets of individuals. We develop a rate-optimal method for estimation and inference for the regression coefficient vector and its linear functionals based on the proxy data. We show the intrinsic limitations of proxy-data-based inference: the minimax optimal rate for estimation is slower than that in the conventional case where individual data are observed. These findings are illustrated through simulation studies and an analysis of a dataset concerning the genetic associations of hindlimb muscle weights in a mouse population.

Key words and phrases: Linear functional, sparse regression, summary statistics.

