Statistica Sinica 33 (2023), 2545-2560

REGRESSION WITH SET-VALUED CATEGORICAL PREDICTORS

Ganghua Wang, Jie Ding and Yuhong Yang

University of Minnesota

Abstract: We address the regression problem for a new form of data that arises in data privacy applications. Instead of point values, the observed explanatory variables are subsets containing each individual's original value. In such cases, we cannot apply classical regression analyses, such as least squares, because the set-valued predictors carry only partial information about the original values. We propose a computationally efficient subset least squares method for regression on such data. We establish upper bounds on the prediction loss and risk in terms of the subset structure, model structure, and data dimension. The error rates are shown to be optimal in some common situations. Furthermore, we develop a model-selection method to identify the most appropriate model for prediction. Experimental results on both simulated and real-world data sets demonstrate the promising performance of the proposed method.

Key words and phrases: Model selection, regression, set-valued data.