Statistica Sinica 33 (2023), 2137-2160
Huazhen Lin¹, Shuangxue Zhao¹, Li Liu² and Wenyang Zhang³
Abstract: In this paper, we propose a structured multiple-index model (SMIM) for ultrahigh-dimensional data analysis. The proposed model includes many commonly used semiparametric models as special cases, including the stochastic frontier model, the single-index model, and the additive-index model. We estimate all of the functions and parameters using a full likelihood-type function. As a result, the proposed estimators are shown to be semiparametrically efficient, consistent in both selection and estimation, and asymptotically normal. The computation is challenging owing to the combination of the nonconvexity of the likelihood function, the nonsmoothness of the penalty term, and the large number of functions. To solve the computational problem, we blend spline and kernel smoothing with a majorized coordinate descent algorithm, which makes the procedure easy to implement using existing packages. Extensive simulation studies show that the proposed estimation procedure outperforms alternatives in a variety of settings. Finally, we apply the proposed SMIM and estimation procedure to a real data set from one of China's largest liquor companies, identifying the 31 most important factors, out of 2051, affecting liquor sales.
Key words and phrases: High-dimensional covariates, maximum likelihood estimation, semiparametric efficiency, structured multiple-index models, variable selection.