Abstract

We propose a new sufficient dimension reduction approach designed specifically for high-dimensional classification problems. The method, named Maximal Mean Variance (MMV), is inspired by the mean variance index first proposed by Cui, Li and Zhong (2015). MMV requires reasonably mild restrictions on the predictors and retains the model-free advantage, with no need to estimate the link function. The consistency of the MMV estimator is established under regularity conditions that allow the numbers of predictors and of response categories to diverge. We also establish the asymptotic normality of the estimator when the dimension of the predictors is fixed. The relationship between MMV and several classical classification algorithms is further elaborated. Moreover, although without a definite theoretical guarantee, our method performs well even when the sample size is far smaller than the problem dimension. The remarkable gain in classification efficiency achieved by MMV is demonstrated through simulation studies and real data analysis.
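
To make the construction concrete, the following Python sketch (ours, not the authors' implementation) estimates the sample mean variance index of Cui, Li and Zhong (2015) for a scalar projection and runs a naive one-direction MMV search. The function names mv_index and mmv_direction, the Nelder-Mead optimizer, and the toy data are illustrative assumptions; the estimator analyzed in the paper may differ.

    # A minimal sketch, assuming the MV index definition of Cui, Li and
    # Zhong (2015); mv_index, mmv_direction and the Nelder-Mead search are
    # illustrative, not the estimation procedure analyzed in the paper.
    import numpy as np
    from scipy.optimize import minimize

    def mv_index(u, y):
        # MV_hat(u|y) = (1/n) sum_j sum_r p_hat_r (F_hat_r(u_j) - F_hat(u_j))^2,
        # with F_hat the empirical CDF of u and F_hat_r its version in class r.
        u, y = np.asarray(u, dtype=float), np.asarray(y)
        n = len(u)
        F = np.mean(u[None, :] <= u[:, None], axis=1)  # F_hat(u_j), O(n^2)
        mv = 0.0
        for r in np.unique(y):
            u_r = u[y == r]
            F_r = np.mean(u_r[None, :] <= u[:, None], axis=1)  # F_hat_r(u_j)
            mv += (len(u_r) / n) * np.mean((F_r - F) ** 2)
        return mv

    def mmv_direction(X, y, seed=0):
        # Naive MMV: maximize MV(beta'X | y) over unit vectors beta. The
        # empirical CDFs make the objective piecewise constant in beta, so
        # a derivative-free optimizer is used; this is a rough illustration.
        beta0 = np.random.default_rng(seed).standard_normal(X.shape[1])
        neg_mv = lambda b: -mv_index(X @ (b / np.linalg.norm(b)), y)
        res = minimize(neg_mv, beta0, method="Nelder-Mead")
        return res.x / np.linalg.norm(res.x)

    # Toy check: the two classes separate along e_1, so the returned vector
    # should be roughly proportional to (1, 0, 0, 0, 0).
    rng = np.random.default_rng(1)
    X = rng.standard_normal((200, 5))
    y = (X[:, 0] + 0.3 * rng.standard_normal(200) > 0).astype(int)
    print(mmv_direction(X, y))

Because the projected empirical CDFs change only when the ordering of the projections changes, the sample objective is nonsmooth in beta, which is why a derivative-free search is used in this sketch.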

Information

Preprint No.: SS-2023-0143
Manuscript ID: SS-2023-0143
Complete Authors: Xin Chen, Jingjing Wu, Zhigang Yao, Jia Zhang
Corresponding Author: Jia Zhang
Email: zhangjia@swufe.edu.cn

References

  1. Bura, E. and Forzani, L. (2015). Sufficient reductions in regressions with elliptically contoured inverse predictors. Journal of the American Statistical Association, 110, 420–434.
  2. Bura, E., Duarte, S. and Forzani, L. (2016). Sufficient reductions in regressions with exponential family inverse predictors. Journal of the American Statistical Association, 111, 1313–1329.
  3. Chen, X., Ma, X. and Zhou, W. (2017). Distribution regression. arXiv preprint arXiv:1712.08781.
  4. Chen, X., Zhang, J. and Zhou, W. (2022). High-dimensional elliptical sliced inverse regression in non-Gaussian distributions. Journal of Business & Economic Statistics, 40(3), 1204–1215.
  5. Cheng, F. (2017). Strong uniform consistency rates of kernel estimators of cumulative distribution functions. Communications in Statistics - Theory and Methods, 46(14), 6803–6807.
  6. Cook, R. D. (1994). On the interpretation of regression plots. Journal of the American Statistical Association, 89, 177–189.
  7. Cook, R. D. (1996). Graphics for regressions with a binary response. Journal of the American Statistical Association, 91, 983–992.
  8. Cook, R. D. (1998). Regression Graphics: Ideas for Studying Regressions through Graphics. Wiley, New York.
  9. Cook, R. D. and Forzani, L. (2009). Likelihood-based sufficient dimension reduction. Journal of the American Statistical Association, 104, 197–208.
  10. Cook, R. D. and Li, L. (2009). Dimension reduction in regressions with exponential family predictors. Journal of Computational and Graphical Statistics, 18(3), 774–791.
  11. Cook, R. D. and Ni, L. (2005). Sufficient dimension reduction via inverse regression: A minimum discrepancy approach. Journal of the American Statistical Association, 100, 410–428.
  12. Cook, R. D. and Weisberg, S. (1991). Comment on “sliced inverse regression for dimension reduction” by K. C. Li. Journal of the American Statistical Association, 86, 328–332.
  13. Cui, H., Li, R. and Zhong, W. (2015). Model-free feature screening for ultrahigh dimensional discriminant analysis. Journal of the American Statistical Association, 110, 630–641.
  14. Dawid, A. P. (1979). Conditional independence in statistical theory. Journal of the Royal Statistical Society, B, 41, 1–31.
  15. Fabian, V. (1985). Introduction to Probability and Mathematical Statistics. John Wiley and Sons.
  16. Fan, J. and Fan, Y. (2008). High dimensional classification using features annealed independence rules. The Annals of Statistics, 36(6), 2605–2637.
  17. Fan, J. and Lv, J. (2008). Sure independence screening for ultrahigh dimensional feature space. Journal of the Royal Statistical Society, B, 70(5), 849–911.
  18. Fan, J. and Peng, H. (2004). Nonconcave penalized likelihood with a diverging number of parameters. The Annals of Statistics, 32(3), 928–961.
  19. Hall, P. and Li, K. C. (1993). On almost linearity of low dimensional projections from high dimensional data. The Annals of Statistics, 21(2), 867–889.
  20. Li, B. (2018). Sufficient Dimension Reduction: Methods and Applications with R. Chapman and Hall/CRC.
  21. Li, B. and Wang, S. L. (2007). On directional regression for dimension reduction. Journal of the American Statistical Association, 102, 997–1008.
  22. Li, K. C. (1991). Sliced inverse regression for dimension reduction. Journal of the American Statistical Association, 86, 316–342.
  23. Li, K. C. (1992). On principal Hessian directions for data visualization and dimension reduction: Another application of Stein’s lemma. Journal of the American Statistical Association, 87(420), 1025–1039.
  24. Liu, R. and Yang, L. (2008). Kernel estimation of multivariate cumulative distribution function. Journal of Nonparametric Statistics, 20(8), 661–677.
  25. Serfling, R. J. (1980). Approximation Theorems of Mathematical Statistics (Vol. 162). John Wiley and Sons.
  26. Sheng, W. and Yin, X. (2013). Direction estimation in single-index models via distance covariance. Journal of Multivariate Analysis, 122, 148–161.
  27. Sheng, W. and Yin, X. (2016). Sufficient dimension reduction via distance covariance. Journal of Computational and Graphical Statistics, 25, 91–104.
  28. Wang, H. and Xia, Y. (2008). Sliced regression for dimension reduction. Journal of the American Statistical Association, 103(482), 811–821.
  29. Xia, Y., Tong, H., Li, W. K. and Zhu, L. (2002). An adaptive estimation of dimension reduction space (with discussion). Journal of the Royal Statistical Society, B, 64, 363–410.
  30. Zhang, X., Mai, Q. and Zou, H. (2020). Maximum separation subspace in sufficient dimension reduction with categorical response. Journal of Machine Learning Research, 21(29), 1–36.
  31. Zhu, Y. and Zeng, P. (2006). Fourier methods for estimating the central subspace and the central mean subspace in regression. Journal of the American Statistical Association, 101(476), 1638–1651.

Acknowledgments

Jia Zhang was supported by the National Natural Science Foundation of China (grant nos. 71991472 and 72003150). Xin Chen was supported by the National Natural Science Foundation of China (grant no. 12071205). Zhigang Yao was supported by Singapore Ministry of Education Tier 2 grants (A-0008520-00-00 and A-8001562-00-00) and Tier 1 grants (A-0004809-00-00 and A8000987-00-00) at the National University of Singapore. Jingjing Wu was supported by a Discovery Grant from NSERC (Natural Sciences and Engineering Research Council of Canada, Grant ID: RGPIN-2024-06154).

Supplementary Materials

The supplementary material includes all theoretical proofs for the main paper.


Supplementary materials are available for download.