Abstract

The support vector machine (SVM) has proven effective in a wide range of discrimination problems. Recently, there has been growing interest in extending the traditional vector-based SVM to accommodate structured matrix inputs. However, the nonsmooth hinge loss poses significant challenges for both theoretical and computational development. To address these issues, we propose a convex smoothing procedure for the hinge loss. Additionally, we introduce an elastic-net-type penalty to handle high-dimensional matrix inputs. Our approach surpasses the standard SVM for discrimination with high-dimensional matrix inputs. The proposed method provably achieves the optimal statistical convergence rate, and the smooth, convex loss function enables a highly efficient optimization algorithm with a fast linear convergence rate and a simple implementation. Extensive simulations and an electroencephalography (EEG) application demonstrate the method's superiority in classification accuracy and computational efficiency.
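To illustrate the kind of construction the abstract refers to, a common convex, differentiable surrogate of the hinge loss is the Huberized hinge. The sketch below is a generic example of such smoothing under an assumed bandwidth `delta`; it is not the specific smoothing procedure proposed in the paper.

```python
import numpy as np

def hinge(z):
    # Standard (nonsmooth) hinge loss: max(0, 1 - z), kinked at z = 1.
    return np.maximum(0.0, 1.0 - z)

def huber_hinge(z, delta=0.5):
    # A convex, continuously differentiable surrogate of the hinge loss
    # (Huberized hinge). One common smoothing; the paper's own procedure
    # may differ.
    z = np.asarray(z, dtype=float)
    out = np.zeros_like(z)
    # Linear region (z < 1 - delta): matches the hinge slope of -1,
    # shifted down by delta/2 so the pieces join smoothly.
    lin = z < 1.0 - delta
    out[lin] = 1.0 - z[lin] - delta / 2.0
    # Quadratic region (1 - delta <= z < 1): rounds off the kink at z = 1.
    quad = (~lin) & (z < 1.0)
    out[quad] = (1.0 - z[quad]) ** 2 / (2.0 * delta)
    # Region z >= 1: loss is exactly 0, as for the hinge.
    return out
```

The quadratic piece interpolates between the linear part and zero so that both the value and the first derivative are continuous at the joins, which is what makes gradient-based solvers applicable.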

Information

Preprint No.: SS-2024-0194
Manuscript ID: SS-2024-0194
Authors: Bingzhen Chen, Canyi Chen
Corresponding Author: Canyi Chen
Email: canyic@umich.edu


Acknowledgments

This work was supported by the Natural Science Foundation of Cangzhou (221001007D) and the Scientific Research Foundation of Hangzhou Dianzi University (KYS155623054). We thank the Editor, Associate Editor, and two anonymous reviewers for their constructive suggestions, which have significantly improved the manuscript.

Supplementary Materials

The online supplementary material contains complete proofs of the main theoretical results presented in the manuscript, detailed technical derivations, an extension to tensor-valued data, and additional simulation results.

