Abstract

Tensor analysis methods are becoming increasingly prevalent across various scientific applications, including neuroscience and signal processing. Existing tensor discrimination models often rely on decomposition techniques such as CANDECOMP/PARAFAC and Tucker decomposition. However, these methods typically require unfolding of tensors into matrices, which may compromise their intrinsic structural information. This article harnesses the recently introduced concept of tubal rank to present a smoothed support tensor machine with tubal nuclear norm regularization. The statistical properties of the resulting estimator are established, and the framework is extended to a distributed setting. Within this paradigm, a communication-efficient regularized estimator is introduced, which only needs access to local data from the first machine and gradient information from other local machines. Furthermore, the convergence rate of this distributed estimator is derived. By exploiting the well-defined properties of the tubal nuclear norm, we provide theoretical guarantees for low-rank structure recovery. To compute the estimator, an alternating minimization algorithm is developed, and its global convergence properties are analyzed. Lastly, extensive simulations are carried out to validate the proposed method, and its practical utility is demonstrated in an application involving data from invasive ductal carcinoma.
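
The tubal nuclear norm used as the regularizer above is defined through the tensor SVD, which operates in the Fourier domain along the third mode. As a minimal illustration (not the paper's code), the sketch below computes it in NumPy under the common convention that averages the matrix nuclear norms of the Fourier-domain frontal slices; the normalization by n3 is one standard choice and may differ from the paper's.

```python
import numpy as np

def tubal_nuclear_norm(T):
    """Tubal nuclear norm of a third-order tensor T (n1 x n2 x n3):
    average of the matrix nuclear norms of the DFT-domain frontal
    slices (one common normalization; conventions vary)."""
    n3 = T.shape[2]
    T_hat = np.fft.fft(T, axis=2)  # DFT along the tube (third) mode
    return sum(
        np.linalg.svd(T_hat[:, :, k], compute_uv=False).sum()
        for k in range(n3)
    ) / n3
```

Because the DFT block-diagonalizes the underlying t-product, this norm is convex and admits a slice-wise singular value thresholding proximal map, which is what makes alternating minimization schemes for such penalties tractable.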
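The "smoothed" part of a smoothed support tensor machine refers to replacing the non-differentiable hinge loss with a kernel-convolved surrogate, in the spirit of convolution smoothing. A hedged sketch with a Gaussian kernel follows; the closed form is standard for Gaussian smoothing, but the paper's actual kernel and bandwidth choice are assumptions here.

```python
import numpy as np
from scipy.stats import norm

def smoothed_hinge(u, h=0.5):
    """Gaussian-convolution-smoothed hinge loss.

    Convolving the hinge (1 - u)_+ with a N(0, h^2) density yields
    the closed form a * Phi(a/h) + h * phi(a/h), where a = 1 - u.
    As the bandwidth h -> 0 this recovers the ordinary hinge loss.
    """
    a = 1.0 - np.asarray(u, dtype=float)
    return a * norm.cdf(a / h) + h * norm.pdf(a / h)
```

The smoothed loss is differentiable everywhere, which is what makes it possible to share gradient information across machines in the distributed construction described in the abstract.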
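The distributed estimator's communication pattern, raw data on the first machine plus anchor-point gradients from the others, matches the familiar surrogate-loss construction: machine 1 minimizes its local loss shifted by the gap between its own gradient and the pooled gradient at an anchor point. The toy sketch below uses a smooth least-squares loss purely for illustration; the loss, dimensions, step size, and the omission of the tubal nuclear norm proximal step are all simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: m machines, each holding n local samples of a linear model.
m, n, p = 5, 200, 10
beta_true = rng.normal(size=p)
X = [rng.normal(size=(n, p)) for _ in range(m)]
y = [Xj @ beta_true + 0.1 * rng.normal(size=n) for Xj in X]

def local_grad(j, beta):
    """Gradient of the local squared loss on machine j."""
    return X[j].T @ (X[j] @ beta - y[j]) / n

# One communication round: every machine ships a single gradient vector
# evaluated at the anchor; only machine 1 (index 0) keeps its raw data.
beta_bar = np.zeros(p)
global_grad = np.mean([local_grad(j, beta_bar) for j in range(m)], axis=0)
shift = local_grad(0, beta_bar) - global_grad

def surrogate_grad(beta):
    """Gradient of the shifted local loss L_1(beta) - <shift, beta>."""
    return local_grad(0, beta) - shift

# Plain gradient descent on the surrogate loss.
beta = beta_bar.copy()
for _ in range(200):
    beta -= 0.1 * surrogate_grad(beta)
```

In the regularized tensor problem, each gradient step would be followed by a proximal step on the tubal nuclear norm, i.e., slice-wise singular value soft-thresholding in the Fourier domain.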

Information

Preprint No.: SS-2025-0109
Manuscript ID: SS-2025-0109
Complete Authors: Zihao Song, Lei Wang, Riquan Zhang, Weihua Zhao
Corresponding Author: Weihua Zhao
Email: zhaowhstat@163.com

Acknowledgments

This work was supported in part by the National Social Science Fund (22BTJ025), the National Natural Science Fund (12271272, 12371272), the Basic Research Project of Shanghai Science and Technology Commission (22JC1400800), and the Postgraduate Research & Practice Innovation Program of Jiangsu Province (KYCX24 3622). All authors contributed equally to this work.

Supplementary Materials

The preliminaries of the tensor-tensor product (t-product), the proofs of the theorems, and additional simulation results are contained in the Supplementary Materials.
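
For orientation before consulting the supplement, the t-product underlying the tubal framework can be evaluated slice-wise in the Fourier domain. The sketch below follows the standard definition of Kilmer and Martin and is illustrative rather than the authors' implementation.

```python
import numpy as np

def t_product(A, B):
    """t-product of A (n1 x n2 x n3) and B (n2 x n4 x n3):
    DFT along the third mode, frontal-slice matrix products,
    then inverse DFT back to the original domain."""
    A_hat = np.fft.fft(A, axis=2)
    B_hat = np.fft.fft(B, axis=2)
    C_hat = np.einsum('ijk,jlk->ilk', A_hat, B_hat)
    return np.real(np.fft.ifft(C_hat, axis=2))
```

Equivalently, the t-product is matrix multiplication of a block-circulant unfolding along the third mode; the FFT route above is simply the efficient way to compute it.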

