Abstract

Imbalanced data with high-dimensional inputs are widely encountered in many application areas. In this situation, it usually becomes essential to reduce redundant variables via model selection to improve the classification performance. However, with a large number of variables, model selection uncertainty is typically very high. To deal with this problem, we present a feasible model averaging procedure based on a cost-sensitive support vector machine (CSSVM), coupled with a cost-sensitive data-driven weight choice criterion, for imbalanced classification. Theoretical justifications are provided in two distinct scenarios. When the data exhibit a weak imbalance, we derive a relatively fast uniform convergence rate for the CSSVM solution. In contrast, when the data exhibit a strong imbalance, the convergence rate becomes much slower. In both scenarios, the asymptotic optimality of the proposed model averaging approach, in the sense of minimizing the out-of-sample hinge loss, is established. Moreover, to reduce the computational burden imposed by a large number of candidate models for model averaging, we develop a CSSVM with an L1-norm penalty to prepare the candidate models. Numerical analysis shows the superiority of the proposed model averaging procedure over existing imbalanced classification methods.
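To make the outline above concrete, the sketch below is a minimal illustration, not the paper's exact algorithm: it prepares candidate models by fitting L1-penalized cost-sensitive linear SVMs (hinge-type loss with class-dependent costs, as in minimizing (1/n) sum_i c_{y_i} [1 - y_i f(x_i)]_+ plus an L1 penalty) along a regularization grid, and then averages their decision functions with simplex-constrained weights chosen to minimize a cost-weighted hinge loss on a held-out split. The toy data, the cost ratio, the grid of C values, and the hold-out weight criterion are illustrative assumptions standing in for the paper's cost-sensitive data-driven weight choice criterion.

```python
# A minimal sketch, NOT the paper's algorithm: candidate CSSVMs are
# L1-penalized cost-sensitive linear SVMs along a regularization grid,
# and their decision functions are averaged with simplex-constrained
# weights chosen to minimize a cost-weighted hinge loss on a held-out
# split (an illustrative stand-in for the paper's weight criterion).
import numpy as np
from scipy.optimize import minimize
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Toy imbalanced data: roughly 5% positives, 50 features, 20 informative.
n, p = 2000, 50
X = rng.standard_normal((n, p))
beta = np.concatenate([rng.standard_normal(20), np.zeros(p - 20)])
y = np.where(X @ beta + rng.standard_normal(n) > 7.5, 1, -1)

X_tr, X_val, y_tr, y_val = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Class-dependent misclassification costs: up-weight the minority class.
cost = {1: (y_tr == -1).mean() / (y_tr == 1).mean(), -1: 1.0}

# Candidate models: smaller C means a heavier L1 penalty, a sparser fit.
Cs = np.logspace(-2, 1, 8)
models = [LinearSVC(penalty="l1", loss="squared_hinge", dual=False, C=C,
                    class_weight=cost, max_iter=20000).fit(X_tr, y_tr)
          for C in Cs]

# Decision values of every candidate on the validation split.
F = np.column_stack([m.decision_function(X_val) for m in models])
sample_cost = np.where(y_val == 1, cost[1], cost[-1])

def avg_hinge(w):
    """Cost-weighted hinge loss of the w-averaged decision function."""
    margin = y_val * (F @ w)
    return float(np.mean(sample_cost * np.maximum(0.0, 1.0 - margin)))

# The model averaging step: nonnegative weights that sum to one.
M = len(models)
res = minimize(avg_hinge, np.full(M, 1.0 / M), method="SLSQP",
               bounds=[(0.0, 1.0)] * M,
               constraints={"type": "eq", "fun": lambda w: w.sum() - 1.0})
w_hat = res.x
print("weights:", np.round(w_hat, 3))
print("averaged validation hinge loss:", avg_hinge(w_hat))
```

The simplex constraint reflects the convex-combination form typical of model averaging; in the paper, the candidate CSSVMs and the weight choice criterion are the specific constructions for which the asymptotic optimality in out-of-sample hinge loss is established.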

Information

Preprint No.: SS-2024-0012
Manuscript ID: SS-2024-0012
Complete Authors: Ze Chen, Jun Liao, Wangli Xu, Yuhong Yang
Corresponding Author: Yuhong Yang
Email: yangx374@umn.edu


Acknowledgments

We truly appreciate the constructive suggestions made by three reviewers. We also thank Co-Editor Yi-Hau Chen and the AE for their advice on revising our work. The work of Ze Chen is supported by the Postdoctoral Fellowship Program of CPSF (No. GZC20231478) and the China Postdoctoral Science Foundation (No. 2024M761782). The work of Jun

Supplementary Materials

The proofs of all theoretical results, the justifications of conditions, and additional numerical results are provided in the Supplementary Material document.

