Abstract
Conventional statistical models assume that covariates are available at no cost, yet in real-world scenarios these variables often carry acquisition costs and are subject to budget constraints.
Scientists must therefore navigate
a trade-off between model accuracy and expenditure within these constraints.
In this paper, we introduce fast cost-constrained regression (FCR), designed to
tackle such problems with computational and statistical efficiency. Specifically,
we develop fast algorithms to solve cost-constrained problems whose
loss function satisfies a quadratic majorization (QM) condition. We theoretically
establish nonasymptotic error bounds for the algorithm’s solution, considering
both estimation and selection accuracy. We apply FCR to extensive numerical
simulations and four datasets from the National Health and Nutrition Examination Survey. Our method outperforms the latest approaches in various performance measures, while requiring fewer iterations and a shorter runtime.
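To make the setting concrete, the following is a minimal Python sketch of one way a QM-based cost-constrained iteration could look: a gradient step on squared-error loss followed by a greedy knapsack projection onto the cost budget. The function name `fcr_sketch`, the greedy selection rule, and the squared-error loss are illustrative assumptions, not the authors' actual FCR algorithm.

```python
import numpy as np

def fcr_sketch(X, y, costs, budget, iters=200):
    """Illustrative sketch of cost-constrained regression via quadratic
    majorization: each iteration takes a gradient step on the squared
    loss, then keeps only coordinates chosen by a greedy knapsack rule
    (largest gain per unit cost) until the budget is exhausted."""
    n, p = X.shape
    # Step size from the QM/Lipschitz constant of the squared-loss gradient
    step = n / (np.linalg.norm(X, 2) ** 2)
    beta = np.zeros(p)
    for _ in range(iters):
        grad = X.T @ (X @ beta - y) / n
        z = beta - step * grad                  # unconstrained majorizer minimizer
        order = np.argsort(-(z ** 2) / costs)   # rank by gain-per-cost ratio
        keep, spent = [], 0.0
        for j in order:                          # greedy 0-1 knapsack selection
            if spent + costs[j] <= budget:
                keep.append(j)
                spent += costs[j]
        beta = np.zeros(p)
        beta[keep] = z[keep]
    return beta
```

The greedy knapsack step here is a heuristic stand-in; an exact 0-1 knapsack solver could replace it at higher computational cost.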
Information
| Preprint No. | SS-2024-0331 |
|---|---|
| Manuscript ID | SS-2024-0331 |
| Authors | Hyeong Jin Hyun, Xiao Wang |
| Corresponding Author | Xiao Wang |
| Email | wangxiao@purdue.edu |
Acknowledgments
The authors thank the Editor, Associate Editor, and anonymous referees
for their valuable comments and suggestions, which greatly improved this
work. This research was supported by the National Science Foundation of
the United States under Grant No. SES-2316428.
Supplementary Materials
The proofs of Theorems 1, 2, and 3 are provided in Section S1. Sufficient conditions for the QM condition, together with several loss functions that satisfy it, are
presented in Section S2.
The stopping criteria for the FCR and GFCR
algorithms are discussed in Section S3. Additional details and results for the numerical
experiments and the NHANES data analysis are given in Sections S5 and S6.