Abstract
Conventional statistical models assume that covariates are available at no cost, yet in real-world scenarios these variables often carry acquisition costs and are subject to budget constraints.
Scientists must therefore navigate
a trade-off between model accuracy and expenditure within these constraints.
In this paper, we introduce fast cost-constrained regression (FCR), designed to
tackle such problems with computational and statistical efficiency. Specifically,
we develop fast algorithms to solve cost-constrained problems whose
loss function satisfies a quadratic majorization (QM) condition. We theoretically
establish nonasymptotic error bounds for the algorithm’s solution, considering
both estimation and selection accuracy. We apply FCR to extensive numerical
simulations and four datasets from the National Health and Nutrition Examination Survey. Our method outperforms the latest approaches in various performance measures, while requiring fewer iterations and a shorter runtime.
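To make the setting concrete, the following is a minimal Python sketch of one way a QM-based cost-constrained iteration could look: a gradient step on squared-error loss followed by a greedy knapsack projection onto the cost budget. The function name `fcr_sketch`, the greedy selection rule, and the squared-error loss are illustrative assumptions, not the authors' actual FCR algorithm.

```python
import numpy as np

def fcr_sketch(X, y, costs, budget, iters=200):
    """Illustrative sketch of cost-constrained regression via quadratic
    majorization: each iteration takes a gradient step on the squared
    loss, then keeps only coordinates chosen by a greedy knapsack rule
    (largest gain per unit cost) until the budget is exhausted."""
    n, p = X.shape
    # Step size from the QM/Lipschitz constant of the squared-loss gradient
    step = n / (np.linalg.norm(X, 2) ** 2)
    beta = np.zeros(p)
    for _ in range(iters):
        grad = X.T @ (X @ beta - y) / n
        z = beta - step * grad                  # unconstrained majorizer minimizer
        order = np.argsort(-(z ** 2) / costs)   # rank by gain-per-cost ratio
        keep, spent = [], 0.0
        for j in order:                          # greedy 0-1 knapsack selection
            if spent + costs[j] <= budget:
                keep.append(j)
                spent += costs[j]
        beta = np.zeros(p)
        beta[keep] = z[keep]
    return beta
```

The greedy knapsack step here is a heuristic stand-in; an exact 0-1 knapsack solver could replace it at higher computational cost.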
Information
| Preprint No. | SS-2024-0331 |
|---|---|
| Manuscript ID | SS-2024-0331 |
| Authors | Hyeong Jin Hyun, Xiao Wang |
| Corresponding Author | Xiao Wang |
| Email | wangxiao@purdue.edu |
Acknowledgments
The authors thank the Editor, Associate Editor, and anonymous referees
for their valuable comments and suggestions, which greatly improved this
work. This research was supported by the National Science Foundation of
the United States under Grant No. SES-2316428.
Supplementary Materials
The proofs of Theorems 1, 2, and 3 are provided in Section S1. Sufficient conditions for the QM condition, together with several loss functions that satisfy it, are
presented in Section S2.
The stopping criteria for the FCR and GFCR
algorithms are discussed in Section S3. Additional details and results for the numerical
experiments and the NHANES data analysis are given in Sections S5 and S6.