Abstract
Multiple biomarkers are often combined for more accurate disease diagnosis. For
this purpose, one popular performance metric is the area under the receiver operating characteristic (ROC) curve (AUC). Optimizing the empirical AUC over
linear combinations of biomarkers, however, faces two primary challenges. First,
the AUC is invariant to the scale of the linear combination, which complicates both
computation and asymptotic analysis. Most available approaches therefore consider a restricted problem in which one coefficient is fixed at a constant. Second, the
empirical AUC is piecewise constant, so standard gradient-based computational
algorithms are not applicable. Existing methods maximize a kernel-smoothed AUC
instead, but they can be sensitive to bandwidth choice. In this article, we tackle
these challenges by developing a new empirical AUC maximization method. Computationally efficient algorithms are provided for both point estimation
of the combination coefficients and estimation of their variance. Simulation studies show
good computational and statistical performance of the proposed methods. An
illustration is provided with a clinical application.
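The two challenges named above can be illustrated with a minimal sketch. It is not the proposed method: the data are simulated, the function names `empirical_auc` and `smoothed_auc` are hypothetical, and the sigmoid surrogate with bandwidth `h` merely stands in for a generic kernel-smoothed AUC. The sketch shows that rescaling the coefficient vector leaves the empirical AUC unchanged, and that the smoothed value depends on the bandwidth.

```python
import numpy as np

def empirical_auc(beta, x_case, x_ctrl):
    """Empirical AUC of the linear score x @ beta; ties count one half."""
    diff = (x_case @ beta)[:, None] - (x_ctrl @ beta)[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

def smoothed_auc(beta, x_case, x_ctrl, h):
    """Sigmoid-smoothed surrogate with bandwidth h (illustrative assumption)."""
    diff = (x_case @ beta)[:, None] - (x_ctrl @ beta)[None, :]
    return float(np.mean(1.0 / (1.0 + np.exp(-diff / h))))

# Simulated two-biomarker data (assumed for illustration only).
rng = np.random.default_rng(0)
x_ctrl = rng.normal(0.0, 1.0, size=(50, 2))
x_case = rng.normal(0.8, 1.0, size=(40, 2))
beta = np.array([1.0, 0.5])

# Scale invariance: multiplying beta by a positive constant keeps every
# pairwise ordering, so the empirical AUC is identical.
auc = empirical_auc(beta, x_case, x_ctrl)
auc_scaled = empirical_auc(2.0 * beta, x_case, x_ctrl)

# Bandwidth sensitivity: the smoothed surrogate changes with h.
s_small = smoothed_auc(beta, x_case, x_ctrl, h=0.1)
s_large = smoothed_auc(beta, x_case, x_ctrl, h=1.0)

print(auc, auc_scaled, s_small, s_large)
```

Because only the ordering of pairwise score differences matters, the empirical AUC is a step function of the coefficients, which is why its gradient is zero almost everywhere and gives gradient-based optimizers nothing to follow.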
Information
| Preprint No. | SS-2024-0195 |
|---|---|
| Manuscript ID | SS-2024-0195 |
| Complete Authors | Yuxuan Chen, Yijian Huang |
| Corresponding Authors | Yuxuan Chen |
| Emails | yuxuan.chen@emory.edu |
Acknowledgments
We sincerely thank the reviewers for their thoughtful comments and constructive suggestions, which have helped to enhance the quality of this
manuscript. The authors were supported in part by NIH grants R01 CA230268
and P30 AI050409.
Supplementary Materials
The online Supplementary Material contains additional simulation results.