Abstract
We consider two generalizations of the area under the receiver operating characteristic curve (“AUC”), a popular measure of discrimination, to accommodate clustered data. We describe situations in which the two cluster AUCs diverge and others in which they coincide. Differences are illustrated using concrete models and visualizations, while quantitative results relate the two generalizations. Procedures for joint estimation and inference are also presented, along with a simulation study. We apply the results to data collected on urban policing behavior.
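The abstract does not state how the two cluster AUCs are defined. The following is a purely illustrative sketch, not the authors' definitions, of how two reasonable generalizations can disagree. It contrasts a pooled AUC, which ignores clustering and weights every case-control observation pair equally, with a cluster-weighted AUC, which weights each observation by the inverse of its cluster size so that every cluster carries the same total weight. Both are computed as a weighted Mann-Whitney statistic in which ties count as 1/2. The function names `weighted_auc`, `pooled_auc`, and `cluster_weighted_auc`, the inverse-cluster-size weighting, and the toy data are all hypothetical choices made for this illustration.

```python
# Illustrative sketch only: the two weighting schemes below are hypothetical
# examples of "cluster AUCs", not the definitions used in the paper.

def weighted_auc(scores, labels, weights):
    """Weighted Mann-Whitney AUC.

    Each (case, control) pair contributes weight w_case * w_control;
    the case "wins" if its score is larger, and ties count as 1/2.
    """
    cases = [(s, w) for s, y, w in zip(scores, labels, weights) if y == 1]
    controls = [(s, w) for s, y, w in zip(scores, labels, weights) if y == 0]
    num = den = 0.0
    for s1, w1 in cases:
        for s0, w0 in controls:
            pair_w = w1 * w0
            den += pair_w
            if s1 > s0:
                num += pair_w
            elif s1 == s0:
                num += 0.5 * pair_w
    if den == 0.0:
        raise ValueError("need at least one case and one control")
    return num / den


def _flatten(clusters):
    """Turn {cluster_id: [(score, label), ...]} into parallel lists,
    recording the size of the cluster each observation belongs to."""
    scores, labels, sizes = [], [], []
    for obs in clusters.values():
        for s, y in obs:
            scores.append(s)
            labels.append(y)
            sizes.append(len(obs))
    return scores, labels, sizes


def pooled_auc(clusters):
    """Hypothetical variant 1: ignore clustering; every pair weighs the same."""
    scores, labels, _ = _flatten(clusters)
    return weighted_auc(scores, labels, [1.0] * len(scores))


def cluster_weighted_auc(clusters):
    """Hypothetical variant 2: weight each observation by 1 / (its cluster size),
    so that every cluster carries the same total weight."""
    scores, labels, sizes = _flatten(clusters)
    return weighted_auc(scores, labels, [1.0 / n for n in sizes])


if __name__ == "__main__":
    # Made-up toy data: a large cluster in which cases score high,
    # and a small cluster in which the case scores below the control.
    clusters = {
        "A": [(0.9, 1), (0.8, 1), (0.7, 1), (0.1, 0)],
        "B": [(0.2, 1), (0.6, 0)],
    }
    print(pooled_auc(clusters))            # 0.875 (= 7/8)
    print(cluster_weighted_auc(clusters))  # about 0.7333 (= 11/15)
```

In this sketch, if every cluster has the same size, the inverse-size weights are constant and cancel, so the two variants coincide; they diverge when clusters of different sizes discriminate differently, as in the toy data.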
Information
| Preprint No. | SS-2024-0100 |
|---|---|
| Manuscript ID | SS-2024-0100 |
| Complete Authors | Haben Michael, Lu Tian |
| Corresponding Author | Haben Michael |
| Email | hmichael@math.umass.edu |
Acknowledgments
The authors wish to thank Prof. Maria Cuellar for helpful consultation regarding the data
analysis, and an anonymous reviewer for contributing the substance of Prop. 1(2).