Tests on Dynamic Ranking

Nan Lu, Jian Shi, Xin-Yu Tian and Kai Song

doi:10.5705/ss.202024.0153

Abstract

In this paper, we investigate the dynamic Bradley-Terry model, a highly acclaimed statistical ranking model, and tackle several crucial inference problems related to score functions and rank properties. Specifically, we address the testing problems of score function variation and pairwise similarity, providing valuable insights for model determination and simplification. We derive asymptotic null distributions for the proposed test statistics and prove the tests' consistency. Furthermore, we introduce a novel confidence band for the dynamic rank and establish an innovative and generally applicable test framework for dynamic ranking properties. To overcome the conservativeness issue brought by supremum-form statistics, we introduce a novel approach based on signed score difference statistics for ranking inferences. We present theoretical guarantees for the proposed scheme. Numerical simulations validate the theories and demonstrate the satisfactory performance of our methods. The proposed methods are applied to a real dataset, yielding insightful results.

Key words and phrases: Bradley-Terry model, Combinatorial inference, Dynamic ranking property, Time-varying test, Uncertainty quantification

Information

Preprint No.	SS-2024-0153
Manuscript ID	SS-2024-0153
Complete Authors	Nan Lu, Jian Shi, Xin-Yu Tian, Kai Song
Corresponding Authors	Jian Shi
Emails	jshi@iss.ac.cn

References

Bazylik, S., M. Mogstad, J. Romano, A. Shaikh, and D. Wilhelm (2024). Finite- and large-sample inference for ranks using multinomial data with an application to ranking political parties. arXiv preprint arXiv:2402.00192.
Benjamini, Y. and D. Yekutieli (2001). The control of the false discovery rate in multiple testing under dependency. Annals of Statistics 29(4), 1165–1188.
Bong, H., W. Li, S. Shrotriya, and A. Rinaldo (2020). Nonparametric estimation in the dynamic Bradley-Terry model. In International Conference on Artificial Intelligence and Statistics, pp. 3317–3326. PMLR.
Bradley, R. A. and M. E. Terry (1952). Rank analysis of incomplete block designs: I. the method of paired comparisons. Biometrika 39(3/4), 324–345.
Cambazoglu, B. B., H. Zaragoza, O. Chapelle, J. Chen, C. Liao, Z. Zheng, and J. Degenhardt (2010). Early exit optimizations for additive machine learned ranking systems. In Proceedings of the third ACM international conference on Web search and data mining, pp. 411–420.
Chen, Y., J. Fan, C. Ma, and K. Wang (2019). Spectral method and regularized MLE are both optimal for top-K ranking. Annals of Statistics 47(4), 2204–2235.
Chernozhukov, V., D. Chetverikov, and K. Kato (2013). Gaussian approximations and multiplier bootstrap for maxima of sums of high-dimensional random vectors. Annals of Statistics 41, 2786–2819.
de Jong, P. (1987). A central limit theorem for generalized quadratic forms. Probability Theory and Related Fields 75, 261–277.
Duh, K. and K. Kirchhoff (2008). Learning to rank with partially-labeled data. In Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval, pp. 251–258.
Elo, A. E. (1967). The proposed USCF rating system, its development, theory, and applications. Chess Life 22(8), 242–247.
Fan, J., Z. Lou, W. Wang, and M. Yu (2025). Ranking inferences based on the top choice of multiway comparisons. Journal of the American Statistical Association 120(549), 237–250.
Gao, C., Y. Shen, and A. Y. Zhang (2023). Uncertainty quantification in the Bradley-Terry-Luce model. Information and Inference: A Journal of the IMA 12(2), 1073–1140.
Glickman, M. E. (1999). Parameter estimation in large dynamic paired comparison experiments. Journal of the Royal Statistical Society Series C: Applied Statistics 48(3), 377–394.
Glickman, M. E. (2001). Dynamic paired comparison models with stochastic variances. Journal of Applied Statistics 28(6), 673–689.
Han, R., R. Ye, C. Tan, and K. Chen (2020). Asymptotic theory of sparse Bradley–Terry model. The Annals of Applied Probability 30(5), 2491–2515.
Huang, T.-K., R. C. Weng, and C.-J. Lin (2006). Generalized Bradley-Terry models and multi-class probability estimates. Journal of Machine Learning Research 7(4), 85–115.
Hunter, D. R. (2004). MM algorithms for generalized Bradley-Terry models. Annals of Statistics 32(1), 384–406.
Karlé, E. and H. Tyagi (2023). Dynamic ranking with the BTL model: a nearest neighbor based rank centrality method. Journal of Machine Learning Research 24(269), 1–57.
Liu, Y., E. X. Fang, and J. Lu (2023). Lagrangian inference for ranking problems. Operations Research 71(1), 202–223.
Lv, Y., T. Moon, P. Kolari, Z. Zheng, X. Wang, and Y. Chang (2011). Learning to model relatedness for news recommendation. In Proceedings of the 20th international conference on World wide web, pp. 57–66.
Maystre, L., V. Kristof, and M. Grossglauser (2019). Pairwise comparisons with flexible time-dynamics. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1236–1246.
McHale, I. and A. Morton (2011). A Bradley-Terry type model for forecasting tennis match results. International Journal of Forecasting 27(2), 619–630.
Mogstad, M., J. P. Romano, A. M. Shaikh, and D. Wilhelm (2023, 01). Inference for ranks with applications to mobility across neighbourhoods and academic achievement across countries. The Review of Economic Studies 91(1), 476–518.
Negahban, S., S. Oh, and D. Shah (2017). Rank centrality: Ranking from pairwise comparisons. Operations Research 65, 266–287.
Simons, G. and Y.-C. Yao (1999). Asymptotics when the number of parameters tends to infinity in the Bradley-Terry model for paired comparisons. Annals of Statistics 27(3), 1041–1060.
Tian, X., J. Shi, X. Shen, and K. Song (2024). A spectral approach for the dynamic Bradley–Terry model. Stat 13(3), e722.
Wang, Y., L. Wang, Y. Li, D. He, and T.-Y. Liu (2013). A theoretical analysis of NDCG type ranking measures. In Conference on learning theory, pp. 25–54. PMLR.
Yan, T., Y. Yang, and J. Xu (2012). Sparse paired comparisons in the Bradley-Terry model. Statistica Sinica 22(3), 1305–1318.

Nan Lu
State Key Laboratory of Mathematical Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China; School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing, China

Acknowledgments

We thank the editor, associate editor, and three reviewers for their valuable comments that substantially improved the manuscript.

Supplementary Materials

The online Supplementary Material contains additional simulation results and proofs.

