Abstract
Statistical inference in parametric models (e.g., the Bradley–Terry
model and its variants) for paired-comparison data has been explored in the
high-dimensional regime, in which the number of items involved in paired comparisons diverges. However, parametric models are highly susceptible to model
misspecification. To relax the assumption of known distributions and provide
flexibility, we propose a semiparametric framework for modeling the merits of
items and covariate effects (e.g., home-field advantage) by introducing latent
random variables with unspecified distributions. As the number of parameters
increases with the number of items, semiparametric inference is highly nontrivial. To address this issue, we employ a kernel-based least squares approach to
estimate all unknown parameters. When each pair of items has a fixed number of
comparisons and the number of items tends to infinity, we prove the consistency
of all resulting estimators and derive their asymptotic normal distributions. To
the best of our knowledge, this is the first study to conduct a semiparametric
analysis of paired comparisons with an increasing dimension. We conduct simulations to evaluate the finite-sample performance of the proposed method and
illustrate its practical utility by analyzing an NBA dataset.
Key words and phrases: Asymptotic normality, Consistency, Covariate effects, Paired comparison, Semiparametric model
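For orientation, the parametric baseline that the abstract contrasts against can be sketched as follows. This is a minimal illustration of a Bradley–Terry win probability with an additive home-field term, not the proposed semiparametric estimator: the function name, argument names, and numeric values are assumptions made for this example, and the logistic link used here is exactly the distributional assumption that the semiparametric framework leaves unspecified.

```python
import math


def bt_home_win_prob(merit_home, merit_away, home_advantage=0.0):
    """Probability that the home item beats the away item.

    Assumes a logistic (Bradley-Terry) link applied to the merit
    difference plus an additive home-field effect. All names here are
    hypothetical and chosen for illustration only.
    """
    diff = merit_home - merit_away + home_advantage
    return 1.0 / (1.0 + math.exp(-diff))


# Two items with equal merit: a coin flip without a home effect,
# and a better-than-even chance for the home item with one.
p_neutral = bt_home_win_prob(0.0, 0.0)
p_home = bt_home_win_prob(0.0, 0.0, home_advantage=0.3)
```

Under a misspecified link, the fitted merits and the home-field effect from such a model can be misleading, which is the motivation the abstract gives for relaxing the distributional assumption.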
Information
| Preprint No. | SS-2025-0318 |
|---|---|
| Manuscript ID | SS-2025-0318 |
| Complete Authors | Haoyue Song, Lianqiang Qu, Ting Yan, Yuguo Chen |
| Corresponding Authors | Ting Yan |
| Emails | tingyanty@mail.ccnu.edu.cn |
Acknowledgments
We are very grateful to three referees, the associate editor, and the editor
for their valuable comments, which have greatly improved the manuscript.
Yan is supported by the National Natural Science Foundation of China
(No. 12171188, 12322114).
Supplementary Materials
5. Summary
We have proposed a semiparametric paired comparison model that incorporates covariates. By introducing a special regressor, we developed a
kernel-based least squares method to estimate all unknown parameters in the