Abstract

With the extensive use of digital devices, online experimental platforms are commonly used to conduct experiments that collect data for evaluating different variations of products, algorithms, and interface designs, a.k.a. A/B tests. In practice, multiple A/B testing experiments are often carried out on a common user population on the same platform. The same user's responses to different experiments can be correlated to some extent due to the individual effect of the user. In this paper, we propose a novel framework that collaboratively analyzes the data from paired A/B tests, namely, a pair of A/B testing experiments conducted on the same set of experimental subjects. The proposed analysis approach for paired A/B tests can lead to more accurate estimates than the traditional separate analysis of each experiment. We obtain the asymptotic distribution of the proposed estimators and demonstrate that they are asymptotically the best linear unbiased estimators under certain assumptions. Moreover, the proposed approach is computationally efficient, easy to implement, and robust to different types of responses. Both numerical simulations and numerical studies based on a real case are used to examine the performance of the proposed method.
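The setting described above can be illustrated with a small simulation. The sketch below is a hypothetical illustration of why joint analysis helps, not the authors' proposed estimator: two experiments are randomized independently on the same users, so the shared individual effects correlate the two responses. Using one experiment's de-treated outcome as a covariate when analyzing the other (a CUPED-style adjustment, with all parameter values chosen arbitrarily here) already yields a visibly smaller variance than the separate difference-in-means analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 1000, 300          # users per experiment, Monte Carlo replications
tau1, tau2 = 0.5, 0.3        # true treatment effects (arbitrary values)
sep_est, adj_est = [], []

for _ in range(reps):
    u = rng.normal(0.0, 2.0, n)        # individual effect shared across both experiments
    t1 = rng.integers(0, 2, n)         # independent randomization in experiment 1
    t2 = rng.integers(0, 2, n)         # independent randomization in experiment 2
    y1 = tau1 * t1 + u + rng.normal(0.0, 1.0, n)
    y2 = tau2 * t2 + u + rng.normal(0.0, 1.0, n)

    # Separate analysis: difference in means within experiment 1 only.
    sep_est.append(y1[t1 == 1].mean() - y1[t1 == 0].mean())

    # Joint-style analysis: remove experiment 2's estimated treatment effect,
    # then use its residual outcome (a proxy for u) as a regression covariate.
    tau2_hat = y2[t2 == 1].mean() - y2[t2 == 0].mean()
    x = y2 - tau2_hat * t2
    X = np.column_stack([np.ones(n), t1, x - x.mean()])
    beta = np.linalg.lstsq(X, y1, rcond=None)[0]
    adj_est.append(beta[1])            # adjusted estimate of tau1

print(f"separate-analysis variance: {np.var(sep_est):.5f}")
print(f"joint-style  variance:      {np.var(adj_est):.5f}")
```

Because the shared individual effect dominates the noise in this toy configuration, the covariate adjustment removes most of the between-user variation, and the empirical variance of the adjusted estimator is a fraction of the separate-analysis variance.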

Information

Preprint No.: SS-2024-0227
Manuscript ID: SS-2024-0227
Complete Authors: Qiong Zhang, Lulu Kang, Xinwei Deng
Corresponding Author: Xinwei Deng
Email: xdeng@vt.edu


Acknowledgments

Qiong Zhang’s work is partially supported by the National Science Foundation Award #2413630. Lulu Kang’s work is partially supported by the National Science Foundation Award #2429324. Xinwei Deng’s work is partially supported by the National Science Foundation Awards #2311187 and #2436319.

Supplementary Materials

Online supplementary materials contain all the technical proofs for the main results of the paper and supplementary numerical results. They are available for download.