Robust Control Experiments for Multivariate Tests with Covariates and Network Information

Shaohua Xu and Yongdao Zhou

doi:10.5705/ss.202025.0157

Abstract

Multivariate testing has recently emerged as a promising technique in scientific decision-making and electronic information fields. Unlike standard A/B/n testing, which evaluates individual variations, multivariate testing aims to identify the best-performing combination of variations from all possible combinations. We address the challenge of robustly allocating treatments to subjects in multivariate testing when treatment effects are confounded by covariates and subjects are interconnected through a network. In this context, we introduce, for the first time, the use of a mixed effect model to account for covariate uncertainty and network structure. Based on this model, we propose a criterion to measure the regret of efficiency due to incorrect specification of the covariance structure. We derive minimax robust experimental schemes and introduce a novel scheme that optimally matches the design with the robust covariance structure. Our proposed experimental schemes demonstrate: (a) resilience to various optimality criteria, (b) efficiency against model misspecification, and (c) applicability to complex scenarios. This work extends existing research in optimal A/B testing designs, offering theoretical foundations and practical implementations that outperform current approaches in statistical efficiency, as demonstrated through simulations and a case study.

Key words and phrases: A/B testing, minimax risk, mixed effect model

Information

Preprint No.	SS-2025-0157
Manuscript ID	SS-2025-0157
Complete Authors	Shaohua Xu, Yongdao Zhou
Corresponding Authors	Yongdao Zhou
Emails	ydzhou@nankai.edu.cn

References

Asuncion, A. and D. Newman (2007). UCI Machine Learning Repository. https://archive.ics.uci.edu.
Atkinson, A., A. Donev, and R. Tobias (2007). Optimum Experimental Designs, with SAS, Volume 34. Oxford University Press, Oxford.
Bai, Y., J. Liu, and M. Tabord-Meehan (2024). Inference for matched tuples and fully blocked factorial designs. Quantitative Economics 15(2), 279–330.
Bhat, N., V. F. Farias, C. C. Moallemi, and D. Sinha (2020). Near-optimal A-B testing. Management Science 66(10), 4477–4495.
Branson, Z., T. Dasgupta, and D. B. Rubin (2016). Improving covariate balance in 2^K factorial designs via rerandomization with an application to a New York City Department of Education High School Study. The Annals of Applied Statistics 10(4), 1958–1976.
Chen, Q., B. Li, L. Deng, and Y. Wang (2023). Optimized covariance design for AB test on social network under interference. In Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 37448–37471.
Haizler, T. and D. M. Steinberg (2021). Factorial designs for online experiments. Technometrics 63(1), 1–12.
Harville, D. (1976). Extension of the Gauss-Markov theorem to include the estimation of random effects. The Annals of Statistics 4(2), 384–395.
Jiang, J., D. Legrand, R. Severn, and R. Miikkulainen (2020). A comparison of the Taguchi method and evolutionary optimization in multivariate testing. In 2020 IEEE Congress on Evolutionary Computation (CEC), pp. 1–6. IEEE.
Kohavi, R., R. Longbotham, D. Sommerfield, and R. M. Henne (2009). Controlled experiments on the web: survey and practical guide. Data Mining and Knowledge Discovery 18, 140–181.
Kohavi, R., D. Tang, and Y. Xu (2020). Trustworthy Online Controlled Experiments: A Practical Guide to A/B Testing. Cambridge University Press, Cambridge.
Larsen, N., J. Stallrich, S. Sengupta, A. Deng, R. Kohavi, and N. T. Stevens (2024). Statistical challenges in online controlled experiments: a review of A/B testing methodology. The American Statistician 78(2), 135–149.
Li, X. and P. Ding (2020). Rerandomization and regression adjustment. Journal of the Royal Statistical Society Series B: Statistical Methodology 82(1), 241–268.
Liu, H., J. Ren, and Y. Yang (2024). Randomization-based joint central limit theorem and efficient covariate adjustment in randomized block 2^K factorial experiments. Journal of the American Statistical Association 119(545), 136–150.
Nesterov, Y. (1998). Semidefinite relaxation and nonconvex quadratic optimization. Optimization Methods and Software 9(1-3), 141–160.
Nesterov, Y. (2007). Smoothing technique and its applications in semidefinite optimization. Mathematical Programming 110(2), 245–259.
Pashley, N. E. and M.-A. C. Bind (2023). Causal inference for multiple treatments using fractional factorial designs. Canadian Journal of Statistics 51(2), 444–468.
Pokhilko, V., Q. Zhang, L. Kang, et al. (2019). D-optimal design for network A/B testing. Journal of Statistical Theory and Practice 13(4), 1–23.
Rubin, D. B. (2005). Causal inference using potential outcomes: Design, modeling, decisions. Journal of the American Statistical Association 100(469), 322–331.
Sadeghi, S., P. Chien, and N. Arora (2020). Sliced designs for multi-platform online experiments. Technometrics 62(3), 387–402.
Searle, S. R., G. Casella, and C. E. McCulloch (2009). Variance Components. John Wiley & Sons, Hoboken, NJ.
Verbeke, G., G. Molenberghs, and G. Verbeke (1997). Linear Mixed Models for Longitudinal Data. Springer, New York.
Vono, M., N. Dobigeon, and P. Chainais (2022). High-dimensional Gaussian sampling: a review and a unifying approach based on a stochastic proximal point algorithm. SIAM Review 64(1), 3–56.
Waldspurger, I., A. d’Aspremont, and S. Mallat (2015). Phase recovery, MaxCut and complex semidefinite programming. Mathematical Programming 149, 47–81.
Wang, T., C. Rudin, F. Doshi-Velez, Y. Liu, E. Klampfl, and P. MacNeille (2017). A Bayesian framework for learning rule sets for interpretable classification. The Journal of Machine Learning Research 18(1), 2357–2393.
Wiens, D. P. (2015). Robustness of Design. In A. M. Dean, M. Morris, J. Stufken, and D. Bingham (Eds.), Handbook of Design and Analysis of Experiments, pp. 719–753. CRC Press Taylor & Francis Group, Boca Raton, FL.
Zhang, Q. and L. Kang (2022). Locally optimal design for A/B tests in the presence of covariates and network dependence. Technometrics 64(3), 358–369.
Zhao, A. and P. Ding (2022). Regression-based causal inference with factorial experiments: estimands, model specifications and design-based properties. Biometrika 109(3), 799–815.
Zhao, A. and P. Ding (2023). Covariate adjustment in multiarmed, possibly factorial experiments. Journal of the Royal Statistical Society Series B: Statistical Methodology 85(1), 1–23.
NITFID, School of Statistics and Data Science, Nankai University, Tianjin 300071, China

Acknowledgments

The authors would like to thank the Editor, Associate Editor, and two reviewers for their valuable comments and suggestions. This work was supported by the National Natural Science Foundation of China (12131001), the Fundamental Research Funds for Central Universities, LPMC, and KLMDASR.

Supplementary Materials

The Supplementary Material includes two applications of the proposed robust experimental schemes (A/B testing and sequential experiments), supplementary simulation results, and proofs of all the theoretical results.

Supplementary materials are available for download.
