Catoni-type Confidence Sequences under Infinite Variance

Guanhua Fang, Sujay Bhatt, Ping Li and Gennady Samorodnitsky

doi:10.5705/ss.202024.0249

Abstract

In this paper, we extend confidence sequences to settings where the variance of the data-generating distribution does not exist or is infinite. Confidence sequences furnish confidence intervals that are valid at arbitrary data-dependent stopping times and therefore have a wide range of applications. We first derive Catoni-style confidence sequences for data distributions having a bounded pth moment, where p ∈ (1, 2), using Ville’s inequality, and strengthen the existing upper bound results. The derived results are shown to be better than confidence sequences obtained using the vanilla Dubins-Savage inequality. We next establish a lower bound for the width of the Catoni-style confidence sequences for p ∈ (1, 2], and establish the statistical limitation of applying techniques based on Ville’s inequality to Catoni-style confidence sequence estimation. To close this gap, we further establish tighter confidence sequences using stitching methods. Our new methodology can be easily applied to risk control and parameter estimation problems.
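
The time-uniform validity mentioned above rests on Ville’s inequality: a nonnegative supermartingale started at 1 reaches the level 1/α at any time with probability at most α. The following is a minimal simulation sketch of that inequality only, not the Catoni-style construction of the paper. The coin-flip martingale, the function name `crossing_frequency`, and all parameter values are illustrative assumptions.

```python
import random


def crossing_frequency(alpha=0.1, bet=0.5, n_paths=2000, horizon=200, seed=0):
    """Fraction of simulated paths on which a nonnegative martingale
    ever reaches the level 1/alpha within `horizon` steps.

    Hypothetical illustration: the process is M_t = prod_i (1 + bet * X_i)
    with X_i a fair +/-1 coin, so E[1 + bet * X_i] = 1 and M_t > 0 for |bet| < 1.
    """
    rng = random.Random(seed)
    threshold = 1.0 / alpha
    crossed = 0
    for _ in range(n_paths):
        wealth = 1.0  # M_0 = 1
        for _ in range(horizon):
            x = 1.0 if rng.random() < 0.5 else -1.0  # mean-zero increment
            wealth *= 1.0 + bet * x  # multiplicative update keeps M_t positive
            if wealth >= threshold:  # first time the path reaches 1/alpha
                crossed += 1
                break
    return crossed / n_paths


freq = crossing_frequency()
print(f"empirical crossing frequency: {freq:.3f} (Ville bound: 0.100)")
```

Because the bound holds over all times simultaneously, stopping the simulation at any data-dependent time cannot push the crossing probability above α, which is the property a confidence sequence inherits.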

Key words and phrases: Catoni estimator, Heavy tail, Confidence sequence, Law of iterated logarithm

Information

Preprint No.	SS-2024-0249
Manuscript ID	SS-2024-0249
Complete Authors	Guanhua Fang, Sujay Bhatt, Ping Li, Gennady Samorodnitsky
Corresponding Authors	Guanhua Fang
Emails	fanggh@fudan.edu.cn

References

Bhatt, S., G. Fang, P. Li, and G. Samorodnitsky (2022a). Minimax M-estimation under adversarial corruption. In International Conference on Machine Learning, Volume 162, pp. 1906–1924. PMLR.
Bhatt, S., G. Fang, P. Li, and G. Samorodnitsky (2022b). Nearly optimal catoni’s M-estimator for infinite variance. In International Conference on Machine Learning, Volume 162, pp. 1925–1944. PMLR.
Catoni, O. (2012). Challenging the empirical mean and empirical variance: a deviation study. In Annales de l’IHP Probabilités et statistiques, Volume 48, pp. 1148–1185.
Chen, P., X. Jin, X. Li, and L. Xu (2021). A generalized catoni’s m-estimator under finite α-th moment assumption with α ∈(1, 2). Electronic Journal of Statistics 15(2), 5523–5544.
Darling, D. A. and H. Robbins (1967). Confidence sequences for mean, variance, and median. Proceedings of the National Academy of Sciences of the United States of America 58(1), 66–68.
Dubins, L. E. and L. J. Savage (1965). A tchebycheff-like inequality for stochastic processes. Proceedings of the National Academy of Sciences of the United States of America 53(2), 274–275.
Howard, S. R., A. Ramdas, J. McAuliffe, and J. Sekhon (2020). Time-uniform chernoff bounds via nonnegative supermartingales. Probability Surveys 17, 257–317.
Howard, S. R., A. Ramdas, J. McAuliffe, and J. Sekhon (2021). Time-uniform, nonparametric, nonasymptotic confidence sequences. The Annals of Statistics 49(2), 1055–1080.
Jamieson, K. G. and L. Jain (2018). A bandit approach to sequential experimental design with false discovery control. Advances in Neural Information Processing Systems 31, 3664–3674.
Jennison, C. and B. W. Turnbull (1989). Interim analyses: the repeated confidence interval approach. Journal of the Royal Statistical Society: Series B (Methodological) 51(3), 305–334.
Johari, R., P. Koomen, L. Pekelis, and D. Walsh (2017). Peeking at a/b tests: Why it matters, and what to do about it. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1517–1525.
Johari, R., L. Pekelis, and D. J. Walsh (2015). Always valid inference: Bringing sequential analysis to a/b testing. arXiv preprint arXiv:1512.04922.
Kallenberg, O. (1975). On the existence and path properties of stochastic integrals. The Annals of Probability 3(2), 262–280.
Khan, R. A. (2009). lp-version of the dubins–savage inequality and some exponential inequalities. Journal of Theoretical Probability 22(2), 348–364.
Malek, A. and S. Chiappa (2021). Asymptotically best causal effect identification with multiarmed bandits. Advances in Neural Information Processing Systems 34, 21960–21971.
Pflug, G. C. (2000). Some remarks on the value-at-risk and the conditional value-at-risk. Probabilistic constrained optimization: Methodology and applications, 272–281.
Robbins, H. (1952). Some aspects of the sequential design of experiments. Bulletin of the American Mathematical Society 58(5), 527–535.
Rockafellar, R. T. and S. Uryasev (2002). Conditional value-at-risk for general loss distributions. Journal of banking & finance 26(7), 1443–1471.
Ville, J. (1939). Etude critique de la notion de collectif. Bull. Amer. Math. Soc 45(11), 824–824.
Wang, H. and A. Ramdas (2023). Catoni-style confidence sequences for heavy-tailed mean estimation. Stochastic Processes and Their Applications 163, 168–202.
Waudby-Smith, I. and A. Ramdas (2024). Estimating means of bounded random variables by betting. Journal of the Royal Statistical Society Series B: Statistical Methodology 86(1), 1–27.
Wittmann, R. (1985). A general law of iterated logarithm. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete 68(4), 521–543.
Zhan, R., V. Hadad, D. A. Hirshberg, and S. Athey (2021). Off-policy evaluation via adaptive weighting with data from contextual bandits. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, pp. 2125–2135.

Acknowledgments

The authors would like to thank the Associate Editor and the anonymous referee for their constructive suggestions and comments, which helped improve the quality of the paper. The initial version of this work was conducted while the authors were affiliated with Baidu USA. Guanhua Fang is partly supported by the National Natural Science Foundation of China (Grant No. 12301376) and Shanghai Educational Development Foundation (Grant No. 23CGA02). Gennady Samorodnitsky is partially supported by the U.S. National Science Foundation under grant DMS-2310974 at Cornell University.

Supplementary Materials

The online material contains technical proofs, together with further explanations and discussions.

Supplementary materials are available for download.
