Empirical Bayes Estimation with Side Information: A Nonparametric Integrative Tweedie Approach

Jiajun Luo, Trambak Banerjee, Gourab Mukherjee and Wenguang Sun

doi:10.5705/ss.202024.0063

Abstract

We investigate the problem of compound estimation of normal means while accounting for the presence of side information. Leveraging the empirical Bayes framework, we develop a nonparametric integrative Tweedie (NIT) approach that incorporates structural knowledge encoded in multivariate auxiliary data to enhance the precision of compound estimation. Our approach employs convex optimization tools to estimate the gradient of the log-density directly, enabling the incorporation of structural constraints. We conduct theoretical analyses of the asymptotic risk of NIT and establish the rate at which NIT converges to the oracle estimator. As the dimension of the auxiliary data increases, we accurately quantify the improvements in estimation risk and the associated deterioration in convergence rate. The numerical performance of NIT is illustrated through the analysis of both simulated and real data, demonstrating its superiority over existing methods.
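The role of the log-density gradient can be illustrated with the classical Tweedie formula, which is not written out in the abstract: for X | θ ~ N(θ, σ²), the posterior mean is E[θ | x] = x + σ² (d/dx) log f(x), where f is the marginal density of X. The sketch below is only a minimal illustration under assumed settings (a Gaussian prior with hypothetical variance `tau2`, no side information, and a known score in place of one estimated by convex optimization); it is not the NIT procedure itself. The function and variable names are illustrative.

```python
import numpy as np

def tweedie_posterior_mean(x, score, sigma2=1.0):
    """Tweedie's formula: E[theta | x] = x + sigma^2 * d/dx log f(x)."""
    return x + sigma2 * score(x)

# Hypothetical setup (an assumption, not from the paper):
# theta ~ N(0, tau2) and X | theta ~ N(theta, sigma2).
tau2, sigma2 = 4.0, 1.0

def marginal_score(x):
    # The marginal of X is N(0, tau2 + sigma2), so its log-density
    # gradient is -x / (tau2 + sigma2).
    return -x / (tau2 + sigma2)

x = np.array([-2.0, 0.0, 1.0, 3.0])
tweedie = tweedie_posterior_mean(x, marginal_score, sigma2)

# Textbook Gaussian posterior mean, used here as a check.
closed_form = x * tau2 / (tau2 + sigma2)
```

In this conjugate case the two vectors agree, which shows that an estimate of the score function alone suffices to form the shrinkage rule; no explicit estimate of the prior is needed.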

Key words and phrases: Compound Decision Problem, Convex Optimization, Kernelized Stein’s Discrepancy, Side Information, Tweedie’s Formula

Information

Preprint No.	SS-2024-0063
Manuscript ID	SS-2024-0063
Complete Authors	Jiajun Luo, Trambak Banerjee, Gourab Mukherjee, Wenguang Sun
Corresponding Authors	Trambak Banerjee
Emails	trambak@ku.edu

References

Banerjee, T., L. J. Fu, G. M. James, G. Mukherjee, and W. Sun (2024). Nonparametric empirical bayes estimation on heterogeneous data. arXiv preprint arXiv:2002.12586.
Banerjee, T., Q. Liu, G. Mukherjee, and W. Sun (2021). A general framework for empirical bayes estimation in discrete linear exponential family. Journal of Machine Learning Research 22(67), 1–46.
Banerjee, T., G. Mukherjee, and D. Paul (2021). Improved shrinkage prediction under a spiked covariance structure. Journal of machine learning research 22(180), 1–40.
Banerjee, T., G. Mukherjee, and W. Sun (2020). Adaptive sparse estimation with side information. Journal of the American Statistical Association 115, 2053–2067.
Banerjee, T. and P. Sharma (2025). Nonparametric empirical bayes prediction in mixed models. Statistics and Computing 35(5), 145.
Benjamini, Y. and Y. Hochberg (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. Series B. Methodological 57, 289–300.
Brown, L. D. (1971). Admissible estimators, recurrent diffusions, and insoluble boundary value problems. The Annals of Mathematical Statistics 42(3), 855–903.
Brown, L. D. and E. Greenshtein (2009). Nonparametric empirical bayes and compound decision approaches to estimation of a high-dimensional vector of normal means. The Annals of Statistics 37, 1685–1704.
Brown, L. D., E. Greenshtein, and Y. Ritov (2013). The poisson compound decision problem revisited. Journal of the American Statistical Association 108(502), 741–749.
Cai, T. T., W. Sun, and W. Wang (2019). CARS: Covariate assisted ranking and screening for large-scale two-sample inference (with discussion). Journal of the Royal Statistical Society Series B: Statistical Methodology 81, 187–234.
Chwialkowski, K., H. Strathmann, and A. Gretton (2016). A kernel test of goodness of fit. In Proceedings of The 33rd International Conference on Machine Learning, Volume 48 of Proceedings of Machine Learning Research, New York, New York, USA, pp. 2606–2615. PMLR.
Cohen, N., E. Greenshtein, and Y. Ritov (2013). Empirical bayes in the presence of explanatory variables. Statistica Sinica 23(1), 333–357.
Dou, Z., S. Kotekal, Z. Xu, and H. H. Zhou (2024). From optimal score matching to optimal sampling. arXiv preprint arXiv:2409.07032.
Efron, B. (2011). Tweedie’s formula and selection bias. Journal of the American Statistical Association 106(496), 1602–1614.
Efron, B. (2016). Empirical bayes deconvolution estimates. Biometrika 103(1), 1–20.
Efron, B., R. Tibshirani, J. D. Storey, and V. Tusher (2001). Empirical Bayes analysis of a microarray experiment. J. Amer. Statist. Assoc. 96, 1151–1160.
Gretton, A., K. M. Borgwardt, M. J. Rasch, B. Schölkopf, and A. Smola (2012). A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773.
Gu, J. and R. Koenker (2017). Unobserved heterogeneity in income dynamics: An empirical bayes perspective.
Gu, J. and R. Koenker (2023). Invidious comparisons: Ranking and selection as compound decisions. Econometrica 91(1), 1–41.
Ignatiadis, N. and W. Huber (2021). Covariate powered cross-weighted multiple testing. Journal of the Royal Statistical Society Series B: Statistical Methodology 83(4), 720–751.
Ignatiadis, N., S. Saha, D. L. Sun, and O. Muralidharan (2023). Empirical bayes mean estimation with nonparametric errors via order statistic regression. Journal of the American Statistical Association 118(542), 987–999.
Ignatiadis, N. and S. Wager (2019). Covariate-powered empirical bayes estimation. In Advances in Neural Information Processing Systems, pp. 9617–9629. Curran Associates, Inc.
Jana, S., Y. Polyanskiy, A. Z. Teh, and Y. Wu (2023). Empirical bayes via erm and rademacher complexities: the poisson model. In The Thirty Sixth Annual Conference on Learning Theory, pp. 5199–5235. PMLR.
Jiang, W. and C.-H. Zhang (2009). General maximum likelihood empirical bayes estimation of normal means. The Annals of Statistics 37(4), 1647–1684.
Jiang, W. and C.-H. Zhang (2010). Empirical bayes in-season prediction of baseball batting averages. In Borrowing Strength: Theory Powering Applications–A Festschrift for Lawrence D. Brown, Volume 6, pp. 263–274. Institute of Mathematical Statistics.
Jitkrittum, W., H. Kanagawa, and B. Schölkopf (2020). Testing goodness of fit of conditional density models with kernels. In Conference on Uncertainty in Artificial Intelligence, pp. 221–230. PMLR.
Ke, T., J. Jin, and J. Fan (2014). Covariance assisted screening and estimation. Annals of statistics 42(6), 2202–2242.
Kiefer, J. and J. Wolfowitz (1956). Consistency of the maximum likelihood estimator in the presence of infinitely many incidental parameters. Annals of Mathematical Statistics 27(1), 887–906.
Kim, Y., W. Wang, P. Carbonetto, and M. Stephens (2022). A flexible empirical bayes approach to multiple linear regression and connections with penalized regression. arXiv preprint arXiv:2208.10910.
Koenker, R. and J. Gu (2017a). REBayes: An R package for empirical bayes mixture methods. Journal of Statistical Software 82(8), 1–26.
Koenker, R. and J. Gu (2017b). Rebayes: Empirical bayes mixture methods in r. Journal of Statistical Software 82(8), 1–26.
Koenker, R. and I. Mizera (2014). Convex optimization, shape constraints, compound decisions, and empirical bayes rules. Journal of the American Statistical Association 109(506), 674–685.
Kou, S. and J. J. Yang (2017). Optimal shrinkage estimation in heteroscedastic hierarchical linear models. In Big
Krusińska, E. (1987). A valuation of state of object based on weighted mahalanobis distance. Pattern Recognition 20(4), 413–418.
Lei, L. and W. Fithian (2018). Adapt: an interactive procedure for multiple testing with side information. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 80(4), 649–679.
Li, A. and R. F. Barber (2019). Multiple testing with the structure-adaptive benjamini–hochberg algorithm. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 81(1), 45–74.
Liu, Q., J. Lee, and M. Jordan (2016). A kernelized stein discrepancy for goodness-of-fit tests. In International conference on machine learning, pp. 276–284. PMLR.
Liu, Q. and D. Wang (2016). Stein variational gradient descent: A general purpose bayesian inference algorithm. In Advances in Neural Information Processing Systems, Volume 29, pp. 2378–2386. Curran Associates, Inc.
Oates, C. J., M. Girolami, and N. Chopin (2017). Control functionals for monte carlo integration. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 79(3), 695–718.
Polyanskiy, Y. and Y. Wu (2020). Self-regularizing property of nonparametric maximum likelihood estimator in mixture models. arXiv preprint arXiv:2008.08244.
Ren, Z. and E. Candès (2020). Knockoffs with side information. arXiv preprint arXiv:2001.07835.
Robbins, H. (1951). Asymptotically subminimax solutions of compound statistical decision problems. In Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability, 1950, Berkeley and Los Angeles, pp. 131–148. University of California Press.
Robbins, H. (1964). The empirical bayes approach to statistical decision problems. The Annals of Mathematical Statistics 35(1), 1–20.
Roeder, K. and L. Wasserman (2009). Genome-wide significance levels and weighted hypothesis testing. Statistical science: a review journal of the Institute of Mathematical Statistics 24(4), 398.
Saha, S. and A. Guntuboyina (2020). On the nonparametric maximum likelihood estimator for gaussian location mixture densities with application to gaussian denoising. The Annals of Statistics 48(2), 738–762.
Sen, N., P. Sung, A. Panda, and A. M. Arvin (2018). Distinctive roles for type i and type ii interferons and interferon regulatory factors in the host cell defense against varicella-zoster virus. Journal of virology 92(21), e01151–18.
Serfling, R. (2009). Approximation Theorems of Mathematical Statistics. Wiley.
Shen, Y. and Y. Wu (2022). Empirical bayes estimation: When does g-modeling beat f-modeling in theory (and in practice)? arXiv preprint arXiv:2211.12692.
maximum likelihood. arXiv preprint arXiv:2109.03466.
Stein, C. (1956). Inadmissibility of the usual estimator for the mean of a multivariate normal distribution. Technical report, STANFORD UNIVERSITY STANFORD United States.
Sun, W. and T. T. Cai (2007). Oracle and adaptive compound decision rules for false discovery rate control. Journal of the American Statistical Association 102, 901–912.
Wand, M. and M. Jones (1995). Kernel Smoothing. Monographs on Statistics and Applied Probability. Chapman and Hall/CRC.
Weinstein, A., Z. Ma, L. D. Brown, and C.-H. Zhang (2018). Group-linear empirical bayes estimates for a heteroscedastic normal mean. Journal of the American Statistical Association 113(522), 698–710.
Wibisono, A., Y. Wu, and K. Y. Yang (2024). Optimal score estimation via empirical bayes smoothing. arXiv preprint arXiv:2402.07747.
Xie, X., S. Kou, and L. D. Brown (2012). Sure estimates for a heteroscedastic hierarchical model. Journal of the American Statistical Association 107(500), 1465–1479.
Yang, J., Q. Liu, V. Rao, and J. Neville (2018). Goodness-of-fit testing for discrete distributions via stein discrepancy. In International Conference on Machine Learning, pp. 5561–5570. PMLR.
Zerboni, L., N. Sen, S. L. Oliver, and A. M. Arvin (2014). Molecular mechanisms of varicella zoster virus pathogenesis. Nature reviews microbiology 12(3), 197–210.
Zhang, C.-H. (1997). Empirical bayes and compound estimation of normal means. Statistica Sinica 7(1), 181–193.
Zhang, K., C. H. Yin, F. Liang, and J. Liu (2024). Minimax optimality of score-based diffusion models: Beyond the density lower bound assumptions. arXiv preprint arXiv:2402.15602.
Zhang, Y., Y. Cui, B. Sen, and K.-C. Toh (2022). On efficient and scalable computation of the nonparametric maximum likelihood estimator in mixture models. arXiv preprint arXiv:2208.07514.

Jiajun Luo - University of Southern California

Acknowledgments

The authors thank the Editor, Associate Editor, and three anonymous referees for their thoughtful and constructive feedback, which has substantially improved the quality and presentation of this article.

Supplementary Materials

The online Supplement provides the proofs of all results stated in the main paper, an additional numerical experiment, further details regarding the real data example of Section 5, and an additional real data example.

Supplementary materials are available for download.
