Abstract

Regularization in fitting regression models has been a very active topic of research in the past few decades, but most existing methods are designed for particular situations, e.g. the case of a sparse coefficient vector. We consider the problem of designing universally optimal regularized estimators in a given generalized linear model with fixed effects. First, we propose as a contender the Bayes estimator against an ideal prior that assigns equal mass to every permutation of the fixed coefficient vector, thus depending on the true coefficients only through their empirical CDF. We prove some optimality properties of this oracle estimator in both the frequentist and Bayesian frameworks. To compete with the oracle estimator, we posit a hierarchical Bayes model where the individual coefficients are modeled as i.i.d. draws from a common distribution π, which is in turn assigned a Polya tree prior that reflects indefiniteness. We demonstrate in examples that the posterior mean of π under the postulated model adapts nonparametrically to the empirical CDF of the true coefficients. Correspondingly, the posterior means of the coefficients themselves are used to mimic the oracle estimator, attaining competitive accuracy compared to various parametric and nonparametric alternatives, from relatively standard L_p-regularized estimators to modern penalized-likelihood and Bayesian estimators for high-dimensional regression.

Key words and phrases: Hierarchical modeling; empirical Bayes methods; nonparametric inference
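To make the Polya tree prior on π concrete, the following is a minimal, generic sketch of drawing one random distribution from a (truncated) Polya tree on [0, 1]. It is not the authors' algorithm: the truncation depth, the standard Beta(c·m², c·m²) splitting variables, and the unit-interval support are all illustrative assumptions.

```python
import numpy as np

def sample_polya_tree(levels=6, c=1.0, rng=None):
    """Draw a random probability vector over the 2**levels dyadic
    subintervals of [0, 1] from a truncated Polya tree prior.
    At depth m, each node splits its mass to its two children via an
    independent Beta(c*m**2, c*m**2) variable -- the common choice
    that yields absolutely continuous random measures."""
    rng = np.random.default_rng() if rng is None else rng
    probs = np.ones(1)  # all mass on the root interval [0, 1]
    for m in range(1, levels + 1):
        # fraction of each node's mass sent to its left child
        y = rng.beta(c * m**2, c * m**2, size=probs.size)
        # interleave left/right children to get the next level's masses
        probs = np.column_stack([probs * y, probs * (1 - y)]).ravel()
    return probs  # nonnegative, sums to 1, length 2**levels

p = sample_polya_tree(levels=6, rng=np.random.default_rng(0))
print(p.shape, round(p.sum(), 6))  # (64,) 1.0
```

Averaging many such draws illustrates how the prior centers on the uniform distribution while individual draws remain quite irregular, which is the "indefiniteness" the abstract alludes to.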

Information

Preprint No.: SS-2025-0221
Manuscript ID: SS-2025-0221
Complete Authors: Asaf Weinstein, Jonas Wallin, Daniel Yekutieli, Malgorzata Bogdan
Corresponding Author: Asaf Weinstein
Email: asaf.weinstein@mail.huji.ac.il


Acknowledgments

A.W. was supported by the Israel Science Foundation (ISF) under grant no. 2679/24.

M.B. and J.W. were supported by the Swedish Research Council under grant no. 2020-05081.

Supplementary Materials

The supplement includes proofs and details of the Gibbs sampling algorithm.


Supplementary materials are available for download.