Transfer Learning for Ridge Regression with Random Coefficients

Hongzhe Zhang and Hongzhe Li

doi:10.5705/ss.202025.0232

Abstract

Ridge regression with random coefficients provides a flexible approach for modeling many small but nonzero effects in high-dimensional data. We embed this framework in transfer learning by leveraging source samples from related regression models; the informativeness of each source is captured by the correlation between its coefficients and those of the target. We propose two weighted estimators, one minimizing estimation risk and the other minimizing prediction risk, each formed as an optimal blend of target and source ridge estimates. Under the high-dimensional regime p/n → γ, where p is the number of predictors and n is the sample size, random matrix theory yields closed-form limits for these optimal weights and their associated risks. In simulations and in applications to lipid-trait and colorectal-cancer microbiome prediction, our methods consistently outperform both target-only and pooled-data ridge regression.
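The blending idea can be illustrated with a small simulation. The sketch below is not the paper's estimator: it assumes isotropic Gaussian predictors, unit noise variance, a single source, and a fixed ridge penalty, and it computes oracle weights by least squares against the true target coefficients, which is possible only in simulation. The closed-form limiting weights derived in the paper are not reproduced here. All function names (`ridge`, `simulate`, `oracle_blend_weights`) and parameter values are hypothetical.

```python
import numpy as np


def ridge(X, y, lam):
    """Ridge estimate (X'X/n + lam*I)^{-1} X'y/n."""
    n, p = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(p), X.T @ y / n)


def simulate(n_target=200, n_source=400, p=300, rho=0.8, seed=0):
    """Target and source data whose random coefficients have correlation rho.

    Assumption: coefficients are i.i.d. Gaussian with variance 1/p, predictors
    are standard Gaussian, and the noise has unit variance.
    """
    rng = np.random.default_rng(seed)
    z1 = rng.standard_normal(p)
    z2 = rng.standard_normal(p)
    scale = np.sqrt(1.0 / p)
    beta_t = scale * z1
    beta_s = scale * (rho * z1 + np.sqrt(1.0 - rho**2) * z2)
    Xt = rng.standard_normal((n_target, p))
    Xs = rng.standard_normal((n_source, p))
    yt = Xt @ beta_t + rng.standard_normal(n_target)
    ys = Xs @ beta_s + rng.standard_normal(n_source)
    return Xt, yt, Xs, ys, beta_t


def oracle_blend_weights(b_target, b_source, beta_true):
    """Weights (w_t, w_s) minimizing ||w_t*b_target + w_s*b_source - beta_true||^2.

    Uses the true coefficients, so this is an oracle available only in simulation.
    """
    B = np.column_stack([b_target, b_source])
    w, *_ = np.linalg.lstsq(B, beta_true, rcond=None)
    return w


def sq_error(b, beta_true):
    """Squared estimation error ||b - beta_true||^2."""
    return float(np.sum((b - beta_true) ** 2))


Xt, yt, Xs, ys, beta_t = simulate()
lam = 1.0  # fixed penalty, chosen arbitrarily for illustration
b_t = ridge(Xt, yt, lam)  # target-only ridge estimate
b_s = ridge(Xs, ys, lam)  # source-only ridge estimate
w = oracle_blend_weights(b_t, b_s, beta_t)
b_blend = w[0] * b_t + w[1] * b_s

print("weights (target, source):", w)
print("target-only error:", sq_error(b_t, beta_t))
print("source-only error:", sq_error(b_s, beta_t))
print("blended error:", sq_error(b_blend, beta_t))
```

Because the weights (1, 0) and (0, 1) are both feasible, the oracle blend can never have a larger squared estimation error than either ridge estimate alone. How much it gains depends on the coefficient correlation `rho` and the sample sizes.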

Key words and phrases: Covariate shift; estimation risk; prediction risk; random matrix theory

Information

Preprint No.	SS-2025-0232
Manuscript ID	SS-2025-0232
Complete Authors	Hongzhe Zhang, Hongzhe Li
Corresponding Authors	Hongzhe Li
Emails	hongzhe@upenn.edu

Acknowledgments

We would like to thank Dr. Jiaoyang Huang and Dr. Edgar Dobriban for discussions on the random matrix theorems used in the derivations. H.L.’s research is partially supported by NIH grants GM123056 and GM129781.

Supplementary Materials

The supplementary materials, available online for download, include details of additional lemmas and corollaries, the proofs of all lemmas, corollaries, and theorems, and parameter estimation for the real data analysis.
