An Ising Similarity Regression Model for Modeling Multivariate Binary Data

Zhi Yang Tho, Francis K. C. Hui and Tao Zou

doi:10.5705/ss.202024.0021

Abstract

Understanding the dependence structure between response variables is an important component in the analysis of correlated multivariate data. This article focuses on modeling dependence structures in multivariate binary data, motivated by a study aiming to understand how patterns in different U.S. senators’ votes are determined by similarities (or lack thereof) in their attributes, e.g., political parties and social network profiles. To address such a research question, we propose a new Ising similarity regression model which regresses pairwise interaction coefficients in the Ising model against a set of similarity measures available or constructed from covariates. Model selection approaches are further developed through regularizing the pseudo-likelihood function with an adaptive lasso penalty to enable the selection of relevant similarity measures. We establish estimation and selection consistency of the proposed estimator under a general setting where the number of similarity measures and responses tend to infinity. A simulation study demonstrates the strong finite sample performance of the proposed estimator, particularly compared with several existing Ising model estimators in estimating the matrix of pairwise interaction coefficients. Applying the Ising similarity regression model to a dataset of roll call voting records of 100 U.S. senators, we are able to quantify how similarities in senators’ parties, businessman occupations and social network profiles drive their voting associations.

Key words and phrases: Conditional dependence, Ising model, Lasso, Model selection, Multivariate data, Pseudo-likelihood.
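
The abstract describes the model only in words, so the following is a minimal sketch of the idea under stated assumptions rather than the authors' implementation. It assumes binary responses coded 0/1, node-specific intercepts, and pairwise interaction coefficients formed as a linear combination of symmetric similarity matrices; the pseudo-likelihood is taken as the sum of node-wise conditional logistic log-likelihoods, and the adaptive lasso weights are taken as inverse powers of an initial estimate. All function names, variable names, and the toy data are hypothetical, and the paper's parametrization and fitting algorithm may differ.

```python
import numpy as np

def interaction_matrix(alpha, similarities):
    """Interaction matrix theta[j, k] = sum_m alpha[m] * similarities[m, j, k].

    `similarities` has shape (M, p, p) and each slice is assumed symmetric.
    The diagonal is zeroed because a response does not interact with itself.
    """
    theta = np.tensordot(alpha, similarities, axes=1)  # shape (p, p)
    np.fill_diagonal(theta, 0.0)
    return theta

def neg_pseudo_loglik(alpha, intercepts, similarities, Y):
    """Negative pseudo-log-likelihood for 0/1 responses Y of shape (n, p).

    Assumes logit P(y_ij = 1 | other responses) = b_j + sum_k theta[j, k] * y_ik.
    """
    theta = interaction_matrix(alpha, similarities)
    eta = intercepts + Y @ theta  # eta[i, j]: conditional log-odds of y_ij
    # log P(y_ij | rest) = y_ij * eta_ij - log(1 + exp(eta_ij))
    return -np.sum(Y * eta - np.logaddexp(0.0, eta))

def adaptive_lasso_penalty(alpha, alpha_init, lam, gamma=1.0):
    """Weighted L1 penalty with weights 1 / |alpha_init|^gamma (assumed form)."""
    weights = 1.0 / np.abs(alpha_init) ** gamma
    return lam * np.sum(weights * np.abs(alpha))

# Toy data (hypothetical): 4 "senators" with a party label and a numeric attribute.
party = np.array([0, 0, 1, 1])
attribute = np.array([0.2, 0.4, 0.9, 1.0])
same_party = (party[:, None] == party[None, :]).astype(float)
closeness = -np.abs(attribute[:, None] - attribute[None, :])
similarities = np.stack([same_party, closeness])  # shape (2, 4, 4)

rng = np.random.default_rng(0)
Y = rng.integers(0, 2, size=(6, 4)).astype(float)  # 6 simulated votes

alpha = np.array([1.0, -2.0])
alpha_init = np.array([0.5, 2.0])  # stand-in for an unpenalized initial estimate
objective = (neg_pseudo_loglik(alpha, np.zeros(4), similarities, Y)
             + adaptive_lasso_penalty(alpha, alpha_init, lam=0.1))
```

In this sketch, a similarity measure is dropped from the model when its coefficient in `alpha` is shrunk exactly to zero; minimizing `objective` over `alpha` and the intercepts would require a solver that handles the non-smooth penalty (for example, coordinate descent or proximal gradient), which is not shown here.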

Information

Preprint No.	SS-2024-0021
Manuscript ID	SS-2024-0021
Complete Authors	Zhi Yang Tho, Francis K. C. Hui, Tao Zou
Corresponding Authors	Zhi Yang Tho
Emails	zhiyang.tho@anu.edu.au

References

Akaike, H. (1998). Information theory and an extension of the maximum likelihood principle. In Selected Papers of Hirotugu Akaike, pp. 199–213. Springer.
Anderson, T. W. (1973). Asymptotically efficient estimation of covariance matrices with linear structure. The Annals of Statistics 1, 135–141.
Banerjee, O., L. El Ghaoui, and A. d’Aspremont (2008). Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. Journal of Machine Learning Research 9, 485–516.
Bhattacharya, B. B. and S. Mukherjee (2018). Inference in Ising models. Bernoulli 24, 493–525.
Bonat, W. H. and B. Jørgensen (2016). Multivariate covariance generalized linear models. Journal of the Royal Statistical Society: Series C (Applied Statistics) 65, 649–675.
Cheng, J., E. Levina, P. Wang, and J. Zhu (2014). A sparse Ising model with covariates. Biometrics 70, 943–953.
Cohen, J., P. Cohen, S. West, and L. Aiken (2013). Applied multiple regression/correlation analysis for the behavioral sciences. Taylor & Francis.
Fan, Y. and C. Y. Tang (2013). Tuning parameter selection in high dimensional penalized likelihood. Journal of the Royal Statistical Society. Series B (Statistical Methodology) 75, 531–552.
Friedman, J., T. Hastie, and R. Tibshirani (2007). Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9, 432–441.
Friedman, J., T. Hastie, and R. Tibshirani (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software 33, 1–22.
Gourieroux, C., A. Monfort, and A. Trognon (1984). Pseudo maximum likelihood methods: Theory. Econometrica 52, 681–700.
Guo, J., J. Cheng, E. Levina, G. Michailidis, and J. Zhu (2015). Estimating heterogeneous graphical models for discrete data with an application to roll call voting. The Annals of Applied Statistics 9, 821–848.
Guo, J., E. Levina, G. Michailidis, and J. Zhu (2010). Joint structure estimation for categorical Markov networks. Unpublished manuscript, https://www.researchgate.net/publication/266584068_Joint_Structure_Estimation_for_Categorical_Markov_Networks.
Hammersley, J. M. and P. Clifford (1971). Markov fields on finite graphs and lattices. Unpublished manuscript, http://www.statslab.cam.ac.uk/~grg/books/hammfest/hamm-cliff.pdf.
Hastie, T., R. Tibshirani, and M. Wainwright (2015). Statistical learning with sparsity: The lasso and generalizations. CRC Press.
Höfling, H. and R. Tibshirani (2009). Estimation of sparse binary pairwise Markov networks using pseudolikelihoods. Journal of Machine Learning Research 10, 883–906.
Huang, J., S. Ma, and C.-H. Zhang (2008). Adaptive lasso for sparse high-dimensional regression models. Statistica Sinica 18, 1603–1618.
Hui, F. K. C., S. Müller, and A. H. Welsh (2017a). Hierarchical selection of fixed and random effects in generalized linear mixed models. Statistica Sinica 27, 501–518.
Hui, F. K. C., S. Müller, and A. H. Welsh (2017b). Joint selection in mixed models using regularized PQL. Journal of the American Statistical Association 112, 1323–1333.
Hui, F. K. C., S. Müller, and A. H. Welsh (2018). Sparse pairwise likelihood estimation for multivariate longitudinal mixed models. Journal of the American Statistical Association 113, 1759–1769.
Hui, F. K. C., S. Müller, and A. H. Welsh (2023). GEE-assisted variable selection for latent variable models with multivariate binary data. Journal of the American Statistical Association 118, 1252–1263.
Ising, E. (1925). Beitrag zur Theorie des Ferromagnetismus. Zeitschrift für Physik 31, 253–258.
Johnson, R. A. and D. W. Wichern (1992). Applied multivariate statistical analysis. Prentice Hall.
Lee, K. H. and L. Xue (2018). Nonparametric finite mixture of Gaussian graphical models. Technometrics 60, 511–521.
Lee, S.-I., V. Ganapathi, and D. Koller (2006). Efficient structure learning of Markov networks using L1-regularization. In B. Schölkopf, J. Platt, and T. Hoffman (Eds.), Advances in Neural Information Processing Systems, Volume 19. MIT Press.
Liu, H., X. Chen, L. Wasserman, and J. Lafferty (2010). Graph-valued regression. In J. Lafferty, C. Williams, J. Shawe-Taylor, R. Zemel, and A. Culotta (Eds.), Advances in Neural Information Processing Systems, Volume 23. Curran Associates, Inc.
Majewski, J., H. Li, and J. Ott (2001). The Ising model in physics and statistical genetics. The American Journal of Human Genetics 69, 853–862.
McElroy, T. S. and T. Trimbur (2023). Variable targeting and reduction in large vector autoregressions with applications to workforce indicators. Journal of Applied Statistics 50, 1515–1537.
Meinshausen, N. and P. Bühlmann (2006). High-dimensional graphs and variable selection with the lasso. The Annals of Statistics 34, 1436–1462.
Nghiem, L. H., F. K. C. Hui, S. Müller, and A. H. Welsh (2022). Sparse sliced inverse regression via Cholesky matrix penalization. Statistica Sinica 32, 2431–2453.
Ni, Y., F. C. Stingo, and V. Baladandayuthapani (2022). Bayesian covariate-dependent Gaussian graphical models with varying structure. Journal of Machine Learning Research 23, 1–29.
Parlett, B. (1980). The symmetric eigenvalue problem. Prentice-Hall.
Pourahmadi, M. (1999). Joint mean-covariance models with applications to longitudinal data: Unconstrained parameterisation. Biometrika 86, 677–690.
Ravikumar, P., M. J. Wainwright, and J. D. Lafferty (2010). High-dimensional Ising model selection using l1-regularized logistic regression. The Annals of Statistics 38, 1287–1319.
Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics 6, 461–464.
Tsay, R. (2013). Multivariate time series analysis: With R and financial applications. Wiley.
Wainwright, M. (2019). High-dimensional statistics: A non-asymptotic viewpoint. Cambridge University Press.
Wang, H., B. Li, and C. Leng (2009). Shrinkage tuning parameter selection with a diverging number of parameters. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 71, 671–683.
Wang, Z., V. Baladandayuthapani, A. O. Kaseb, H. M. Amin, M. M. Hassan, W. Wang, and J. S. Morris (2022). Bayesian edge regression in undirected graphical models to characterize interpatient heterogeneity in cancer. Journal of the American Statistical Association 117, 533–546.
Warton, D. I., L. Thibaut, and Y. A. Wang (2017). The PIT-trap – A “model-free” bootstrap procedure for inference about regression models with discrete, multivariate responses. PLoS ONE 12, e0181790.
Whittaker, J. (1990). Graphical models in applied multivariate statistics. Wiley.
Xue, L., H. Zou, and T. Cai (2012). Nonconcave penalized composite conditional likelihood estimator of sparse Ising models. The Annals of Statistics 40, 1403–1429.
Yuan, M. and Y. Lin (2007). Model selection and estimation in the Gaussian graphical model. Biometrika 94, 19–35.
Zhang, X., F. Huang, F. K. C. Hui, and S. Haberman (2023). Cause-of-death mortality forecasting using adaptive penalized tensor decompositions. Insurance: Mathematics and Economics 111, 193–213.
Zhang, Y., R. Li, and C.-L. Tsai (2010). Regularization parameter selections via generalized information criterion. Journal of the American Statistical Association 105, 312–323.
Zou, H. (2006). The adaptive lasso and its oracle properties. Journal of the American Statistical Association 101, 1418–1429.
Zou, T., W. Lan, R. Li, and C.-L. Tsai (2022). Inference on covariance-mean regression. Journal of Econometrics 230, 318–338.
Zou, T., W. Lan, H. Wang, and C.-L. Tsai (2017). Covariance regression analysis. Journal of the American Statistical Association 112, 266–281.
Zou, T., R. Luo, W. Lan, and C.-L. Tsai (2020). Covariance regression model for non-normal data. In C. F. Lee and J. C. Lee (Eds.), Handbook of Financial Econometrics, Mathematics, Statistics, and Machine Learning, Chapter 113, pp. 3933–3945. World Scientific.

Zhi Yang Tho
The Australian National University, Canberra, ACT 2600, Australia.

Acknowledgments

Zhi Yang Tho was supported by an Australian Government Research Training Program scholarship. Francis K. C. Hui was supported by an Australian Research Council Discovery Project DP230101908. Tao Zou’s research was supported by computational resources provided by the Australian Government through the National Computational Infrastructure (NCI), under the ANU Startup Allocation Scheme. Thanks to Alan Welsh for useful discussions.

Supplementary Materials

The Supplementary Material contains sample versions of Conditions 1 and 3, proofs of the theorems, the inference method, additional simulation results, supplementary details of the application to the U.S. Senate roll call voting data, and an additional application to the Scotland Carabidae ground beetle data. Supplementary materials are available for download.
