Efficient Learning of DAG Structures in Heavy-tailed Data

Wei Zhou, Xueqian Kang, Wei Zhong and Junhui Wang

doi:10.5705/ss.202024.0199

Abstract

Directed acyclic graph (DAG) models are widely used to discover causal relationships among random variables. However, most existing DAG learning algorithms are not directly applicable to heavy-tailed data, which are commonly observed in finance and other fields. In this article, we propose an efficient two-step algorithm based on topological layers, referred to as TopHeat, to learn linear DAGs with heavy-tailed error distributions, including the Pareto, Fréchet, log-normal, and Cauchy distributions, among others. First, we reconstruct the topological layers hierarchically in a top-down fashion, based on new reconstruction criteria for heavy-tailed DAGs and without assuming the popularly employed faithfulness condition. Second, we recover the directed edges via a modified conditional independence test for heavy-tailed distributions. We theoretically establish consistency in recovering the exact DAG structure. Monte Carlo simulations validate the outstanding finite-sample performance of the proposed algorithm compared with competing methods. In the real data analysis, we analyze the exchange rates among 17 countries and uncover the sources and pathways of financial contagion; the results indicate that the financial risk contagion effect became increasingly stable among European countries as the euro was introduced.

Key words and phrases: Causality, exact DAG structures, heavy-tailed data, topological layers, conditional independence testing
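
To make the setting in the abstract concrete, the following is a minimal sketch of a linear DAG with heavy-tailed errors and of its topological layers. It is not the TopHeat algorithm: it only simulates data from a known graph and reads the layers off that graph. The function names, the edge weights, the tail index, the symmetrized Pareto noise, and the convention that layer 0 contains the nodes without parents are all assumptions made for illustration.

```python
import numpy as np


def topological_layers(adj):
    """Group nodes into layers; adj[i, j] != 0 encodes the edge i -> j.

    Assumed convention: layer 0 holds the nodes without parents, and each
    later layer holds nodes whose parents all lie in earlier layers.
    """
    p = adj.shape[0]
    parents = [set(np.flatnonzero(adj[:, j]).tolist()) for j in range(p)]
    remaining = set(range(p))
    layers = []
    while remaining:
        # A node is ready once none of its parents is still unassigned.
        current = sorted(j for j in remaining if not (parents[j] & remaining))
        if not current:
            raise ValueError("adjacency matrix contains a directed cycle")
        layers.append(current)
        remaining -= set(current)
    return layers


def simulate_linear_dag(adj, n, tail_index=1.5, seed=0):
    """Draw n samples from X_j = sum_k adj[k, j] * X_k + e_j.

    The errors e_j are Pareto variables with a random sign (an illustrative
    choice of heavy-tailed noise, not taken from the paper).
    """
    rng = np.random.default_rng(seed)
    p = adj.shape[0]
    # rng.pareto samples the Lomax law; adding 1 gives a classical Pareto.
    magnitude = rng.pareto(tail_index, size=(n, p)) + 1.0
    noise = magnitude * rng.choice([-1.0, 1.0], size=(n, p))
    x = np.zeros((n, p))
    # Fill nodes layer by layer so that parents are generated first.
    for layer in topological_layers(adj):
        for j in layer:
            x[:, j] = x @ adj[:, j] + noise[:, j]
    return x


# Hypothetical 4-node graph: 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3.
adj = np.zeros((4, 4))
adj[0, 1], adj[0, 2], adj[1, 3], adj[2, 3] = 0.8, -0.5, 0.6, 0.7

layers = topological_layers(adj)
x = simulate_linear_dag(adj, n=500)
print(layers)   # [[0], [1, 2], [3]]
print(x.shape)  # (500, 4)
```

With a tail index of 1.5 the simulated errors have infinite variance, which is the kind of data for which methods built on second moments are not directly applicable.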

Information

Preprint No.	SS-2024-0199
Manuscript ID	SS-2024-0199
Complete Authors	Wei Zhou, Xueqian Kang, Wei Zhong, Junhui Wang
Corresponding Authors	Xueqian Kang
Emails	kangxueqian@stu.xmu.edu.cn

References

Asadi, P., A. C. Davison, and S. Engelke (2015). Extremes on river networks. The Annals of Applied Statistics 9(4), 2023–2050.
Azadkia, M. and S. Chatterjee (2021). A simple measure of conditional dependence. The Annals of Statistics 49(6), 3070–3102.
Barabási, A. and R. Albert (1999). Emergence of scaling in random networks. Science 286(5349), 509–512.
Cai, J., J. H. J. Einmahl, L. De Haan, and C. Zhou (2015). Estimation of the marginal expected shortfall: the mean when a related variable is extreme. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 77(2), 417–442.
Candes, E., Y. Fan, L. Janson, and J. Lv (2018). Panning for gold: ‘model-x’ knockoffs for high dimensional controlled variable selection. Journal of the Royal Statistical Society Series B: Statistical Methodology 80(3), 551–577.
Chen, S. and M. Schienle (2022). Large spillover networks of nonstationary systems. Journal of Business and Economic Statistics 42(2), 422–436.
Chickering, D. W. (2003). Optimal structure identification with greedy search. The Journal of Machine Learning Research 3(3), 507–554.
Daouia, A., S. Girard, and G. Stupfler (2018). Estimation of tail risk based on extreme expectiles. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 80(2), 263–292.
De Haan, L. and A. Ferreira (2006). Extreme Value Theory: An Introduction. New York: Springer.
Erdős, P. and A. Rényi (1960). On the evolution of random graphs. Publications of the Mathematical Institute of the Hungarian Academy of Sciences 5, 17–60.
Fan, J., Q. Li, and Y. Wang (2017). Estimation of high dimensional mean regression in the absence of symmetry and light tail assumptions. Journal of the Royal Statistical Society Series B: Statistical Methodology 79(1), 247–265.
Gao, M., Y. Ding, and B. Aragam (2020). A polynomial-time algorithm for learning nonparametric causal graphs. In Advances in Neural Information Processing Systems, Volume 33, pp. 11599–11611.
Gnecco, N., N. Meinshausen, J. Peters, and S. Engelke (2021). Causal discovery in heavy-tailed models. The Annals of Statistics 49(3), 1755–1778.
Harris, N. and M. Drton (2013). PC algorithm for nonparanormal graphical models. The Journal of Machine Learning Research 14(11), 3365–3383.
Hyvärinen, A. and S. M. Smith (2013). Pairwise likelihood ratios for estimation of non-Gaussian structural equation models. The Journal of Machine Learning Research 14(1), 111–152.
Kalisch, M. and P. Bühlmann (2007). Estimating high-dimensional directed acyclic graphs with the PC-algorithm. The Journal of Machine Learning Research 8(22), 613–636.
Kalisch, M., M. Mächler, D. Colombo, M. H. Maathuis, and P. Bühlmann (2012). Causal inference using graphical models with the R package pcalg. Journal of Statistical Software 47(11), 1–26.
Klüppelberg, C. and M. Krali (2021). Estimating an extreme Bayesian network via scalings. Journal of Multivariate Analysis 181(C), 104672.
Lee, B. S. (1992). Causal relations among stock returns, interest rates, real activity, and inflation. The Journal of Finance 47(4), 1591–1603.
Li, J. and Q. Tang (2015). Interplay of insurance and financial risks in a discrete-time model with strongly regular variation. Bernoulli 21(3), 1800–1823.
Pearl, J. (2000). Causality: Models, reasoning and inference. Cambridge, UK: Cambridge University Press.
Peng, L. and Y. Qi (2017). Inference for Heavy-Tailed Data: Applications in Insurance and Finance. Cambridge, Massachusetts: Academic Press.
Peters, J. and P. Bühlmann (2014). Identifiability of Gaussian structural equation models with equal error variances. Biometrika 101(1), 219–228.
Peters, J., D. Janzing, and B. Schölkopf (2017). Elements of Causal Inference: Foundations and Learning Algorithms. Cambridge, Massachusetts: The MIT Press.
Resnick, S. I. (2007). Heavy-Tail Phenomena: Probabilistic and Statistical Modeling. Berlin, Germany: Springer Science & Business Media.
Shah, R. D. and J. Peters (2020). The hardness of conditional independence testing and the generalised covariance measure. The Annals of Statistics 48(3), 1514–1538.
Shi, H., M. Drton, and F. Han (2024). On Azadkia–Chatterjee’s conditional dependence coefficient. Bernoulli 30(2), 851–877.
Shimizu, S., P. O. Hoyer, A. Hyvärinen, and A. J. Kerminen (2006). A linear non-Gaussian acyclic model for causal discovery. The Journal of Machine Learning Research 7(72), 2003–2030.
Shimizu, S., T. Inazumi, Y. Sogawa, A. Hyvärinen, Y. Kawahara, T. Washio, P. O. Hoyer, and K. Bollen (2011). DirectLiNGAM: A direct method for learning a linear non-Gaussian structural equation model. The Journal of Machine Learning Research 12(33), 1225–1248.
Shojaie, A. and G. Michailidis (2010). Penalized likelihood methods for estimation of sparse high-dimensional directed acyclic graphs. Biometrika 97(3), 519–538.
Spirtes, P., C. N. Glymour, and R. Scheines (2000). Causation, Prediction, and Search. Cambridge, Massachusetts: MIT Press.
Sun, Q., W. Zhou, and J. Fan (2020). Adaptive Huber regression. Journal of the American Statistical Association 115(529), 254–265.
Sun, W., J. Wang, and Y. Fang (2013). Consistent selection of tuning parameters via variable selection stability. The Journal of Machine Learning Research 14(107), 3419–3440.
Wang, X., W. Pan, W. Hu, Y. Tian, and H. Zhang (2015). Conditional distance correlation. Journal of the American Statistical Association 110(512), 1726–1734.
Wang, Y. S. and M. Drton (2020). High-dimensional causal discovery under non-Gaussianity. Biometrika 107(1), 41–59.
Yang, J. and Y. Zhou (2013). Credit risk spillovers among financial institutions around the global credit crisis: Firm-level evidence. Management Science 59(10), 2343–2359.
Yang, Z. and Y. Zhou (2017). Quantitative easing and volatility spillovers across countries and asset classes. Management Science 63(2), 333–354.
Zhang, K., J. Peters, D. Janzing, and B. Schölkopf (2011). Kernel-based conditional independence test and application in causal discovery. In 27th Conference on Uncertainty in Artificial Intelligence (UAI 2011), pp. 804–813. AUAI Press.
Zhao, R., X. He, and J. Wang (2022). Learning linear non-Gaussian directed acyclic graph with diverging number of nodes. The Journal of Machine Learning Research 23(269), 1–34.
Zhao, T. and H. Liu (2014). Calibrated precision matrix estimation for high-dimensional elliptical distributions. IEEE Transactions on Information Theory 60(12), 7874–7887.
Zhou, W., X. He, W. Zhong, and J. Wang (2022). Efficient learning of quadratic variance function directed acyclic graphs via topological layers. Journal of Computational and Graphical Statistics 31(4), 1269–1279.

Joint Laboratory of Data Science and Business Intelligence, School of Statistics and Data Science, Southwestern University of Finance and Economics, China

Acknowledgments

The authors thank the editor, associate editor, and reviewers for their constructive comments, which led to significant improvement in this work. The authors are supported by the National Key R&D Program of China (Grant No. 2022YFA1003800), the National Natural Science Foundation of China (Grant Nos. 12471265, 72495122, 12231011, 12501381, 72473114, and 71988101), HK RGC Grants GRF (11311022, 14306523, and 14303424), CUHK Startup Grant 4937091, and the Sichuan Science and Technology Program (2024NSFSC1393). Zhong also acknowledges the support of the Fujian Key Lab of Statistics and the Fujian Key Lab of Digital Finance.

Supplementary Materials

The online Supplementary Material contains all the technical details and additional results.
