Abstract

Shapley effects are a particularly interpretable approach to assessing how a function depends on its various inputs. The existing literature contains various estimators for this class of sensitivity indices in the context of nonparametric regression, where the function is observed with noise, but none appears to be computationally tractable when the input dimension reaches the hundreds. This article provides an estimator that is tractable on this scale. The estimator takes a metamodel-based approach: it first fits a Bayesian Additive Regression Trees (BART) model, which is then used to compute Shapley-effect estimates. This article also establishes a theoretical guarantee of posterior consistency for this Shapley-effect estimator over a large class of functions. Finally, the paper examines the performance of these Shapley-effect estimators on four test functions across a range of input dimensions, up to and including p = 500.
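To make the metamodel-based strategy described above concrete, the sketch below estimates Shapley effects by a standard Monte Carlo permutation scheme applied to a surrogate. This is not the BART-based estimator proposed in the paper; it is a generic illustration assuming independent Uniform(0, 1) inputs, with a cheap analytic function standing in for a fitted metamodel, and all function and parameter names are hypothetical.

```python
import numpy as np

def shapley_effects_mc(model, p, n_perm=200, n_outer=100, n_inner=20, seed=0):
    """Monte Carlo permutation estimator of Shapley effects for a metamodel
    with independent Uniform(0, 1) inputs. For each random permutation of the
    p inputs, the gain in explained variance from adding input j is recorded;
    averaging these gains over permutations gives j's Shapley effect."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(size=(n_outer, p))          # outer evaluation sample
    total_var = np.var(model(X))                # Var(f(X)), the normalizer

    def cond_var(S):
        # Estimate Var(E[f(X) | X_S]) by fixing the coordinates in S and
        # averaging the model over fresh draws of the remaining coordinates.
        means = np.empty(n_outer)
        for i in range(n_outer):
            Z = rng.uniform(size=(n_inner, p))
            Z[:, S] = X[i, S]
            means[i] = model(Z).mean()
        return np.var(means)

    effects = np.zeros(p)
    for _ in range(n_perm):
        prev, known = 0.0, []
        for j in rng.permutation(p):
            known.append(j)
            cost = cond_var(known)
            effects[j] += cost - prev           # marginal variance gain of j
            prev = cost
    return effects / n_perm / total_var         # normalized: sums to 1

# Toy stand-in for a fitted metamodel: f(x) = x_1 + 2 x_2, whose variance
# shares (and hence Shapley effects, the inputs being independent) are 1/5, 4/5.
f = lambda X: X[:, 0] + 2.0 * X[:, 1]
effects = shapley_effects_mc(f, p=2)
```

Each permutation requires p nested conditional-variance estimates, each costing n_outer * n_inner model evaluations, which is precisely why such schemes become intractable for p in the hundreds and why a tractable estimator on that scale is of interest.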

Information

Preprint No.: SS-2024-0318
Manuscript ID: SS-2024-0318
Complete Authors: Akira Horiguchi, Matthew T. Pratola
Corresponding Author: Akira Horiguchi
Email: ahoriguchi@ucdavis.edu


Acknowledgments

Akira Horiguchi would like to acknowledge Miheer Dewaskar for insightful discussions. The work of Matthew T. Pratola was supported in part by the National Science Foundation under Agreements DMS-1916231 and OAC-2004601, and in part by the Office of Sponsored Research (OSR) at the King Abdullah University of Science and Technology (KAUST) under Award No.

Supplementary Materials

The online Supplementary Material contains a summary table of metamodel properties, a review of posterior contraction theory, and the preliminaries required to establish the posterior asymptotic results. It further presents the statements and proofs of these asymptotic results, along with detailed proofs of results from the main text. Definitions of the functions used in the experiments described in Section 5 of the main paper are provided, together with additional experiments examining how the metamodels scale with input dimensionality.

Supplementary materials are available for download.