Back To Index Previous Article Next Article Full Text

Statistica Sinica 33 (2023), 893-918

A CLUSTERED GAUSSIAN PROCESS MODEL
FOR COMPUTER EXPERIMENTS

Chih-Li Sung, Benjamin Haaland, Youngdeok Hwang and Siyuan Lu

Michigan State University, University of Utah,
City University of New York and IBM Thomas J. Watson Research Center

Abstract: The Gaussian process is one of the most important approaches for emulating computer simulations. However, the stationarity assumption common to Gaussian process emulation and the computational intractability for large-scale data sets limit accuracy and feasibility in practice. In this article, we propose a clustered Gaussian process model that simultaneously segments the input data into multiple clusters and fits a Gaussian process model in each cluster. The model parameters and the clusters are learned through the efficient stochastic expectation-maximization, which allows for emulations for large-scale computer simulations. Importantly, the proposed method provides valuable model interpretability by identifying clusters, which reveal hidden patterns in the input-output relationship. The number of clusters, which controls the bias-variance trade-off, is efficiently selected using cross-validation to ensure accurate predictions. In our simulations and a real application to solar irradiance emulation, our proposed method has smaller mean squared errors than its main competitors, with competitive computation time, and provides valuable insights from the data by discovering clusters. An R package for the proposed methodology is provided in an open repository.

Key words and phrases: Large-scale data, mixture models, nonstationarity, solar irradiance emulation, uncertainty quantification.

Back To Index Previous Article Next Article Full Text