Abstract: Marginal Data Augmentation and Parameter-Expanded Data Augmentation are related methods for improving the convergence properties of the two-step Gibbs sampler known as the Data Augmentation sampler. These methods expand the parameter space with a so-called working parameter that is unidentifiable given the observed data but is identifiable given the so-called augmented data. Although these methods can yield enormous computational gains, their use has been somewhat limited by the constrained framework under which they are constructed and by the need to identify a working parameter. This article proposes a new prescriptive framework that greatly expands the class of problems that can benefit from the key idea underlying these methods. In particular, we show how working parameters can be introduced automatically into any Gibbs sampler, and we explore how they should be updated vis-à-vis the updating of the model parameters in order to either fully or partially marginalize them out of the target distribution. A prior distribution is specified on the working parameters, and the convergence properties of the Markov chain depend on this choice. Under certain conditions the optimal choice is improper and results in a non-positive recurrent joint Markov chain on the expanded parameter space. This leads to unexplored technical difficulties when one attempts to exploit the computational advantage in multi-step MCMC samplers, the very chains that might benefit most from this technology. In this article we develop strategies and theory that allow optimal marginal methods to be used in multi-step samplers. We illustrate the potential to dramatically improve the convergence properties of MCMC samplers by applying the marginal Gibbs sampler to a logistic mixed model.
Key words and phrases: Conditional data augmentation, Gibbs sampler, logistic mixed model, marginal data augmentation, MCMC, mixing rate, non-positive recurrent Markov chain, partial marginalization, stationary distribution, working parameters.