Statistica Sinica

Craig B. Borkowf

Abstract: Consider first the set of all possible -way multinomial tables defined by certain mean cell proportions and a given total count. Consider next the subset of these -way tables that also satisfy certain one-way marginal totals, obtained by summing the cell counts over all but one subscript. Tables restricted to this subset are said to have the multivariate extended hypergeometric (MXH) distribution. In this paper we develop a general algorithm for calculating the asymptotic variance of -way MXH tables, and we present explicit covariance formulas under independence and in other special cases. We also note that permutation tests defined by certain mean cell proportions and one-way marginal constraints essentially enumerate the entire set of MXH tables with those proportions and constraints. Thus, one can use the asymptotic MXH distribution to approximate the finite-sample variances of statistics calculated under permutation tests for various null and alternative hypotheses. These results can then be used to construct confidence intervals for parameters of interest and to approximate the percentiles of test statistics under permutation tests, which is a significant advantage when such tests are computationally prohibitive. We illustrate methods based on the asymptotic MXH distribution as complements and alternatives to permutation tests in the analysis of epidemiological studies of gene-environment interactions.

Key words and phrases: Contingency table, epidemiology, gene-environment interaction, multinomial, multivariate extended hypergeometric (MXH) distribution, permutation test.
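The abstract's central link can be sketched in the simplest possible case, a 2 x 2 table. That link is that a permutation test which fixes the one-way margins is effectively sampling MXH tables, whose asymptotic variance can stand in for brute-force enumeration. In the 2 x 2 case the MXH law reduces, as we understand it, to the univariate extended (Fisher noncentral) hypergeometric distribution of the (1,1) count a, with P(a = x) proportional to C(n1, x) C(n2, m1 - x) psi^x, where psi is the odds ratio. The Python below is a minimal illustration under our own assumptions, not the paper's general -way algorithm. The function names `exact_moments`, `asymptotic_moments` and `permutation_variance` are hypothetical. The variance approximation used is the classical large-sample formula 1/(1/a + 1/b + 1/c + 1/d), evaluated at fitted cell means that reproduce the margins and the odds ratio psi.

```python
import math
import random


def support(n1, n2, m1):
    """Feasible values of the (1,1) count a, given row totals n1, n2 and
    first-column total m1 (the second-column total is n1 + n2 - m1)."""
    return range(max(0, m1 - n2), min(n1, m1) + 1)


def exact_moments(n1, n2, m1, psi=1.0):
    """Exact mean and variance of a under the extended hypergeometric law
    P(a = x) proportional to C(n1, x) C(n2, m1 - x) psi**x.
    psi = 1 is the ordinary hypergeometric (independence) case."""
    xs = support(n1, n2, m1)
    w = [math.comb(n1, x) * math.comb(n2, m1 - x) * psi ** x for x in xs]
    total = sum(w)
    mean = sum(x * wx for x, wx in zip(xs, w)) / total
    var = sum((x - mean) ** 2 * wx for x, wx in zip(xs, w)) / total
    return mean, var


def asymptotic_moments(n1, n2, m1, psi=1.0, iters=200):
    """Large-sample approximation (assumed 2 x 2 special case): find the fitted
    cell mean a* whose table (a, b, c, d) matches the margins and has odds
    ratio psi, then use Var(a) ~ 1 / (1/a + 1/b + 1/c + 1/d)."""
    xs = support(n1, n2, m1)
    lo, hi = float(xs[0]), float(xs[-1])

    def gap(a):  # a*d - psi*b*c, strictly increasing in a on (lo, hi)
        b, c, d = n1 - a, m1 - a, n2 - m1 + a
        return a * d - psi * b * c

    for _ in range(iters):  # bisection for the root of gap(a) = 0
        mid = 0.5 * (lo + hi)
        if gap(mid) < 0:
            lo = mid
        else:
            hi = mid
    a = 0.5 * (lo + hi)
    b, c, d = n1 - a, m1 - a, n2 - m1 + a
    return a, 1.0 / (1 / a + 1 / b + 1 / c + 1 / d)


def permutation_variance(n1, n2, m1, reps=20000, seed=1):
    """Monte Carlo permutation test: shuffle the column labels of N subjects
    (so both margins stay fixed) and record a, the number of column-1
    labels falling in row 1; return the sample variance of a."""
    rng = random.Random(seed)
    labels = [1] * m1 + [0] * (n1 + n2 - m1)
    counts = []
    for _ in range(reps):
        rng.shuffle(labels)
        counts.append(sum(labels[:n1]))  # the first n1 subjects form row 1
    mean = sum(counts) / reps
    return sum((x - mean) ** 2 for x in counts) / (reps - 1)


# Illustrative margins (hypothetical): rows 40 and 60, first column 30.
n1, n2, m1 = 40, 60, 30
print(exact_moments(n1, n2, m1), asymptotic_moments(n1, n2, m1))
print(permutation_variance(n1, n2, m1))
print(exact_moments(n1, n2, m1, psi=2.5), asymptotic_moments(n1, n2, m1, psi=2.5))
```

Under independence (psi = 1), write N = n1 + n2 and m2 = N - m1. The exact variance is then n1 n2 m1 m2 / (N^2 (N - 1)), while the asymptotic formula gives n1 n2 m1 m2 / N^3, so the two differ only by the factor N/(N - 1). The shuffled-label estimate should agree with the exact value up to Monte Carlo error. For psi different from 1, the same asymptotic formula approximates the non-null variance without enumerating the support.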