Statistica Sinica

J. T. Gene Hwang and Ming-Chung Yang

Abstract: The contingency table arises in nearly every application of statistics. However, even the basic problem of testing independence is not totally resolved. More than thirty-five years ago, Lancaster (1961) proposed using the mid p-value for testing independence in a contingency table. The mid p-value is defined as half the conditional probability of the observed statistic plus the conditional probability of more extreme values, given the marginal totals. Recently there seems to be recognition that the mid p-value is quite an attractive procedure. It tends to be less conservative than the p-value derived from Fisher's exact test. However, the procedure is considered to be somewhat ad hoc.
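To make the definition concrete, the following is a minimal Python sketch of the one-sided mid p-value next to the Fisher exact p-value. It assumes a 2 x 2 table with fixed margins, so that the top-left cell is hypergeometric under independence, and it takes "more extreme" to mean larger values of that cell. The function names `hypergeom_pmf` and `one_sided_p_values` are hypothetical and chosen only for illustration; this is not the two-sided expected p-value procedure of the paper.

```python
from math import comb


def hypergeom_pmf(k, row1, row2, col1):
    """P(X = k) for the top-left cell of a 2 x 2 table, given the margins."""
    return comb(row1, k) * comb(row2, col1 - k) / comb(row1 + row2, col1)


def one_sided_p_values(a, b, c, d):
    """Return (fisher_p, mid_p) for the table [[a, b], [c, d]].

    Assumption: the alternative favours large values of the top-left cell,
    so "more extreme" means a count strictly greater than the observed a.
    """
    row1, row2, col1 = a + b, c + d, a + c
    k_max = min(row1, col1)  # largest possible top-left count
    p_obs = hypergeom_pmf(a, row1, row2, col1)
    p_more = sum(
        hypergeom_pmf(k, row1, row2, col1) for k in range(a + 1, k_max + 1)
    )
    fisher_p = p_obs + p_more     # full weight on the observed table
    mid_p = 0.5 * p_obs + p_more  # half weight on the observed table
    return fisher_p, mid_p


fisher_p, mid_p = one_sided_p_values(3, 1, 1, 3)
print(f"Fisher exact p = {fisher_p:.4f}, mid p = {mid_p:.4f}")
```

For the table [[3, 1], [1, 3]] the Fisher p-value is 17/70 and the mid p-value is 9/70; the difference is exactly half the probability of the observed table, which is why the mid p-value is less conservative.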

In this paper we provide theory to justify mid p-values. We apply the Neyman-Pearson fundamental lemma and the estimated truth approach to derive optimal procedures, named expected p-values. The estimated truth approach views p-values as estimators of the truth function, which is one or zero depending on whether the null hypothesis holds or not. A decision theory approach is taken to compare the p-values using risk functions. In the one-sided case, the expected p-value is exactly the mid p-value. For the two-sided case, the expected p-value is a new procedure that can be constructed numerically. In a contingency table of two independent binomial samplings with balanced sample sizes, the expected p-value reduces to a two-sided mid p-value. Further, numerical evidence shows that the expected p-values lead to tests whose type I error is very close to the nominal level. Our theory provides strong support for mid p-values.

Key words and phrases: Estimated truth approach, Fisher's exact test, expected p-value.