Statistica Sinica

Richard M. Dudley and Dominique Haughton

Abstract: In this paper, we extend information criteria for model selection to the case of K independent data sets corresponding to different true parameters θ_{1},...,θ_{k} and to situations where some of the models may have the same dimension and may include boundaries. New criteria are introduced: SBICR, which combines criteria from different data sets, and IBICR, which treats one data set at a time. We apply the criteria to a set of 2×2 contingency tables (mosquito data) and to some data on baseball players' performance. Consistency results are given for the criteria under some assumptions. The best model will be the smallest one containing all the θ_{i}. A model m_{j} is called competitive if the vector θ^{(k)} of true parameters is in the closure of the set of ψ^{(k)}'s where m_{j} is the best model. We find that, under reasonable assumptions, for submodels of an exponential family, if for all competitive m_{j}, is not too thin close to θ^{(k)}, the SBICR procedure is asymptotically close to Bayes procedures. This article extends results in Haughton (Ann. Statist. 1988, 1989) and Poskitt (J. Roy. Statist. Soc. Ser. B 1987).

Key words and phrases: BIC, Jeffreys' prior, model selection.