Statistica Sinica

Hock-Peng Chan and Wei-Liem Loh

Abstract: This article is concerned with the file linkage problem first investigated by DeGroot and Goel (1980). Let $(X_i, Y_i)$, $i = 1, \ldots, n$, be a random sample from a bivariate normal distribution. Suppose that before the sample can be observed, it is broken into the components $X_1, \ldots, X_n$ and $Z_1, \ldots, Z_n$, where $Z_i = Y_{\pi(i)}$ and $\pi$ is some unknown permutation of $\{1, \ldots, n\}$. The aim is to estimate the parameters (in particular the correlation coefficient) of the bivariate normal distribution using this broken random sample. The main difficulty is that direct computation of the likelihood is in general an NP-hard problem. Thus, for $n$ sufficiently large, standard likelihood or Bayesian techniques may not be feasible. This article proposes to reformulate the problem as a moment problem via Fisher's $k$-statistics. The resulting likelihood can be approximated as a product of bivariate normal likelihoods, and consequently standard statistical methods can be applied. It is also shown that this approximation is very good in that very little Fisher information is lost.

Key words and phrases: Bivariate normal distribution, broken random sample, correlation coefficient, file linkage, Fisher information, $k$-statistics.
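To make the setting concrete, the following is a minimal simulation sketch of a broken random sample, not the authors' estimation procedure. It assumes standard normal margins for simplicity, and the names `broken_sample` and `pearson` are hypothetical helpers introduced only for illustration. It shows that the marginal information in the second component survives the unknown permutation while the naive sample correlation of the mismatched pairs no longer reflects the true correlation.

```python
import math
import random


def broken_sample(n, rho, seed=0):
    """Draw (X_i, Y_i) from a standard bivariate normal with correlation rho
    (an assumption made for illustration), then break the pairing by
    applying a random permutation to the Y's."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        e = rng.gauss(0.0, 1.0)
        xs.append(x)
        # Y = rho * X + sqrt(1 - rho^2) * E has unit variance and Corr(X, Y) = rho.
        ys.append(rho * x + math.sqrt(1.0 - rho ** 2) * e)
    perm = list(range(n))
    rng.shuffle(perm)  # plays the role of the unknown permutation
    zs = [ys[perm[i]] for i in range(n)]  # Z_i = Y_{perm(i)}
    return xs, ys, zs


def pearson(a, b):
    """Plain sample correlation coefficient of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


xs, ys, zs = broken_sample(n=1000, rho=0.8)
print("correlation with intact pairs:", pearson(xs, ys))
print("correlation with broken pairs:", pearson(xs, zs))
```

Only `xs` and `zs` would be available to the statistician; `ys` is returned here purely so the intact and broken pairings can be compared.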