Statistica Sinica 25 (2015), 1459-1476
Abstract: Common pre-processing procedures for time course microarray analysis, such as standardization and gene filtering based on the functional F-test, often result in directional data that lie on a sphere S^{d-1}. While there have been some efforts to design spherical clustering algorithms, few researchers have developed methods for selecting the number of clusters in spherical cluster analysis. In this paper, we focus on circular data on S^1 and propose a novel information-based criterion, ICCC (information criterion for circular clustering), to determine the number of clusters when clustering circular data. ICCC is based on a finite mixture model of Langevin distributions and is derived from the asymptotic properties of the maximum likelihood of the Langevin mixture distribution. Through the study of both simulated data and a large set of time course microarray data, we demonstrate that ICCC provides better estimates of the number of clusters than existing methods: AIC, BIC, the Gap criterion, and the Maitra-Ramler criterion.
Key words and phrases: Circular statistics, clustering, information criterion, Langevin distribution, mixture model, model selection.
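The abstract does not state the ICCC formula, so the sketch below does not implement ICCC. It only illustrates the model ICCC builds on: a finite mixture of Langevin distributions on S^1 (on the circle the Langevin density is the von Mises density exp(κ cos(θ − μ)) / (2π I0(κ))), fitted by EM, with the number of clusters scored by BIC, one of the baseline criteria named above. The helper names (fit_vm_mixture, bic), the initialization scheme, and the closed-form concentration update (an approximation to the inverse of I1(κ)/I0(κ) due to Banerjee et al.) are assumptions of this sketch, not the authors' procedure.

```python
import numpy as np


def vm_logpdf(theta, mu, kappa):
    # Langevin (von Mises) log-density on the unit circle S^1
    return kappa * np.cos(theta - mu) - np.log(2 * np.pi * np.i0(kappa))


def _logsumexp(a):
    # Row-wise log-sum-exp, kept as a column for broadcasting
    m = a.max(axis=1, keepdims=True)
    return m + np.log(np.exp(a - m).sum(axis=1, keepdims=True))


def _em(theta, mu, n_iter):
    n, k = theta.size, mu.size
    kappa = np.ones(k)
    w = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: posterior responsibilities, shape (n, k)
        log_p = np.log(w) + vm_logpdf(theta[:, None], mu, kappa)
        r = np.exp(log_p - _logsumexp(log_p))
        # M-step: weights, mean directions, mean resultant lengths
        nk = r.sum(axis=0) + 1e-12
        w = nk / n
        c = r.T @ np.cos(theta)
        s = r.T @ np.sin(theta)
        mu = np.arctan2(s, c)
        rbar = np.clip(np.hypot(c, s) / nk, 1e-6, 1 - 1e-6)
        # Approximate inverse of A(kappa) = I1(kappa)/I0(kappa) for p = 2;
        # kappa is capped to avoid degenerate spikes and overflow in np.i0
        kappa = np.minimum(rbar * (2 - rbar**2) / (1 - rbar**2), 500.0)
    log_p = np.log(w) + vm_logpdf(theta[:, None], mu, kappa)
    return _logsumexp(log_p).sum(), mu, kappa, w


def fit_vm_mixture(theta, k, n_init=5, n_iter=200, seed=0):
    """Hypothetical helper: best-of-n_init EM fit of a k-component mixture.

    Returns (log-likelihood, mean directions, concentrations, weights).
    """
    rng = np.random.default_rng(seed)
    best = None
    for trial in range(n_init):
        if trial == 0:
            mu0 = np.linspace(-np.pi, np.pi, k, endpoint=False)  # evenly spaced
        else:
            mu0 = rng.choice(theta, size=k, replace=False)  # random data points
        result = _em(theta, mu0, n_iter)
        if best is None or result[0] > best[0]:
            best = result
    return best


def bic(loglik, k, n):
    # Free parameters: k mean directions, k concentrations, k - 1 weights
    return -2.0 * loglik + (3 * k - 1) * np.log(n)
```

To choose the number of clusters with this baseline, one would fit fit_vm_mixture for each candidate k and keep the k with the smallest bic value; ICCC would replace bic with the paper's own criterion.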