Statistica Sinica 36 (2026), 715-740

INFORMATION-BASED OPTIMAL SUBDATA SELECTION
FOR CLUSTERWISE LINEAR REGRESSION

Yanxi Liu, John Stufken and Min Yang*

AbbVie Inc., George Mason University and University of Illinois at Chicago

Abstract: Mixture-of-Experts (MoE) models are commonly used when the data contain distinct clusters with different relationships between the independent and dependent variables. Fitting such models to large datasets, however, is virtually impossible computationally. An attractive alternative is to use subdata selected by “maximizing” the Fisher information matrix. A major challenge is that no closed-form expression for the Fisher information matrix is available for such models. Focusing on clusterwise linear regression models, a subclass of MoE models, we develop a framework that overcomes this challenge. We prove that the proposed subdata selection approach is asymptotically optimal, i.e., no other method is statistically more efficient than the proposed one when the full data size is large.

Key words and phrases: D-optimality, information matrix, latent indicator, massive data, MLE.

