
Statistica Sinica 32 (2022), 1343-1361

COMMUNICATION-EFFICIENT DISTRIBUTED
LINEAR DISCRIMINANT ANALYSIS
FOR BINARY CLASSIFICATION

Mengyu Li1,2 and Junlong Zhao2

1Renmin University of China and 2Beijing Normal University

Abstract: Large-scale data sets, arising when the sample size n is large, are often stored across k different local machines. Distributed statistical learning is an efficient way to handle such data. In this study, we consider the binary classification problem for massive data based on linear discriminant analysis (LDA) in a distributed learning framework. Classical centralized LDA requires transmitting p-by-p summary matrices to the hub, where p is the dimension of the variates under consideration. This can be a burden when p is large or communication between nodes is expensive. We consider two distributed LDA estimators, a two-round and a one-shot estimator, that are communication-efficient because they avoid transmitting p-by-p matrices. Using random matrix theory, we study the asymptotic relative efficiency of the distributed LDA estimators with respect to centralized LDA under different settings of k. We show that when k is in a suitable range, such as k = o(n/p), the two distributed estimators achieve the same efficiency as the centralized estimator under mild conditions. Moreover, the two-round estimator relaxes the restriction on k, allowing kp/n → c ∈ [0, 1) under some conditions. Simulations confirm the theoretical results.
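To illustrate the communication pattern the abstract describes, the following is a minimal sketch of a one-shot distributed LDA in the averaging style: each machine computes a local discriminant direction (a length-p vector) and the hub averages the k directions, so no p-by-p covariance matrix is ever transmitted. This is an illustrative simplification, not the authors' exact estimator; the function names and the plain averaging rule are assumptions.

```python
import numpy as np

def local_lda_direction(X0, X1):
    """On one local machine, compute the LDA direction S^{-1}(mu1 - mu0).

    X0, X1 hold the local samples from classes 0 and 1.
    Only this p-vector is communicated to the hub, not the
    p-by-p pooled covariance matrix S.
    """
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    n0, n1 = len(X0), len(X1)
    # Pooled within-class sample covariance on this machine.
    S = ((n0 - 1) * np.cov(X0, rowvar=False)
         + (n1 - 1) * np.cov(X1, rowvar=False)) / (n0 + n1 - 2)
    return np.linalg.solve(S, mu1 - mu0)

def one_shot_lda(splits):
    """Hub step of a one-shot estimator: average the k local directions.

    `splits` is a list of (X0, X1) pairs, one per machine, so total
    communication is k vectors of length p rather than k matrices of
    size p-by-p.
    """
    return np.mean([local_lda_direction(X0, X1) for X0, X1 in splits],
                   axis=0)
```

A two-round variant would add a second pass in which the hub broadcasts an initial direction and the machines return refined p-vectors; either way, per-machine communication stays O(p), which is the efficiency gain the paper quantifies.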

Key words and phrases: Deterministic equivalent, distributed learning, linear discriminant analysis (LDA), random matrix, relative efficiency.
