Back To Index Previous Article Next Article Full Text

Statistica Sinica 33 (2023), 1343-1364

AN EFFICIENT GREEDY SEARCH ALGORITHM
FOR HIGH-DIMENSIONAL LINEAR
DISCRIMINANT ANALYSIS

Hannan Yang, Danyu Lin and Quefeng Li

University of North Carolina at Chapel Hill
>

Abstract: High-dimensional classification is an important statistical problem with applications in many areas. One widely used classifier is the linear discriminant analysis (LDA). In recent years, many regularized LDA classifiers have been proposed to solve the problem of high-dimensional classification. However, these methods rely on inverting a large matrix or solving large-scale optimization problems in order to render classification rules, making them computationally prohibitive when the dimension is ultrahigh. With the emergence of big data, it has become increasingly important that we develop more efficient algorithms to solve high-dimensional LDA problems. In this paper, we propose an efficient greedy search algorithm that depends solely on closed-form formulae to learn a high-dimensional LDA rule. We establish a theoretical guarantee of its statistical properties in terms of variable selection and error rate consistency. In addition, we provide an explicit interpretation of the extra information brought by an additional feature in an LDA problem under some mild distributional assumptions. We demonstrate that the computational speed of the new algorithm is significantly better than that of other high-dimensional LDA methods, while maintaining comparable or even better classification performance.

Key words and phrases: Greedy search, high-dimensional classification, linear discriminant analysis, Mahalanobis distance, variable selection.

Back To Index Previous Article Next Article Full Text