Statistica Sinica 24 (2014), 1547-1570
Abstract: Multi-label classification is increasingly common in modern applications such as medical diagnosis and document categorization. One important issue in multi-label classification is that the distributions of classifier scores can differ statistically across classes. When not accounted for properly, such differences can lead to poor classification decisions for some classes. We address this issue by developing a strategy based on a new concept, the Local Precision Rate (LPR), under the assumption that a classifier has been learned for each class and that the corresponding classifier scores are available both for a set of training objects and for the objects to be classified. Under certain conditions, we show that transforming the classifier scores into LPRs and making classification decisions by comparing the LPR values of all objects against all classes theoretically guarantees maximum precision at any recall rate. We also show that LPR is mathematically equivalent to 1-ℓFDR, where ℓFDR denotes the local false discovery rate. This equivalence, together with the Bayesian interpretation of ℓFDR, provides an alternative justification for the theoretical optimality of LPR. Because the original formulation of 1-ℓFDR has limitations for estimation when data are noisy, we propose a new estimation method for 1-ℓFDR (or LPR) based on the formulation of LPR. Numerical studies on both simulated and real data demonstrate the superior performance of LPR over existing methods.
Key words and phrases: False discovery rate, local false discovery rate, local precision rate, multi-label classification, optimization, smoothing.
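The decision rule summarized in the abstract can be sketched in a small simulation. This is an illustrative toy, not the paper's estimation method: it assumes oracle knowledge of each class's null proportion and null/alternative score densities (the two-group Bayesian model behind ℓFDR), whereas in practice these quantities must be estimated from data. The two classes and all mixture parameters below are hypothetical, chosen so that their raw score scales differ.

```python
import numpy as np
from scipy.stats import norm

def lpr(scores, pi0, f0, f1):
    """Oracle LPR(s) = 1 - lfdr(s) = pi1*f1(s) / (pi0*f0(s) + pi1*f1(s))."""
    pi1 = 1.0 - pi0
    num = pi1 * f1(scores)
    return num / (pi0 * f0(scores) + num)

rng = np.random.default_rng(0)

# Hypothetical class A: null scores N(0,1), alternative N(2,1), 90% nulls.
scores_a = rng.normal(0.0, 1.0, 1000)
lpr_a = lpr(scores_a, 0.9, norm(0, 1).pdf, norm(2, 1).pdf)

# Hypothetical class B: a different score scale, N(0,2) vs N(1,2), 80% nulls.
scores_b = rng.normal(0.0, 2.0, 1000)
lpr_b = lpr(scores_b, 0.8, norm(0, 2).pdf, norm(1, 2).pdf)

# Raw scores from A and B are not comparable, but LPR values are:
# pool all (object, class) pairs on the common LPR scale and accept
# pairs in decreasing order of LPR.
pooled = np.concatenate([lpr_a, lpr_b])
order = np.argsort(pooled)[::-1]
```

Ranking the pooled pairs by LPR (rather than by raw scores) is exactly the point of the transformation: it puts all classes on one probability scale, so a single global cutoff can be applied across classes.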