
Statistica Sinica 31 (2021), 1397-1414

SPARSE DEEP NEURAL NETWORKS USING
L1,∞-WEIGHT NORMALIZATION

Ming Wen1, Yixi Xu2, Yunling Zheng1, Zhouwang Yang1 and Xiao Wang2

1University of Science and Technology of China and 2Purdue University

Abstract: Deep neural networks (DNNs) have recently demonstrated excellent performance on many challenging tasks. However, overfitting remains a significant challenge in DNNs. Empirical evidence suggests that inducing sparsity can alleviate overfitting, and that weight normalization can accelerate algorithm convergence. In this study, we employ L1,∞ weight normalization for DNNs with bias neurons to achieve a sparse architecture. We theoretically establish generalization error bounds for both regression and classification under the L1,∞ weight normalization. Furthermore, we show that the upper bounds are independent of the network width and depend on the network depth k only through a √k factor, which are the best available bounds for networks with bias neurons. These results provide theoretical justifications for using such weight normalization to reduce the generalization error. We also develop an easily implemented gradient projection descent algorithm to practically obtain a sparse neural network. Finally, we present various experiments that validate our theory and demonstrate the effectiveness of the resulting approach.
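The abstract mentions a gradient projection descent algorithm for enforcing the L1,∞ constraint. The sketch below is a minimal, hypothetical illustration (not the authors' implementation), assuming the L1,∞ norm of a weight matrix is the maximum over rows of the row-wise L1 norm, so projecting onto the L1,∞ ball reduces to projecting each offending row onto an L1 ball (Duchi et al., 2008). All function names and the radius parameter c are illustrative.

```python
import numpy as np

def project_row_onto_l1_ball(row, c):
    """Euclidean projection of a vector onto the L1 ball of radius c."""
    if np.sum(np.abs(row)) <= c:
        return row
    u = np.sort(np.abs(row))[::-1]          # magnitudes, sorted descending
    cssv = np.cumsum(u)                      # cumulative sums
    rho = np.nonzero(u * np.arange(1, len(u) + 1) > (cssv - c))[0][-1]
    theta = (cssv[rho] - c) / (rho + 1.0)    # soft-thresholding level
    return np.sign(row) * np.maximum(np.abs(row) - theta, 0.0)

def project_onto_l1inf_ball(W, c=1.0):
    """Project W onto {W : max_i ||W_i||_1 <= c} by projecting each row."""
    return np.vstack([project_row_onto_l1_ball(w, c) for w in W])

def projected_gradient_step(W, grad, lr=0.1, c=1.0):
    """One illustrative projected-gradient step: descend, then project."""
    return project_onto_l1inf_ball(W - lr * grad, c)
```

Because the L1-ball projection soft-thresholds each row, rows whose L1 norm exceeds the radius acquire exact zeros, which is one way such a projection step can yield a sparse network architecture.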

Key words and phrases: Deep neural networks, generalization, overfitting, Rademacher complexity, sparsity.
