Robust Pruning at Initialization
- URL: http://arxiv.org/abs/2002.08797v5
- Date: Wed, 19 May 2021 22:43:36 GMT
- Title: Robust Pruning at Initialization
- Authors: Soufiane Hayou, Jean-Francois Ton, Arnaud Doucet, Yee Whye Teh
- Abstract summary: A growing need for smaller, energy-efficient, neural networks to be able to use machine learning applications on devices with limited computational resources.
For Deep NNs, such procedures remain unsatisfactory as the resulting pruned networks can be difficult to train and, for instance, they do not prevent one layer from being fully pruned.
- Score: 61.30574156442608
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Overparameterized Neural Networks (NN) display state-of-the-art performance.
However, there is a growing need for smaller, energy-efficient, neural networks
tobe able to use machine learning applications on devices with limited
computational resources. A popular approach consists of using pruning
techniques. While these techniques have traditionally focused on pruning
pre-trained NN (LeCun et al.,1990; Hassibi et al., 1993), recent work by Lee et
al. (2018) has shown promising results when pruning at initialization. However,
for Deep NNs, such procedures remain unsatisfactory as the resulting pruned
networks can be difficult to train and, for instance, they do not prevent one
layer from being fully pruned. In this paper, we provide a comprehensive
theoretical analysis of Magnitude and Gradient based pruning at initialization
and training of sparse architectures. This allows us to propose novel
principled approaches which we validate experimentally on a variety of NN
architectures.
Related papers
- Structure-Preserving Network Compression Via Low-Rank Induced Training Through Linear Layers Composition [11.399520888150468]
We present a theoretically-justified technique termed Low-Rank Induced Training (LoRITa)
LoRITa promotes low-rankness through the composition of linear layers and compresses by using singular value truncation.
We demonstrate the effectiveness of our approach using MNIST on Fully Connected Networks, CIFAR10 on Vision Transformers, and CIFAR10/100 and ImageNet on Convolutional Neural Networks.
arXiv Detail & Related papers (2024-05-06T00:58:23Z) - The Cascaded Forward Algorithm for Neural Network Training [61.06444586991505]
We propose a new learning framework for neural networks, namely Cascaded Forward (CaFo) algorithm, which does not rely on BP optimization as that in FF.
Unlike FF, our framework directly outputs label distributions at each cascaded block, which does not require generation of additional negative samples.
In our framework each block can be trained independently, so it can be easily deployed into parallel acceleration systems.
arXiv Detail & Related papers (2023-03-17T02:01:11Z) - Theoretical Characterization of How Neural Network Pruning Affects its
Generalization [131.1347309639727]
This work makes the first attempt to study how different pruning fractions affect the model's gradient descent dynamics and generalization.
It is shown that as long as the pruning fraction is below a certain threshold, gradient descent can drive the training loss toward zero.
More surprisingly, the generalization bound gets better as the pruning fraction gets larger.
arXiv Detail & Related papers (2023-01-01T03:10:45Z) - Neural Networks with Sparse Activation Induced by Large Bias: Tighter Analysis with Bias-Generalized NTK [86.45209429863858]
We study training one-hidden-layer ReLU networks in the neural tangent kernel (NTK) regime.
We show that the neural networks possess a different limiting kernel which we call textitbias-generalized NTK
We also study various properties of the neural networks with this new kernel.
arXiv Detail & Related papers (2023-01-01T02:11:39Z) - Exact Gradient Computation for Spiking Neural Networks Through Forward
Propagation [39.33537954568678]
Spiking neural networks (SNN) have emerged as alternatives to traditional neural networks.
We propose a novel training algorithm, called emphforward propagation (FP), that computes exact gradients for SNN.
arXiv Detail & Related papers (2022-10-18T20:28:21Z) - Towards Theoretically Inspired Neural Initialization Optimization [66.04735385415427]
We propose a differentiable quantity, named GradCosine, with theoretical insights to evaluate the initial state of a neural network.
We show that both the training and test performance of a network can be improved by maximizing GradCosine under norm constraint.
Generalized from the sample-wise analysis into the real batch setting, NIO is able to automatically look for a better initialization with negligible cost.
arXiv Detail & Related papers (2022-10-12T06:49:16Z) - An Operator Theoretic Perspective on Pruning Deep Neural Networks [2.624902795082451]
We make use of recent advances in dynamical systems theory to define a new class of theoretically motivated pruning algorithms.
We show that these algorithms can be equivalent to magnitude and gradient based pruning, unifying these seemingly disparate methods.
arXiv Detail & Related papers (2021-10-28T02:33:50Z) - Neural Network Pruning Through Constrained Reinforcement Learning [3.2880869992413246]
We propose a general methodology for pruning neural networks.
Our proposed methodology can prune neural networks to respect pre-defined computational budgets.
We prove the effectiveness of our approach via comparison with state-of-the-art methods on standard image classification datasets.
arXiv Detail & Related papers (2021-10-16T11:57:38Z) - A Framework for Neural Network Pruning Using Gibbs Distributions [34.0576955010317]
Gibbs pruning is a novel framework for expressing and designing neural network pruning methods.
It can train and prune a network simultaneously in such a way that the learned weights and pruning mask are well-adapted for each other.
We achieve a new state-of-the-art result for pruning ResNet-56 with the CIFAR-10 dataset.
arXiv Detail & Related papers (2020-06-08T23:04:53Z) - Distance-Based Regularisation of Deep Networks for Fine-Tuning [116.71288796019809]
We develop an algorithm that constrains a hypothesis class to a small sphere centred on the initial pre-trained weights.
Empirical evaluation shows that our algorithm works well, corroborating our theoretical results.
arXiv Detail & Related papers (2020-02-19T16:00:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.