Pruning neural networks without any data by iteratively conserving
synaptic flow
- URL: http://arxiv.org/abs/2006.05467v3
- Date: Thu, 19 Nov 2020 03:54:34 GMT
- Title: Pruning neural networks without any data by iteratively conserving
synaptic flow
- Authors: Hidenori Tanaka, Daniel Kunin, Daniel L. K. Yamins, Surya Ganguli
- Abstract summary: Pruning the parameters of deep neural networks has generated intense interest due to potential savings in time, memory and energy.
Recent works have identified, through an expensive sequence of training and pruning cycles, the existence of winning lottery tickets or sparse trainable subnetworks.
We provide an affirmative answer to this question through theory driven algorithm design.
- Score: 27.849332212178847
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pruning the parameters of deep neural networks has generated intense interest
due to potential savings in time, memory and energy both during training and at
test time. Recent works have identified, through an expensive sequence of
training and pruning cycles, the existence of winning lottery tickets or sparse
trainable subnetworks at initialization. This raises a foundational question:
can we identify highly sparse trainable subnetworks at initialization, without
ever training, or indeed without ever looking at the data? We provide an
affirmative answer to this question through theory driven algorithm design. We
first mathematically formulate and experimentally verify a conservation law
that explains why existing gradient-based pruning algorithms at initialization
suffer from layer-collapse, the premature pruning of an entire layer rendering
a network untrainable. This theory also elucidates how layer-collapse can be
entirely avoided, motivating a novel pruning algorithm Iterative Synaptic Flow
Pruning (SynFlow). This algorithm can be interpreted as preserving the total
flow of synaptic strengths through the network at initialization subject to a
sparsity constraint. Notably, this algorithm makes no reference to the training
data and consistently competes with or outperforms existing state-of-the-art
pruning algorithms at initialization over a range of models (VGG and ResNet),
datasets (CIFAR-10/100 and Tiny ImageNet), and sparsity constraints (up to
99.99 percent). Thus our data-agnostic pruning algorithm challenges the
existing paradigm that, at initialization, data must be used to quantify which
synapses are important.
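
To make the description above concrete, below is a minimal PyTorch-style sketch of a data-free, iterative pruning loop in the spirit of SynFlow: parameters are scored by the synaptic saliency |θ ⊙ ∂R/∂θ|, where R is the scalar output of an all-ones forward pass through the element-wise absolute values of the weights, and the lowest-scoring weights are masked over several iterations under an exponential sparsity schedule. The helper names, the iteration count, the schedule constants, and the choice to score every parameter (rather than weight tensors only) are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def apply_masks(model, masks):
    # Zero out pruned parameters in place.
    for p, m in zip(model.parameters(), masks):
        p.mul_(m)

def synflow_scores(model, input_shape):
    # Data-free saliency: |theta * dR/dtheta|, where R comes from an
    # all-ones input passed through |theta| (no training data involved).
    signs = [p.detach().sign() for p in model.parameters()]
    with torch.no_grad():
        for p in model.parameters():
            p.abs_()
    model.zero_grad()
    R = model(torch.ones(1, *input_shape)).sum()   # total "synaptic flow"
    R.backward()
    scores = [(p.grad * p.detach()).abs()
              if p.grad is not None else torch.zeros_like(p)
              for p in model.parameters()]
    with torch.no_grad():                          # restore original signs
        for p, s in zip(model.parameters(), signs):
            p.mul_(s)
    return scores

def synflow_prune(model, input_shape, final_density=0.01, iterations=100):
    # Iteratively remove the lowest-saliency weights; the exponential
    # schedule and the defaults here are placeholders for illustration.
    model.eval()
    masks = [torch.ones_like(p) for p in model.parameters()]
    for k in range(1, iterations + 1):
        apply_masks(model, masks)
        scores = synflow_scores(model, input_shape)
        density = final_density ** (k / iterations)
        flat = torch.cat([s.flatten() for s in scores])
        keep = max(int(density * flat.numel()), 1)
        threshold = torch.topk(flat, keep).values.min()
        masks = [(s >= threshold).float() for s in scores]
    apply_masks(model, masks)
    return masks
```

For a CIFAR-shaped model this might be called as `masks = synflow_prune(model, (3, 32, 32))` before any training; since the scores never touch the data, the whole procedure runs on the random initialization alone.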
Related papers
- Concurrent Training and Layer Pruning of Deep Neural Networks [0.0]
We propose an algorithm capable of identifying and eliminating irrelevant layers of a neural network during the early stages of training.
We employ a structure using residual connections around nonlinear network sections that allow the flow of information through the network once a nonlinear section is pruned.
arXiv Detail & Related papers (2024-06-06T23:19:57Z)
- Finding Lottery Tickets in Vision Models via Data-driven Spectral Foresight Pruning [14.792099973449794]
We propose an algorithm to align the training dynamics of the sparse network with that of the dense one.
We show how the usually neglected data-dependent component in the NTK's spectrum can be taken into account.
Path eXclusion (PX) is able to find lottery tickets even at high sparsity levels.
arXiv Detail & Related papers (2024-06-03T22:19:42Z)
- The Cascaded Forward Algorithm for Neural Network Training [61.06444586991505]
We propose a new learning framework for neural networks, the Cascaded Forward (CaFo) algorithm, which, like the Forward-Forward (FF) algorithm, does not rely on backpropagation (BP).
Unlike FF, our framework directly outputs label distributions at each cascaded block, which does not require generation of additional negative samples.
In our framework each block can be trained independently, so it can be easily deployed on parallel acceleration systems (a small illustrative sketch of this block-wise pattern appears after this list).
arXiv Detail & Related papers (2023-03-17T02:01:11Z)
- Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z)
- Neural Networks with Sparse Activation Induced by Large Bias: Tighter Analysis with Bias-Generalized NTK [86.45209429863858]
We study training one-hidden-layer ReLU networks in the neural tangent kernel (NTK) regime.
We show that the neural networks possess a different limiting kernel, which we call the bias-generalized NTK.
We also study various properties of the neural networks with this new kernel.
arXiv Detail & Related papers (2023-01-01T02:11:39Z)
- On the optimization and generalization of overparameterized implicit neural networks [25.237054775800164]
Implicit neural networks have become increasingly attractive in the machine learning community.
We show that global convergence is guaranteed, even if only the implicit layer is trained.
This paper investigates the generalization error for implicit neural networks.
arXiv Detail & Related papers (2022-09-30T16:19:46Z)
- What to Prune and What Not to Prune at Initialization [0.0]
Post-training dropout-based approaches achieve high sparsity.
Pruning at initialization is more effective when it comes to scaling down the computational cost of the network.
The goal is to achieve higher sparsity while preserving performance.
arXiv Detail & Related papers (2022-09-06T03:48:10Z)
- How does unlabeled data improve generalization in self-training? A one-hidden-layer theoretical analysis [93.37576644429578]
This work establishes the first theoretical analysis for the known iterative self-training paradigm.
We prove the benefits of unlabeled data in both training convergence and generalization ability.
Experiments from shallow neural networks to deep neural networks are also provided to justify the correctness of our established theoretical insights on self-training.
arXiv Detail & Related papers (2022-01-21T02:16:52Z)
- Online Limited Memory Neural-Linear Bandits with Likelihood Matching [53.18698496031658]
We study neural-linear bandits for solving problems where both exploration and representation learning play an important role.
We propose a likelihood matching algorithm that is resilient to catastrophic forgetting and is completely online.
arXiv Detail & Related papers (2021-02-07T14:19:07Z)
- MSE-Optimal Neural Network Initialization via Layer Fusion [68.72356718879428]
Deep neural networks achieve state-of-the-art performance for a range of classification and inference tasks.
The use of gradient descent combined with nonconvexity renders learning susceptible to initialization.
We propose fusing neighboring layers of deeper networks that are trained with random variables.
arXiv Detail & Related papers (2020-01-28T18:25:15Z)
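
As noted in the Cascaded Forward (CaFo) entry above, here is a small, hedged sketch of the general block-wise training pattern that entry describes: each block carries its own classifier head that outputs a label distribution, each block is optimized with a purely local loss, and features are detached before being passed on, so no gradients (and hence no end-to-end backpropagation) cross block boundaries. The module names, toy architecture, and hyperparameters are assumptions for illustration and do not reproduce the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CascadedBlock(nn.Module):
    """One block with its own local classifier head (illustrative only)."""
    def __init__(self, in_ch, out_ch, num_classes):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(8),
        )
        self.head = nn.Linear(out_ch * 8 * 8, num_classes)  # local label distribution

    def forward(self, x):
        feats = self.body(x)
        logits = self.head(feats.flatten(1))
        return feats, logits

def train_blocks_independently(blocks, loader, epochs=1, lr=1e-3):
    # Each block has its own optimizer and loss; features passed on are
    # detached, so gradients never cross block boundaries.
    opts = [torch.optim.Adam(b.parameters(), lr=lr) for b in blocks]
    for _ in range(epochs):
        for x, y in loader:
            inp = x
            for block, opt in zip(blocks, opts):
                feats, logits = block(inp)
                loss = F.cross_entropy(logits, y)   # local supervision per block
                opt.zero_grad()
                loss.backward()
                opt.step()
                inp = feats.detach()                # stop gradients between blocks
    return blocks
```

A two-block cascade for CIFAR-10-shaped inputs might be built as `blocks = [CascadedBlock(3, 32, 10), CascadedBlock(32, 64, 10)]`; at test time one could take the final block's prediction or average the per-block label distributions (both are plausible choices, not necessarily what the paper does).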
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.