Sparsity in Deep Learning: Pruning and growth for efficient inference
and training in neural networks
- URL: http://arxiv.org/abs/2102.00554v1
- Date: Sun, 31 Jan 2021 22:48:50 GMT
- Title: Sparsity in Deep Learning: Pruning and growth for efficient inference
and training in neural networks
- Authors: Torsten Hoefler, Dan Alistarh, Tal Ben-Nun, Nikoli Dryden, Alexandra
Peste
- Abstract summary: Sparsity can reduce the memory footprint of regular networks to fit mobile devices.
We describe approaches to remove and add elements of neural networks, different training strategies to achieve model sparsity, and mechanisms to exploit sparsity in practice.
- Score: 78.47459801017959
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The growing energy and performance costs of deep learning have driven the
community to reduce the size of neural networks by selectively pruning
components. Similarly to their biological counterparts, sparse networks
generalize just as well as, if not better than, the original dense networks.
Sparsity can reduce the memory footprint of regular networks to fit mobile
devices, as well as shorten training time for ever-growing networks. In this
paper, we survey prior work on sparsity in deep learning and provide an
extensive tutorial of sparsification for both inference and training. We
describe approaches to remove and add elements of neural networks, different
training strategies to achieve model sparsity, and mechanisms to exploit
sparsity in practice. Our work distills ideas from more than 300 research
papers and provides guidance to practitioners who wish to utilize sparsity
today, as well as to researchers whose goal is to push the frontier forward. We
include the necessary background on mathematical methods in sparsification,
describe phenomena such as early structure adaptation, the intricate relations
between sparsity and the training process, and show techniques for achieving
acceleration on real hardware. We also define a metric of pruned parameter
efficiency that could serve as a baseline for comparison of different sparse
networks. We close by speculating on how sparsity can improve future workloads
and outline major open problems in the field.
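To ground the pruning techniques the survey covers, the sketch below shows one-shot magnitude pruning, the simplest way to "remove elements" of a network: the smallest-magnitude weights of each layer are zeroed and recorded in a binary mask. The model, sparsity level, and helper name are illustrative assumptions, not anything prescribed by the paper.

```python
# Minimal magnitude-pruning sketch (illustrative only): zero out the
# smallest-magnitude weights of each linear layer and keep a binary mask
# so the zeros can be preserved during later fine-tuning.
import torch
import torch.nn as nn


def magnitude_prune(module: nn.Linear, sparsity: float = 0.9) -> torch.Tensor:
    """Zero the `sparsity` fraction of weights with the smallest magnitude."""
    with torch.no_grad():
        w = module.weight
        k = int(sparsity * w.numel())  # number of entries to prune
        if k == 0:
            return torch.ones_like(w)
        # Threshold chosen so that roughly `sparsity` of the entries fall below it.
        threshold = w.abs().flatten().kthvalue(k).values
        mask = (w.abs() > threshold).float()
        w.mul_(mask)  # apply the mask in place
    return mask


model = nn.Sequential(nn.Linear(784, 300), nn.ReLU(), nn.Linear(300, 10))
masks = [magnitude_prune(m, sparsity=0.9) for m in model if isinstance(m, nn.Linear)]
print([f"{(mask == 0).float().mean().item():.2f}" for mask in masks])  # ~0.90 zeros per layer
```

In practice the mask is reapplied after every optimizer step (or the pruned weights are excluded from updates) so that subsequent fine-tuning preserves the sparsity pattern.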
Related papers
- Accurate Neural Network Pruning Requires Rethinking Sparse Optimization [87.90654868505518]
We show the impact of high sparsity on model training using the standard computer vision and natural language processing sparsity benchmarks.
We provide new approaches for mitigating this issue for both sparse pre-training of vision models and sparse fine-tuning of language models.
arXiv Detail & Related papers (2023-08-03T21:49:14Z)
- Deep Fusion: Efficient Network Training via Pre-trained Initializations [3.9146761527401424]
We present Deep Fusion, an efficient approach to network training that leverages pre-trained initializations of smaller networks.
Our experiments show how Deep Fusion is a practical and effective approach that not only accelerates the training process but also reduces computational requirements.
We validate our theoretical framework, which guides the optimal use of Deep Fusion, showing that it significantly reduces both training time and resource consumption.
arXiv Detail & Related papers (2023-06-20T21:30:54Z)
- FreeTickets: Accurate, Robust and Efficient Deep Ensemble by Training with Dynamic Sparsity [74.58777701536668]
We introduce the FreeTickets concept, which can boost the performance of sparse convolutional neural networks over their dense network equivalents by a large margin.
We propose two novel, efficient ensemble methods with dynamic sparsity, which yield, in one shot, many diverse and accurate tickets "for free" during the sparse training process.
arXiv Detail & Related papers (2021-06-28T10:48:20Z)
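As a rough illustration of what dynamic sparsity refers to in this line of work, the sketch below applies one SET-style drop-and-regrow update to a weight mask: the smallest-magnitude active weights are dropped and an equal number of inactive positions are regrown at random. This is a generic sketch of dynamic sparse training, not the FreeTickets ensemble procedure; the drop fraction, shapes, and function name are illustrative.

```python
# Generic drop-and-regrow mask update used in dynamic sparse training
# (SET-style sketch; not the FreeTickets ensemble procedure itself).
import torch


def drop_and_regrow(weight: torch.Tensor, mask: torch.Tensor, fraction: float = 0.3) -> torch.Tensor:
    """Drop the smallest-magnitude active weights and regrow the same number at random."""
    with torch.no_grad():
        active = mask.bool()
        n_drop = int(fraction * active.sum().item())
        if n_drop == 0:
            return mask
        # Drop: deactivate the n_drop active weights with the smallest magnitude.
        scores = weight.abs().masked_fill(~active, float("inf"))
        drop_idx = scores.flatten().topk(n_drop, largest=False).indices
        new_mask = mask.clone().flatten()
        new_mask[drop_idx] = 0.0
        # Regrow: activate the same number of currently inactive positions at random.
        inactive_idx = (new_mask == 0).nonzero(as_tuple=True)[0]
        grow_idx = inactive_idx[torch.randperm(inactive_idx.numel())[:n_drop]]
        new_mask[grow_idx] = 1.0
        new_mask = new_mask.view_as(mask)
        weight.mul_(new_mask)  # newly grown weights start from zero
    return new_mask


w = torch.randn(300, 784)
m = (torch.rand_like(w) < 0.1).float()   # start at ~90% sparsity
m = drop_and_regrow(w, m, fraction=0.3)  # overall sparsity stays the same
```

Repeating this update every few hundred training steps keeps the overall sparsity constant while letting the connectivity pattern evolve during training.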
- Training Larger Networks for Deep Reinforcement Learning [18.193180866998333]
We show that naively increasing network capacity does not improve performance.
We propose a novel method that consists of (1) wider networks with DenseNet connections, (2) decoupling representation learning from RL training, and (3) a distributed training method to mitigate overfitting.
Using this three-fold technique, we show that we can train very large networks that result in significant performance gains.
arXiv Detail & Related papers (2021-02-16T02:16:54Z)
- HALO: Learning to Prune Neural Networks with Shrinkage [5.283963846188862]
Deep neural networks achieve state-of-the-art performance in a variety of tasks by extracting a rich set of features from unstructured data.
Modern techniques for inducing sparsity and reducing model size are (1) network pruning, (2) training with a sparsity-inducing penalty, and (3) training a binary mask jointly with the weights of the network.
We present a novel penalty called Hierarchical Adaptive Lasso (HALO), which learns to adaptively sparsify the weights of a given network via trainable parameters.
arXiv Detail & Related papers (2020-08-24T04:08:48Z)
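The sparsity-inducing-penalty approach, listed as (2) above, can be illustrated with a plain L1 (Lasso) term added to the training loss, as in the sketch below; HALO's hierarchical adaptive penalty is more elaborate and is not reproduced here. The model, optimizer settings, and penalty strength are illustrative assumptions.

```python
# Plain L1 (Lasso) penalty added to the training loss -- a generic sketch of
# "training with a sparsity-inducing penalty"; HALO's hierarchical adaptive
# penalty is more elaborate than this.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(784, 300), nn.ReLU(), nn.Linear(300, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()
l1_lambda = 1e-4  # illustrative strength; tuned per task in practice


def training_step(x: torch.Tensor, y: torch.Tensor) -> float:
    optimizer.zero_grad()
    task_loss = criterion(model(x), y)
    # Penalize the absolute value of every weight; gradients push weights toward zero.
    l1_penalty = sum(p.abs().sum() for p in model.parameters())
    loss = task_loss + l1_lambda * l1_penalty
    loss.backward()
    optimizer.step()
    return loss.item()


# One step on random data, just to show the wiring.
loss = training_step(torch.randn(32, 784), torch.randint(0, 10, (32,)))
```

Because a plain L1 term shrinks large and small weights alike, adaptive penalties such as HALO instead learn to penalize weights non-uniformly.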
- Learning Connectivity of Neural Networks from a Topological Perspective [80.35103711638548]
We propose a topological perspective that represents a network as a complete graph for analysis.
By assigning learnable parameters, which reflect the magnitude of connections, to the edges, the learning process can be performed in a differentiable manner.
This learning process is compatible with existing networks and adapts to larger search spaces and different tasks.
arXiv Detail & Related papers (2020-08-19T04:53:31Z)
- Layer Sparsity in Neural Networks [7.436953928903182]
We discuss sparsity in the framework of neural networks.
In particular, we formulate a new notion of sparsity that concerns the networks' layers.
We introduce corresponding regularization and refitting schemes to generate more compact and accurate networks.
arXiv Detail & Related papers (2020-06-28T13:41:59Z)
- Understanding the Effects of Data Parallelism and Sparsity on Neural Network Training [126.49572353148262]
We study two factors in neural network training: data parallelism and sparsity.
Despite their promising benefits, understanding of their effects on neural network training remains elusive.
arXiv Detail & Related papers (2020-03-25T10:49:22Z)
- Large-Scale Gradient-Free Deep Learning with Recursive Local Representation Alignment [84.57874289554839]
Training deep neural networks on large-scale datasets requires significant hardware resources.
Backpropagation, the workhorse for training these networks, is an inherently sequential process that is difficult to parallelize.
We propose a neuro-biologically-plausible alternative to backprop that can be used to train deep networks.
arXiv Detail & Related papers (2020-02-10T16:20:02Z)
- Differentiable Sparsification for Deep Neural Networks [0.0]
We propose a fully differentiable sparsification method for deep neural networks.
The proposed method can learn both the sparsified structure and weights of a network in an end-to-end manner.
To the best of our knowledge, this is the first fully differentiable sparsification method.
arXiv Detail & Related papers (2019-10-08T03:57:04Z)
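The general idea behind differentiable sparsification, learning a relaxed gate for each structural element jointly with the network weights, can be sketched as follows. This is a generic illustration using an assumed GatedLinear module and a sigmoid relaxation, not the paper's exact formulation.

```python
# Learnable per-unit gates that multiply the weights, trained jointly with them --
# a generic sketch of differentiable sparsification, not the paper's exact method.
import torch
import torch.nn as nn


class GatedLinear(nn.Module):
    """Linear layer whose output units can be switched off by learned gates."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.gate_logits = nn.Parameter(torch.zeros(out_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gates = torch.sigmoid(self.gate_logits)  # soft gates in (0, 1)
        return self.linear(x) * gates            # gate each output unit

    def gate_penalty(self) -> torch.Tensor:
        # Differentiable surrogate for the number of active units.
        return torch.sigmoid(self.gate_logits).sum()


layer = GatedLinear(784, 300)
x = torch.randn(32, 784)
loss = layer(x).pow(2).mean() + 1e-3 * layer.gate_penalty()  # illustrative objective
loss.backward()  # gradients flow to the weights and the gate logits alike
```

After training, units whose gates are driven close to zero can be removed outright, yielding a structurally sparse network.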