Dynamic Sparsity Is Channel-Level Sparsity Learner
- URL: http://arxiv.org/abs/2305.19454v2
- Date: Fri, 10 Nov 2023 16:42:39 GMT
- Title: Dynamic Sparsity Is Channel-Level Sparsity Learner
- Authors: Lu Yin, Gen Li, Meng Fang, Li Shen, Tianjin Huang, Zhangyang Wang,
Vlado Menkovski, Xiaolong Ma, Mykola Pechenizkiy, Shiwei Liu
- Abstract summary: Dynamic sparse training (DST) is a leading sparse training approach.
Channel-aware dynamic sparse (Chase) seamlessly translates the promise of unstructured dynamic sparsity to GPU-friendly channel-level sparsity.
It does so by progressively identifying and removing channels that unstructured DST leaves far sparser than others.
- Score: 91.31071026340746
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sparse training has received an upsurging interest in machine learning due to
its tantalizing saving potential for the entire training process as well as
inference. Dynamic sparse training (DST), as a leading sparse training
approach, can train deep neural networks at high sparsity from scratch to match
the performance of their dense counterparts. However, most, if not all, prior
DST works demonstrate their effectiveness on unstructured sparsity with highly
irregular sparse patterns, which receive limited support on common hardware.
This limitation hinders the usage of DST in practice. In this paper, we propose
Channel-aware dynamic sparse (Chase), which for the first time seamlessly
translates the promise of unstructured dynamic sparsity to GPU-friendly
channel-level sparsity (not fine-grained N:M or group sparsity) during one
end-to-end training process, without any ad-hoc operations. The resulting small
sparse networks can be directly accelerated by commodity hardware, without
using any specialized sparsity-aware hardware accelerators. This appealing
outcome is partially motivated by a hidden phenomenon of dynamic sparsity:
off-the-shelf unstructured DST implicitly involves biased parameter
reallocation across channels, with a large fraction of channels (up to 60%)
being sparser than others. By progressively identifying and removing these
channels during training, our approach translates unstructured sparsity to
channel-wise sparsity. Our experimental results demonstrate that Chase achieves
a 1.7x inference throughput speedup on common GPU devices without compromising
accuracy with ResNet-50 on ImageNet. We release our code at
https://github.com/luuyin/chase.
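The authors' implementation is in the repository above. Purely as an illustration of the mechanism the abstract describes (not the released code), the following minimal NumPy sketch measures how unevenly an unstructured DST mask distributes its surviving weights across output channels and then removes the sparsest channels; the function names, the 10% removal fraction, and the toy layer shape are assumptions made for this sketch.

```python
import numpy as np

def channel_density(mask):
    """Fraction of surviving (non-zero) weights in each output channel.
    `mask` is the binary mask of a conv layer, shaped (out_ch, in_ch, kH, kW)."""
    return mask.reshape(mask.shape[0], -1).mean(axis=1)

def sparsest_channels(mask, fraction=0.1):
    """Indices of the `fraction` of output channels with the fewest surviving weights."""
    density = channel_density(mask)
    k = max(1, int(fraction * len(density)))
    return np.argsort(density)[:k]

def remove_channels(weight, mask, channel_ids):
    """Zero out whole output channels, turning unstructured sparsity into
    channel-level sparsity that commodity hardware can exploit directly."""
    weight, mask = weight.copy(), mask.copy()
    weight[channel_ids] = 0.0
    mask[channel_ids] = 0.0
    return weight, mask

# Toy conv layer: 64 output channels at ~90% unstructured sparsity.
rng = np.random.default_rng(0)
weight = rng.normal(size=(64, 32, 3, 3)).astype(np.float32)
mask = (rng.random(weight.shape) > 0.9).astype(np.float32)
weight *= mask

density = channel_density(mask)
print("per-channel density: min %.3f, max %.3f" % (density.min(), density.max()))
weight, mask = remove_channels(weight, mask, sparsest_channels(mask, fraction=0.1))
```

Repeating such a removal step periodically during DST, rather than once at the end, is what the abstract refers to as progressively translating unstructured sparsity into channel-wise sparsity.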
Related papers
- Dynamic Sparse Training with Structured Sparsity [11.778353786208765]
Dynamic Sparse Training (DST) methods achieve state-of-the-art results in sparse neural network training.
We propose a sparse-to-sparse DST method, Structured RigL (SRigL), to learn a variant of fine-grained structured N:M sparsity; a minimal sketch of the N:M pattern follows this entry.
We demonstrate a real-world acceleration of 3.4x/2.5x on CPU for online inference and 1.7x/13.0x on GPU for inference with a batch size of 256.
arXiv Detail & Related papers (2023-05-03T17:48:55Z)
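Both SRigL above and the "Learning N:M Fine-grained Structured Sparse Neural Networks From Scratch" entry further down rely on N:M sparsity, in which at most N weights survive in every group of M consecutive weights (2:4 is the pattern accelerated by recent NVIDIA GPUs). As a rough illustration only, and not either paper's training method, the following NumPy sketch projects a dense matrix onto a 2:4 pattern by keeping the two largest-magnitude weights per group; names and shapes are assumptions.

```python
import numpy as np

def project_nm(weight, n=2, m=4):
    """Keep the n largest-magnitude weights in every group of m consecutive
    weights along the last axis; zero out the rest."""
    rows, cols = weight.shape
    assert cols % m == 0, "columns must be divisible by the group size m"
    groups = weight.reshape(rows, cols // m, m)
    order = np.argsort(-np.abs(groups), axis=-1)      # rank entries per group
    mask = np.zeros_like(groups)
    np.put_along_axis(mask, order[..., :n], 1.0, axis=-1)
    return (groups * mask).reshape(rows, cols), mask.reshape(rows, cols)

w = np.random.default_rng(1).normal(size=(4, 8))
w_nm, mask = project_nm(w, n=2, m=4)
# every group of 4 consecutive entries in each row now has exactly 2 non-zeros
assert (mask.reshape(4, 2, 4).sum(axis=-1) == 2).all()
```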
- SparseProp: Efficient Sparse Backpropagation for Faster Training of Neural Networks [20.18957052535565]
We provide a new efficient version of the backpropagation algorithm, specialized to the case where the weights of the neural network being trained are sparse.
Our algorithm is general, as it applies to arbitrary (unstructured) sparsity and common layer types.
We show that it can yield speedups in end-to-end runtime experiments, both in transfer learning using already-sparsified networks and in training sparse networks from scratch; a conceptual sparse-backpropagation sketch follows this entry.
arXiv Detail & Related papers (2023-02-09T18:54:05Z)
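As a conceptual illustration of the SparseProp entry above (a sketch under assumed layer sizes and sparsity, not the authors' algorithm or code), the snippet below backpropagates through a single linear map y = Wx with W stored in SciPy CSR form; the forward product, the input gradient, and the weight gradient all touch only W's stored non-zeros, so the cost scales with nnz(W) rather than with the dense shape.

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)
dense = rng.normal(size=(256, 512)) * (rng.random((256, 512)) < 0.05)  # ~95% sparse
W = sparse.csr_matrix(dense)

x = rng.normal(size=512)
y = W @ x                               # forward: sparse-dense product, O(nnz)

grad_y = rng.normal(size=256)           # stand-in for the upstream gradient
grad_x = W.T @ grad_y                   # gradient w.r.t. the input, O(nnz)

# The gradient w.r.t. W is only needed at W's existing non-zero positions:
rows, cols = W.nonzero()
grad_W = sparse.csr_matrix((grad_y[rows] * x[cols], (rows, cols)), shape=W.shape)
```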
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- Training Spiking Neural Networks with Local Tandem Learning [96.32026780517097]
Spiking neural networks (SNNs) are shown to be more biologically plausible and energy efficient than their predecessors.
In this paper, we put forward a generalized learning rule, termed Local Tandem Learning (LTL).
We demonstrate rapid network convergence within five training epochs on the CIFAR-10 dataset while having low computational complexity.
arXiv Detail & Related papers (2022-10-10T10:05:00Z)
- The Unreasonable Effectiveness of Random Pruning: Return of the Most Naive Baseline for Sparse Training [111.15069968583042]
Random pruning is arguably the most naive way to attain sparsity in neural networks, but it has been deemed uncompetitive compared with either post-training pruning or sparse training.
We empirically demonstrate that sparsely training a randomly pruned network from scratch can match the performance of its dense equivalent.
Our results strongly suggest there is larger-than-expected room for sparse training at scale, and the benefits of sparsity might be more universal beyond carefully designed pruning; a minimal sketch of training with a fixed random mask follows this entry.
arXiv Detail & Related papers (2022-02-05T21:19:41Z)
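As a minimal sketch of the setup the random-pruning entry above describes (the 90% sparsity level, layer sizes, and training loop are assumptions, not the paper's configuration), the snippet below draws one fixed random mask per weight matrix at initialization and re-applies it after every optimizer step, which is the simplest form of sparsely training a randomly pruned network from scratch.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
sparsity = 0.9

# Random mask per weight matrix at initialization; biases stay dense.
masks = {name: (torch.rand_like(p) > sparsity).float()
         for name, p in model.named_parameters() if p.dim() > 1}
with torch.no_grad():
    for name, p in model.named_parameters():
        if name in masks:
            p *= masks[name]

opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

for step in range(3):                       # stand-in loop with random data
    x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
    loss = loss_fn(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    with torch.no_grad():                   # keep pruned positions at zero
        for name, p in model.named_parameters():
            if name in masks:
                p *= masks[name]
```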
- Federated Dynamic Sparse Training: Computing Less, Communicating Less, Yet Learning Better [88.28293442298015]
Federated learning (FL) enables distribution of machine learning workloads from the cloud to resource-limited edge devices.
We develop, implement, and experimentally validate a novel FL framework termed Federated Dynamic Sparse Training (FedDST).
FedDST is a dynamic process that extracts and trains sparse sub-networks from the target full network.
arXiv Detail & Related papers (2021-12-18T02:26:38Z)
- Learning N:M Fine-grained Structured Sparse Neural Networks From Scratch [75.69506249886622]
Sparsity in Deep Neural Networks (DNNs) has been widely studied to compress and accelerate models in resource-constrained environments.
In this paper, we are the first to study training an N:M fine-grained structured sparse network from scratch.
arXiv Detail & Related papers (2021-02-08T05:55:47Z)
- Gradient Flow in Sparse Neural Networks and How Lottery Tickets Win [8.700592446069395]
Sparse NNs can match the generalization of dense NNs using a fraction of the compute/storage for inference, and also have the potential to enable efficient training.
In this paper we show that naively training unstructured sparse NNs from random initialization results in significantly worse generalization.
We also show that Lottery Tickets (LTs) do not improve gradient flow, rather their success lies in re-learning the pruning solution they are derived from.
arXiv Detail & Related papers (2020-10-07T17:26:08Z)
- Procrustes: a Dataflow and Accelerator for Sparse Deep Neural Network Training [0.5219568203653523]
We develop a sparse DNN training accelerator that produces pruned models with the same accuracy as dense models, without first training, then pruning, and finally retraining a dense model.
Compared to training the equivalent unpruned models using a state-of-the-art DNN accelerator without sparse training support, Procrustes consumes up to 3.26x less energy and offers up to 4x speedup across a range of models, while pruning weights by an order of magnitude and maintaining unpruned accuracy.
arXiv Detail & Related papers (2020-09-23T07:39:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.