Dynamic Sparse Training of Diagonally Sparse Networks
- URL: http://arxiv.org/abs/2506.11449v1
- Date: Fri, 13 Jun 2025 04:01:34 GMT
- Title: Dynamic Sparse Training of Diagonally Sparse Networks
- Authors: Abhishek Tyagi, Arjun Iyer, William H Renninger, Christopher Kanan, Yuhao Zhu
- Abstract summary: Unstructured sparsity often fails to translate into practical speedups on modern hardware. We propose DynaDiag, a novel structured sparse-to-sparse method that performs on par with unstructured sparsity. With 90% sparse linear layers in ViTs, we observe up to a 3.13x speedup in online inference without sacrificing model performance.
- Score: 15.13506569122892
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in Dynamic Sparse Training (DST) have pushed the frontier of sparse neural network training in structured and unstructured contexts, matching dense-model performance while drastically reducing parameter counts to facilitate model scaling. However, unstructured sparsity often fails to translate into practical speedups on modern hardware. To address this shortcoming, we propose DynaDiag, a novel structured sparse-to-sparse DST method that performs on par with unstructured sparsity. DynaDiag enforces a diagonal sparsity pattern throughout training and preserves sparse computation in forward and backward passes. We further leverage the diagonal structure to accelerate computation via a custom CUDA kernel, rendering the method hardware-friendly. Empirical evaluations on diverse neural architectures demonstrate that our method maintains accuracy on par with unstructured counterparts while benefiting from tangible computational gains. Notably, with 90% sparse linear layers in ViTs, we observe up to a 3.13x speedup in online inference without sacrificing model performance and a 1.59x speedup in training on a GPU compared to equivalent unstructured layers. Our source code is available at https://github.com/horizon-research/DynaDiag/.
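The core pattern is easy to prototype: constrain each weight matrix's nonzeros to a small set of wrapped diagonals, so keeping k of n diagonals in an n x n layer gives roughly 1 - k/n sparsity (51 of 512 diagonals is about 90% sparse). The PyTorch sketch below is a minimal illustration under our own assumptions (random diagonal offsets, a dense fallback matmul); DynaDiag additionally evolves which diagonals are active during training and dispatches to a custom CUDA kernel.

```python
import torch
import torch.nn.functional as F

def diagonal_mask(rows, cols, offsets):
    """Binary mask whose nonzeros lie on a chosen set of wrapped diagonals."""
    mask = torch.zeros(rows, cols)
    r = torch.arange(rows)
    for off in offsets:
        mask[r, (r + off) % cols] = 1.0  # wrap so every diagonal has `rows` entries
    return mask

# ~90% sparse 512x512 layer: keep 51 of 512 wrapped diagonals (illustrative choice)
offsets = torch.randperm(512)[:51].tolist()
mask = diagonal_mask(512, 512, offsets)
W = torch.randn(512, 512) * mask       # weights constrained to the diagonals
x = torch.randn(8, 512)
y = F.linear(x, W)                     # dense matmul here; the paper uses a custom CUDA kernel
```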
Related papers
- A Distributed Data-Parallel PyTorch Implementation of the Distributed Shampoo Optimizer for Training Neural Networks At-Scale [5.206015354543744]
Shampoo is an online and stochastic optimization algorithm belonging to the AdaGrad family of methods for training neural networks.
We provide a complete description of the algorithm as well as the performance optimizations that our implementation leverages to train deep networks at-scale in PyTorch.
arXiv Detail & Related papers (2023-09-12T18:11:10Z)
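As a point of reference for the entry above, the core Shampoo update for a single weight matrix is compact. The sketch below is our own minimal rendering (toy sizes, a plain eigendecomposition for the inverse fourth roots, and none of the grafting, blocking, or distribution machinery of the at-scale implementation):

```python
import torch

def inv_pth_root(M, p, eps=1e-6):
    """Inverse p-th root of a symmetric PSD matrix via eigendecomposition."""
    vals, vecs = torch.linalg.eigh(M)
    vals = torch.clamp(vals, min=eps)
    return vecs @ torch.diag(vals.pow(-1.0 / p)) @ vecs.T

# One Shampoo step for weight matrix W with gradient G: accumulate
# Kronecker-factored statistics L and R, then precondition G on both sides.
torch.manual_seed(0)
W = torch.randn(32, 16)
L = 1e-4 * torch.eye(32)               # left statistics
R = 1e-4 * torch.eye(16)               # right statistics
lr = 1e-2

G = torch.randn_like(W)                # stand-in for a real gradient
L = L + G @ G.T
R = R + G.T @ G
W = W - lr * (inv_pth_root(L, 4) @ G @ inv_pth_root(R, 4))
```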
- Dynamic Sparsity Is Channel-Level Sparsity Learner [91.31071026340746]
Dynamic sparse training (DST) is a leading approach to sparse neural network training.
Channel-aware dynamic sparse (Chase) translates the promise of unstructured dynamic sparsity into hardware-friendly channel-level sparsity (a toy version of this translation is sketched after this entry).
arXiv Detail & Related papers (2023-05-30T23:33:45Z)
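The translation sketched below is our own toy version of the idea in the entry above: once unstructured DST has emptied most of a channel, the whole channel can be dropped. The min_density threshold is an assumption; Chase's actual criterion and schedule are in the paper.

```python
import torch

def prune_empty_channels(weight, min_density=0.05):
    """Drop output channels whose surviving-weight density fell below a threshold."""
    # weight: (out_channels, in_channels, kH, kW), already unstructured-sparse
    density = (weight != 0).float().mean(dim=(1, 2, 3))
    keep = density >= min_density
    return weight[keep], keep

w = torch.randn(64, 32, 3, 3)
w = w * (torch.rand_like(w) > 0.95).float()   # 95% unstructured sparsity
pruned, kept = prune_empty_channels(w)
print(kept.sum().item(), "of", w.shape[0], "channels kept")
```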
- Dynamic Sparse Training with Structured Sparsity [11.778353786208765]
Dynamic Sparse Training (DST) methods achieve state-of-the-art results in sparse neural network training.
We propose a sparse-to-sparse DST method, Structured RigL (SRigL), to learn a variant of fine-grained structured N:M sparsity.
We demonstrate a real-world acceleration of 3.4x/2.5x on CPU for online inference and 1.7x/13.0x on GPU for inference with a batch size of 256.
arXiv Detail & Related papers (2023-05-03T17:48:55Z)
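For context on the N:M pattern: a generic magnitude-based N:M projection takes a few lines. The sketch below shows the 2:4 case that sparse tensor cores accelerate; it is a plain one-shot projection of our own, whereas SRigL learns the pattern dynamically during training.

```python
import torch

def nm_mask(weight, n=2, m=4):
    """Keep the n largest-magnitude weights in each group of m consecutive weights."""
    groups = weight.reshape(-1, m)                 # numel must be divisible by m
    idx = groups.abs().topk(n, dim=1).indices
    mask = torch.zeros_like(groups)
    mask.scatter_(1, idx, 1.0)
    return mask.reshape(weight.shape)

W = torch.randn(8, 16)
W = W * nm_mask(W)    # 2:4 sparse: exactly 2 nonzeros in every group of 4
```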
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders-of-magnitude improvements in energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- Online Training Through Time for Spiking Neural Networks [66.7744060103562]
Spiking neural networks (SNNs) are promising brain-inspired energy-efficient models.
Recent progress in training methods has enabled successful deep SNNs on large-scale tasks with low latency.
We propose online training through time (OTTT) for SNNs, which is derived from BPTT to enable forward-in-time learning.
arXiv Detail & Related papers (2022-10-09T07:47:56Z)
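The forward-in-time idea can be caricatured with an eligibility trace: keep a decaying trace of input spikes and combine it with an instantaneous error at every step, so nothing is backpropagated through time. The toy loop below is our own illustration, not the paper's exact OTTT rule (the rate target, error signal, and constants are all assumptions).

```python
import torch

torch.manual_seed(0)
n_in, n_out, T, lr, decay = 8, 4, 20, 1e-2, 0.9
W = torch.zeros(n_out, n_in)
v = torch.zeros(n_out)                        # membrane potentials
trace = torch.zeros(n_in)                     # presynaptic eligibility trace

for t in range(T):
    x = (torch.rand(n_in) < 0.3).float()      # input spikes
    v = decay * v + W @ x                     # leaky integration
    out = (v > 1.0).float()                   # output spikes
    v = v * (1 - out)                         # reset neurons that fired
    err = out - 0.2                           # error vs. a toy target firing rate
    trace = decay * trace + x                 # update trace online
    W -= lr * torch.outer(err, trace)         # weight update at every time step
```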
- Accelerating DNN Training with Structured Data Gradient Pruning [0.5801044612920815]
Weight pruning is a technique to make Deep Neural Network (DNN) inference more computationally efficient.
Modern accelerators such as the NVIDIA A100 GPU support this type of structured sparsity with 2 nonzeros per group of 4 elements along a reduction dimension.
Our approach can achieve a 15-25% reduction in total training time without significant impact to performance.
arXiv Detail & Related papers (2022-02-01T21:41:51Z)
- DS-Net++: Dynamic Weight Slicing for Efficient Inference in CNNs and Transformers [105.74546828182834]
We show a hardware-efficient dynamic inference regime, named dynamic weight slicing, which adaptively slices a subset of network parameters for inputs of varying difficulty (a toy sketch follows this entry).
We present the dynamic slimmable network (DS-Net) and the dynamic sliceable network (DS-Net++), which input-dependently adjust the number of filters in CNNs and multiple dimensions in both CNNs and transformers.
arXiv Detail & Related papers (2021-09-21T09:57:21Z)
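Dynamic weight slicing reduces to a small gate choosing how much of a shared weight matrix to use per input. The module below is a deliberately simplified toy (one hard batch-level decision, no slimmable normalization, and none of the paper's two-stage training):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SlicedLinear(nn.Module):
    """Toy dynamic weight slicing: a gate picks a width, and only the first
    k rows of the shared weight matrix are used for that batch."""
    def __init__(self, d_in, d_out, widths=(0.25, 0.5, 1.0)):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(d_out, d_in) * 0.02)
        self.widths = widths
        self.gate = nn.Linear(d_in, len(widths))   # scores one width per option

    def forward(self, x):
        idx = self.gate(x.mean(dim=0)).argmax().item()     # hard batch-level choice
        k = max(1, int(self.widths[idx] * self.weight.shape[0]))
        return F.linear(x, self.weight[:k])                # slice, then matmul

layer = SlicedLinear(16, 32)
y = layer(torch.randn(4, 16))   # output width depends on the gate's decision
```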
- Learning N:M Fine-grained Structured Sparse Neural Networks From Scratch [75.69506249886622]
Sparsity in Deep Neural Networks (DNNs) has been widely studied to compress and accelerate models in resource-constrained environments.
In this paper, we are the first to study training an N:M fine-grained structured sparse network from scratch.
arXiv Detail & Related papers (2021-02-08T05:55:47Z)
- Procrustes: a Dataflow and Accelerator for Sparse Deep Neural Network Training [0.5219568203653523]
We develop a sparse DNN training accelerator that produces pruned models with the same accuracy as dense models, without the usual pipeline of first training, then pruning, and finally retraining a dense model.
Compared to training the equivalent unpruned models using a state-of-the-art DNN accelerator without sparse training support, Procrustes consumes up to 3.26x less energy and offers up to 4x speedup across a range of models, while pruning weights by an order of magnitude and maintaining unpruned accuracy.
arXiv Detail & Related papers (2020-09-23T07:39:55Z)
- Accelerating Sparse DNN Models without Hardware-Support via Tile-Wise Sparsity [12.643043455369297]
We propose an algorithm-software co-designed pruning method that achieves latency speedups on existing dense architectures.
We implement and evaluate the sparsity pattern on GPU tensor cores, achieving a 1.95x speedup over the dense model.
arXiv Detail & Related papers (2020-08-29T16:27:41Z)
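A simplified reading of tile-granular pruning: score weight blocks and zero the weakest ones, so surviving blocks stay dense and map onto regular GEMM tiles. The sketch below is our own block-level caricature, not the paper's exact tile-wise pattern (which prunes within tiles to preserve GEMM efficiency).

```python
import torch

def tile_prune(W, tile=4, keep=0.5):
    """Zero out the lowest-L1 tile x tile blocks, keeping a `keep` fraction."""
    r, c = W.shape                                      # assumed divisible by tile
    blocks = W.reshape(r // tile, tile, c // tile, tile)
    scores = blocks.abs().sum(dim=(1, 3))               # one score per block
    k = max(1, int(keep * scores.numel()))
    thresh = scores.flatten().topk(k).values.min()
    mask = (scores >= thresh).float()[:, None, :, None]
    return (blocks * mask).reshape(r, c)

W = torch.randn(16, 16)
print((tile_prune(W) != 0).float().mean())  # ~0.5 density, block-structured
```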
- RT3D: Achieving Real-Time Execution of 3D Convolutional Neural Networks on Mobile Devices [57.877112704841366]
This paper proposes RT3D, a model compression and mobile acceleration framework for 3D CNNs.
For the first time, real-time execution of 3D CNNs is achieved on off-the-shelf mobile devices.
arXiv Detail & Related papers (2020-07-20T02:05:32Z)