AUTOSPARSE: Towards Automated Sparse Training of Deep Neural Networks
- URL: http://arxiv.org/abs/2304.06941v1
- Date: Fri, 14 Apr 2023 06:19:07 GMT
- Title: AUTOSPARSE: Towards Automated Sparse Training of Deep Neural Networks
- Authors: Abhisek Kundu, Naveen K. Mellempudi, Dharma Teja Vooturi, Bharat Kaul, Pradeep Dubey
- Abstract summary: We propose Gradient Annealing (GA) to explore the non-uniform distribution of sparsity inherent within neural networks.
GA provides an elegant trade-off between sparsity and accuracy without the need for additional sparsity-inducing regularization.
We integrate GA with the latest learnable pruning methods to create an automated sparse training algorithm called AutoSparse.
- Score: 2.6742343015805083
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sparse training is emerging as a promising avenue for reducing the
computational cost of training neural networks. Several recent studies have
proposed pruning methods using learnable thresholds to efficiently explore the
non-uniform distribution of sparsity inherent within the models. In this paper,
we propose Gradient Annealing (GA), where gradients of masked weights are
scaled down in a non-linear manner. GA provides an elegant trade-off between
sparsity and accuracy without the need for additional sparsity-inducing
regularization. We integrated GA with the latest learnable pruning methods to
create an automated sparse training algorithm called AutoSparse, which achieves
better accuracy, a greater reduction in training/inference FLOPS, or both, than
existing learnable pruning methods for sparse ResNet50 and MobileNetV1 on
ImageNet-1K. For example, AutoSparse achieves a 2x reduction in training FLOPS
and a 7x reduction in inference FLOPS for ResNet50 on ImageNet at 80% sparsity.
Finally, AutoSparse outperforms MEST, the state-of-the-art sparse-to-sparse
method (which uses uniform sparsity), on 80% sparse ResNet50: at similar
accuracy, MEST uses 12% more training FLOPS and 50% more inference FLOPS.
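The abstract describes Gradient Annealing (GA) only at a high level: gradients of masked weights are scaled down by a factor that shrinks non-linearly during training. The Python sketch below shows one way that idea can sit on top of a learnable-threshold pruning function. It assumes a soft-threshold pruner with a single learnable threshold per layer; the helper names `anneal_alpha`, `sparsify_forward`, and `sparsify_backward`, the cosine decay shape, and the default `alpha0 = 0.75` are hypothetical choices for illustration, not the AutoSparse implementation.

```python
import math


# Hypothetical GA schedule: cosine decay of alpha from alpha0 to 0.
# The abstract only says the scaling is non-linear; this exact shape
# and the alpha0 default are assumptions for illustration.
def anneal_alpha(step: int, total_steps: int, alpha0: float = 0.75) -> float:
    progress = min(step / total_steps, 1.0)
    return alpha0 * 0.5 * (1.0 + math.cos(math.pi * progress))


def sparsify_forward(weights, threshold):
    """Soft-threshold pruning (assumed): keep |w| > threshold, shrunk toward zero."""
    out, mask = [], []
    for w in weights:
        kept = abs(w) > threshold
        mask.append(kept)
        out.append(math.copysign(abs(w) - threshold, w) if kept else 0.0)
    return out, mask


def sparsify_backward(grad_out, weights, mask, alpha):
    """Backward pass with gradient annealing.

    Kept weights receive the upstream gradient unchanged. Masked weights,
    whose true derivative is 0, receive alpha * grad instead. The threshold
    gradient is d/d(tau) of sign(w) * (|w| - tau), i.e. -sign(w), for kept weights.
    """
    grad_w = [g if kept else alpha * g for g, kept in zip(grad_out, mask)]
    grad_tau = sum(-math.copysign(1.0, w) * g
                   for g, w, kept in zip(grad_out, weights, mask) if kept)
    return grad_w, grad_tau


# Usage on a toy layer of four weights.
weights = [0.9, -0.05, 0.3, 0.02]
out, mask = sparsify_forward(weights, threshold=0.1)  # mask: [True, False, True, False]
alpha = anneal_alpha(step=0, total_steps=100)         # 0.75 at the start of training
grad_w, grad_tau = sparsify_backward([1.0] * 4, weights, mask, alpha)
# grad_w: [1.0, 0.75, 1.0, 0.75]; grad_tau: -2.0
```

With `alpha = 0`, masked weights receive no gradient, as in ordinary hard masking; with `alpha = 1`, the mask is bypassed in the backward pass like a straight-through estimator. Under these assumptions, annealing `alpha` toward zero lets pruned weights regrow early in training and progressively settles the mask later on.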
Related papers
- Towards Generalized Entropic Sparsification for Convolutional Neural Networks [0.0]
Convolutional neural networks (CNNs) are reported to be overparametrized.
Here, we introduce a layer-by-layer, data-driven pruning method based on a computationally scalable entropic relaxation of the pruning problem.
The sparse subnetwork is found from the pre-trained (full) CNN, using network entropy minimization as a sparsity constraint.
Date: 2024-04-06T21:33:39Z
- Instant Complexity Reduction in CNNs using Locality-Sensitive Hashing [50.79602839359522]
We propose HASTE (Hashing for Tractable Efficiency), a parameter-free and data-free module that acts as a plug-and-play replacement for any regular convolution module.
We are able to drastically compress latent feature maps without sacrificing much accuracy by using locality-sensitive hashing (LSH).
In particular, we are able to instantly drop 46.72% of FLOPs while only losing 1.25% accuracy by just swapping the convolution modules in a ResNet34 on CIFAR-10 for our HASTE module.
Date: 2023-09-29T13:09:40Z
- Dynamic Sparse Training via Balancing the Exploration-Exploitation Trade-off [19.230329532065635]
Sparse training could significantly mitigate the training costs by reducing the model size.
Existing sparse training methods mainly use either random-based or greedy-based drop-and-grow strategies.
In this work, we treat dynamic sparse training as a sparse connectivity search problem.
Experimental results show that sparse models (up to 98% sparsity) obtained by our proposed method outperform the SOTA sparse training methods.
Date: 2022-11-30T01:22:25Z
- Controlled Sparsity via Constrained Optimization or: How I Learned to Stop Tuning Penalties and Love Constraints [81.46143788046892]
We focus on the task of controlling the level of sparsity when performing sparse learning.
Existing methods based on sparsity-inducing penalties involve expensive trial-and-error tuning of the penalty factor.
We propose a constrained formulation where sparsification is guided by the training objective and the desired sparsity target in an end-to-end fashion.
Date: 2022-08-08T21:24:20Z
- MEST: Accurate and Fast Memory-Economic Sparse Training Framework on the Edge [72.16021611888165]
This paper proposes a novel Memory-Economic Sparse Training (MEST) framework targeting accurate and fast execution on edge devices.
The proposed MEST framework is built on two enhancements: Elastic Mutation (EM) and Soft Memory Bound (&S).
Our results suggest that unforgettable examples can be identified in-situ even during the dynamic exploration of sparsity masks.
Date: 2021-10-26T21:15:17Z
- Structured Directional Pruning via Perturbation Orthogonal Projection [13.704348351073147]
A more reasonable approach is to find a sparse minimizer along the flat minimum valley found by the optimizer.
We propose the structured directional pruning based on projecting the perturbations onto the flat minimum valley.
Experiments show that our method obtains the state-of-the-art pruned accuracy (i.e. 93.97% on VGG16, CIFAR-10 task) without retraining.
Date: 2021-07-12T11:35:47Z
- Effective Model Sparsification by Scheduled Grow-and-Prune Methods [73.03533268740605]
We propose a novel scheduled grow-and-prune (GaP) methodology without pre-training the dense models.
Experiments have shown that such models can match or beat the quality of highly optimized dense models at 80% sparsity on a variety of tasks.
Date: 2021-06-18T01:03:13Z
- Post-training deep neural network pruning via layer-wise calibration [70.65691136625514]
We propose a data-free extension of our layer-wise calibration approach for computer vision models, based on automatically generated synthetic fractal images.
When using real data, we are able to get a ResNet50 model on ImageNet with 65% sparsity rate in 8-bit precision in a post-training setting.
Date: 2021-04-30T14:20:51Z
- Non-Parametric Adaptive Network Pruning [125.4414216272874]
We introduce non-parametric modeling to simplify the algorithm design.
Inspired by the face recognition community, we use a message passing algorithm to obtain an adaptive number of exemplars.
EPruner breaks the dependency on the training data in determining the "important" filters.
Date: 2021-01-20T06:18:38Z
- Activation Density driven Energy-Efficient Pruning in Training [2.222917681321253]
We propose a novel pruning method that prunes a network in real time during training.
We obtain exceedingly sparse networks with accuracy comparable to the baseline network.
Date: 2020-02-07T18:34:31Z
- Campfire: Compressible, Regularization-Free, Structured Sparse Training for Hardware Accelerators [0.04666493857924356]
This paper studies structured sparse training of CNNs with a gradual pruning technique.
We simplify the structure of the enforced sparsity so that it reduces overhead caused by regularization.
We show that our method creates a sparse version of ResNet-50 and ResNet-50 v1.5 on full ImageNet while remaining within a negligible 1% margin of accuracy loss.
Date: 2020-01-09T23:15:43Z
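The Campfire entry describes structured sparse training of CNNs with a gradual pruning technique. The Python sketch below illustrates the general idea under stated assumptions: it uses the cubic sparsity ramp popularized by Zhu and Gupta (2017) and prunes whole blocks (for example, filters) by L1 norm. The schedule shape, the block granularity, and the helper names `cubic_sparsity` and `prune_blocks` are hypothetical and are not taken from the Campfire paper.

```python
def cubic_sparsity(step, s_init, s_final, start, end):
    """Target sparsity at `step`: cubic ramp from s_init to s_final over [start, end].

    Assumed schedule (Zhu & Gupta style): sparsity rises quickly at first,
    then flattens out as it approaches s_final.
    """
    if step <= start:
        return s_init
    if step >= end:
        return s_final
    frac = (step - start) / (end - start)
    return s_final + (s_init - s_final) * (1.0 - frac) ** 3


def prune_blocks(blocks, sparsity):
    """Zero out the fraction `sparsity` of blocks that have the smallest L1 norm.

    Pruning whole blocks (hypothetical granularity) keeps the sparsity pattern
    structured, which is friendlier to hardware than scattered zeros.
    """
    n_prune = int(round(sparsity * len(blocks)))
    by_norm = sorted(range(len(blocks)), key=lambda i: sum(abs(x) for x in blocks[i]))
    pruned = set(by_norm[:n_prune])
    return [[0.0] * len(b) if i in pruned else list(b) for i, b in enumerate(blocks)]


# Example: four 2-element blocks pruned to the sparsity scheduled at step 50.
blocks = [[1.0, 1.0], [0.1, 0.1], [2.0, -2.0], [0.5, 0.0]]
target = cubic_sparsity(step=50, s_init=0.0, s_final=0.8, start=0, end=100)  # ~0.7
pruned = prune_blocks(blocks, target)
# pruned: [[0.0, 0.0], [0.0, 0.0], [2.0, -2.0], [0.0, 0.0]]
```

In a training loop, one would typically recompute the target every few steps and re-prune, so sparsity rises smoothly instead of jumping to its final value. With only four blocks, the 0.7 target rounds to three pruned blocks; real layers have enough blocks that the rounding error is small.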