Related papers: AUTOSPARSE: Towards Automated Sparse Training of Deep Neural Networks


URL: http://arxiv.org/abs/2304.06941v1
Date: Fri, 14 Apr 2023 06:19:07 GMT
Title: AUTOSPARSE: Towards Automated Sparse Training of Deep Neural Networks
Authors: Abhisek Kundu, Naveen K. Mellempudi, Dharma Teja Vooturi, Bharat Kaul, Pradeep Dubey
Abstract summary: We propose Gradient Annealing (GA) to explore the non-uniform distribution of sparsity inherent within neural networks. GA provides an elegant trade-off between sparsity and accuracy without the need for additional sparsity-inducing regularization. We integrate GA with the latest learnable pruning methods to create an automated sparse training algorithm called AutoSparse.
Score: 2.6742343015805083
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Sparse training is emerging as a promising avenue for reducing the computational cost of training neural networks. Several recent studies have proposed pruning methods using learnable thresholds to efficiently explore the non-uniform distribution of sparsity inherent within the models. In this paper, we propose Gradient Annealing (GA), where gradients of masked weights are scaled down in a non-linear manner. GA provides an elegant trade-off between sparsity and accuracy without the need for additional sparsity-inducing regularization. We integrated GA with the latest learnable pruning methods to create an automated sparse training algorithm called AutoSparse, which achieves better accuracy and/or training/inference FLOPS reduction than existing learnable pruning methods for sparse ResNet50 and MobileNetV1 on ImageNet-1K: AutoSparse achieves (2x, 7x) reduction in (training, inference) FLOPS for ResNet50 on ImageNet at 80% sparsity. Finally, AutoSparse outperforms sparse-to-sparse SotA method MEST (uniform sparsity) for 80% sparse ResNet50 with similar accuracy, where MEST uses 12% more training FLOPS and 50% more inference FLOPS.
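The Gradient Annealing idea in the abstract (gradients of masked weights are scaled down non-linearly) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the cosine decay schedule, the starting scale `alpha0`, and the fixed magnitude threshold are all assumptions made for illustration; the abstract only says that the scaling is non-linear and that the pruning thresholds are learnable.

```python
import math


def annealed_alpha(step, total_steps, alpha0=0.75):
    """Hypothetical schedule: cosine decay of the gradient scale from alpha0 to 0."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return alpha0 * 0.5 * (1.0 + math.cos(math.pi * progress))


def masked_grads(weights, grads, threshold, alpha):
    """Scale gradients of masked weights (|w| <= threshold) by alpha.

    Gradients of unmasked weights pass through unchanged. The fixed
    threshold is an assumption; learnable-threshold methods train it.
    """
    return [g if abs(w) > threshold else alpha * g
            for w, g in zip(weights, grads)]


if __name__ == "__main__":
    weights = [0.5, -0.02, 0.01, -0.8]
    grads = [1.0, 1.0, 1.0, 1.0]
    for step in (0, 50, 100):
        alpha = annealed_alpha(step, total_steps=100)
        print(step, alpha, masked_grads(weights, grads, 0.05, alpha))
```

Early in training, masked weights still receive a sizeable gradient and can grow back above the threshold. As the scale decays toward zero they receive less and less update, so the set of surviving weights settles. Under this sketch's assumptions, that is the sparsity versus accuracy trade-off the abstract refers to.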

Related papers

Loss-Aware Automatic Selection of Structured Pruning Criteria for Deep Neural Network Acceleration [1.3225694028747144]
This paper presents an efficient Loss-Aware Automatic Selection of Structured Pruning Criteria (LAASP) for slimming and accelerating deep neural networks. The pruning-while-training approach eliminates the first stage and integrates the second and third stages into a single cycle. Experiments on the VGGNet and ResNet models on the CIFAR-10 and ImageNet benchmark datasets demonstrate the effectiveness of the proposed method.
arXiv Detail & Related papers (2025-06-25T06:18:46Z)
Towards Generalized Entropic Sparsification for Convolutional Neural Networks [0.0]
Convolutional neural networks (CNNs) are reported to be overparametrized. Here, we introduce a layer-by-layer data-driven pruning method based on the mathematical idea aiming at a computationally-scalable entropic relaxation of the pruning problem. The sparse subnetwork is found from the pre-trained (full) CNN using the network entropy minimization as a sparsity constraint.
arXiv Detail & Related papers (2024-04-06T21:33:39Z)
Instant Complexity Reduction in CNNs using Locality-Sensitive Hashing [50.79602839359522]
We propose HASTE (Hashing for Tractable Efficiency), a parameter-free and data-free module that acts as a plug-and-play replacement for any regular convolution module. We are able to drastically compress latent feature maps without sacrificing much accuracy by using locality-sensitive hashing (LSH). In particular, we are able to instantly drop 46.72% of FLOPs while only losing 1.25% accuracy by just swapping the convolution modules in a ResNet34 on CIFAR-10 for our HASTE module.
arXiv Detail & Related papers (2023-09-29T13:09:40Z)
Quantized Neural Networks for Low-Precision Accumulation with Guaranteed Overflow Avoidance [68.8204255655161]
We introduce a quantization-aware training algorithm that guarantees avoiding numerical overflow when reducing the precision of accumulators during inference. We evaluate our algorithm across multiple quantized models that we train for different tasks, showing that our approach can reduce the precision of accumulators while maintaining model accuracy with respect to a floating-point baseline.
arXiv Detail & Related papers (2023-01-31T02:46:57Z)
Dynamic Sparse Training via Balancing the Exploration-Exploitation Trade-off [19.230329532065635]
Sparse training could significantly mitigate the training costs by reducing the model size. Existing sparse training methods mainly use either random-based or greedy-based drop-and-grow strategies. In this work, we consider the dynamic sparse training as a sparse connectivity search problem. Experimental results show that sparse models (up to 98% sparsity) obtained by our proposed method outperform the SOTA sparse training methods.
arXiv Detail & Related papers (2022-11-30T01:22:25Z)
Controlled Sparsity via Constrained Optimization or: How I Learned to Stop Tuning Penalties and Love Constraints [81.46143788046892]
We focus on the task of controlling the level of sparsity when performing sparse learning. Existing methods based on sparsity-inducing penalties involve expensive trial-and-error tuning of the penalty factor. We propose a constrained formulation where sparsification is guided by the training objective and the desired sparsity target in an end-to-end fashion.
arXiv Detail & Related papers (2022-08-08T21:24:20Z)
MEST: Accurate and Fast Memory-Economic Sparse Training Framework on the Edge [72.16021611888165]
This paper proposes a novel Memory-Economic Sparse Training (MEST) framework targeting accurate and fast execution on edge devices. The proposed MEST framework consists of enhancements by Elastic Mutation (EM) and Soft Memory Bound (&S). Our results suggest that unforgettable examples can be identified in-situ even during the dynamic exploration of sparsity masks.
arXiv Detail & Related papers (2021-10-26T21:15:17Z)
Structured Directional Pruning via Perturbation Orthogonal Projection [13.704348351073147]
A more reasonable approach is to find a sparse minimizer along the flat minimum valley found by the optimizer. We propose the structured directional pruning based on projecting the perturbations onto the flat minimum valley. Experiments show that our method obtains the state-of-the-art pruned accuracy (i.e. 93.97% on VGG16, CIFAR-10 task) without retraining.
arXiv Detail & Related papers (2021-07-12T11:35:47Z)
Effective Model Sparsification by Scheduled Grow-and-Prune Methods [73.03533268740605]
We propose a novel scheduled grow-and-prune (GaP) methodology without pre-training the dense models. Experiments have shown that such models can match or beat the quality of highly optimized dense models at 80% sparsity on a variety of tasks.
arXiv Detail & Related papers (2021-06-18T01:03:13Z)
Post-training deep neural network pruning via layer-wise calibration [70.65691136625514]
We propose a data-free extension of the approach for computer vision models based on automatically-generated synthetic fractal images. When using real data, we are able to get a ResNet50 model on ImageNet with 65% sparsity rate in 8-bit precision in a post-training setting.
arXiv Detail & Related papers (2021-04-30T14:20:51Z)
Non-Parametric Adaptive Network Pruning [125.4414216272874]
We introduce non-parametric modeling to simplify the algorithm design. Inspired by the face recognition community, we use a message passing algorithm to obtain an adaptive number of exemplars. EPruner breaks the dependency on the training data in determining the "important" filters.
arXiv Detail & Related papers (2021-01-20T06:18:38Z)
Activation Density driven Energy-Efficient Pruning in Training [2.222917681321253]
We propose a novel pruning method that prunes a network in real time during training. We obtain exceedingly sparse networks with accuracy comparable to the baseline network.
arXiv Detail & Related papers (2020-02-07T18:34:31Z)
Campfire: Compressible, Regularization-Free, Structured Sparse Training for Hardware Accelerators [0.04666493857924356]
This paper studies structured sparse training of CNNs with a gradual pruning technique. We simplify the structure of the enforced sparsity so that it reduces overhead caused by regularization. We show that our method creates a sparse version of ResNet-50 and ResNet-50 v1.5 on full ImageNet while remaining within a negligible 1% margin of accuracy loss.
arXiv Detail & Related papers (2020-01-09T23:15:43Z)
