Fast and Controllable Post-training Sparsity: Learning Optimal Sparsity Allocation with Global Constraint in Minutes
- URL: http://arxiv.org/abs/2405.05808v1
- Date: Thu, 9 May 2024 14:47:15 GMT
- Title: Fast and Controllable Post-training Sparsity: Learning Optimal Sparsity Allocation with Global Constraint in Minutes
- Authors: Ruihao Gong, Yang Yong, Zining Wang, Jinyang Guo, Xiuying Wei, Yuqing Ma, Xianglong Liu
- Abstract summary: We propose a fast and controllable post-training sparsity (FCPTS) framework for neural network sparsity.
Our method allows rapid and accurate sparsity allocation learning in minutes, with the added assurance of convergence to a predetermined global sparsity rate.
- Score: 33.68058313321142
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural network sparsity has attracted much research interest due to its similarity to biological schemes and its high energy efficiency. However, existing methods depend on long-time training or fine-tuning, which prevents large-scale application. Recently, some works focusing on post-training sparsity (PTS) have emerged. They get rid of the high training cost but usually suffer from noticeable accuracy degradation because they neglect the appropriate sparsity rate for each layer. Previous methods for finding sparsity rates mainly focus on the training-aware scenario and usually fail to converge stably under the PTS setting, with its limited data and much lower training cost. In this paper, we propose a fast and controllable post-training sparsity (FCPTS) framework. By incorporating a differentiable bridge function and a controllable optimization objective, our method allows for rapid and accurate sparsity allocation learning in minutes, with the added assurance of convergence to a predetermined global sparsity rate. Equipped with these techniques, we surpass state-of-the-art methods by a large margin, e.g., over 30% improvement for ResNet-50 on ImageNet at a sparsity rate of 80%. Our plug-and-play code and supplementary materials are open-sourced at https://github.com/ModelTC/FCPTS.
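The abstract describes the mechanism but includes no code, so here is a minimal, hypothetical PyTorch sketch of the idea: each layer gets a learnable sparsity rate, a differentiable soft threshold stands in for the paper's bridge function, and a penalty term pins the parameter-weighted average of the rates to the predetermined global target. All names, the exact form of the bridge function, and the hyperparameters are assumptions, not the authors' released implementation (see the GitHub link above for that).

```python
import torch
import torch.nn as nn

# Hedged sketch (not the released FCPTS code): learn one sparsity rate per
# layer on small calibration batches while a penalty pins the weighted
# average of the rates to a predetermined global target.

def soft_mask(weight, rate, temp=50.0):
    """Differentiable stand-in for the paper's bridge function: the pruning
    threshold is read off the sorted weight magnitudes by linear
    interpolation, so gradients flow from the mask back into `rate`."""
    mags, _ = weight.abs().flatten().sort()
    pos = rate * (mags.numel() - 1)
    lo = pos.detach().floor().long().clamp(max=mags.numel() - 2)
    threshold = mags[lo] + (pos - lo) * (mags[lo + 1] - mags[lo])
    return torch.sigmoid(temp * (weight.abs() - threshold))

model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 10))
for p in model.parameters():                           # post-training: freeze weights
    p.requires_grad_(False)
layers = [m for m in model if isinstance(m, nn.Linear)]
logits = torch.zeros(len(layers), requires_grad=True)  # only these are trained
opt = torch.optim.Adam([logits], lr=0.05)
target, lam = 0.80, 100.0                              # controllable global rate
numels = torch.tensor([float(l.weight.numel()) for l in layers])

for _ in range(200):                                   # minutes-scale loop
    x, y = torch.randn(32, 64), torch.randint(0, 10, (32,))  # stand-in calibration data
    rates = torch.sigmoid(logits)                      # per-layer sparsity rates
    h = x
    for i, (layer, rate) in enumerate(zip(layers, rates)):
        w = layer.weight * soft_mask(layer.weight, rate)
        h = nn.functional.linear(h, w, layer.bias)
        if i < len(layers) - 1:
            h = torch.relu(h)
    global_rate = (rates * numels).sum() / numels.sum()
    loss = (nn.functional.cross_entropy(h, y)
            + lam * (global_rate - target) ** 2)       # global constraint penalty
    opt.zero_grad(); loss.backward(); opt.step()
```

Because only the per-layer logits receive gradient updates, the loop touches no network weights, which is what lets an allocation like this be learned in minutes rather than requiring retraining.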
Related papers
- Pushing the Limits of Sparsity: A Bag of Tricks for Extreme Pruning [4.421875265386832]
Pruning of deep neural networks has been an effective technique for reducing model size while preserving most of the performance of dense networks.
Recent sparse learning methods have shown promising performance up to moderate sparsity levels such as 95% and 98%.
We propose a collection of techniques that enable the continuous learning of networks without accuracy collapse even at extreme sparsities.
arXiv Detail & Related papers (2024-11-20T18:54:53Z)
- UniPTS: A Unified Framework for Proficient Post-Training Sparsity [67.16547529992928]
Post-training Sparsity (PTS) is a recently emerged avenue that pursues efficient network sparsity using only limited data.
In this paper, we attempt to reconcile this disparity by transposing three cardinal factors that profoundly alter the performance of conventional sparsity into the context of PTS.
Our framework, termed UniPTS, is shown to be substantially superior to existing PTS methods across extensive benchmarks.
arXiv Detail & Related papers (2024-05-29T06:53:18Z)
- Layer Freezing & Data Sieving: Missing Pieces of a Generic Framework for Sparse Training [48.152207339344564]
We show that layer freezing and data sieving can be incorporated into the sparse training algorithm to form a generic framework, which we dub SpFDE.
Our experiments demonstrate that SpFDE can significantly reduce training costs along three dimensions: weight sparsity, layer freezing, and dataset sieving, while preserving accuracy.
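Neither ingredient is spelled out in this summary, so the following PyTorch sketch is only a hypothetical illustration of the two named components, not the SpFDE algorithm: layers are frozen on a fixed schedule, and the training set is periodically sieved down to its highest-loss examples. The schedule, the sieving criterion, and all names are assumptions.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Subset, TensorDataset

# Hedged illustration of layer freezing plus data sieving (not SpFDE itself).
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(),
                      nn.Linear(64, 64), nn.ReLU(),
                      nn.Linear(64, 10))
data = TensorDataset(torch.randn(1024, 32), torch.randint(0, 10, (1024,)))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
freeze_at = {10: 0, 20: 2}              # epoch -> index of layer to freeze
sieve_every, keep_frac = 5, 0.8         # periodically drop the easiest 20%

active = data
for epoch in range(30):
    if epoch in freeze_at:              # layer freezing: stop gradient updates
        for p in model[freeze_at[epoch]].parameters():
            p.requires_grad_(False)
    for x, y in DataLoader(active, batch_size=64, shuffle=True):
        loss = nn.functional.cross_entropy(model(x), y)
        opt.zero_grad(); loss.backward(); opt.step()
    if (epoch + 1) % sieve_every == 0:  # data sieving: keep hard examples
        with torch.no_grad():
            losses = torch.cat([
                nn.functional.cross_entropy(model(x), y, reduction="none")
                for x, y in DataLoader(active, batch_size=256)])
        keep = losses.topk(max(1, int(keep_frac * len(losses)))).indices
        active = Subset(active, keep.tolist())
```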
arXiv Detail & Related papers (2022-09-22T17:45:23Z)
- Efficient Few-Shot Object Detection via Knowledge Inheritance [62.36414544915032]
Few-shot object detection (FSOD) aims at learning a generic detector that can adapt to unseen tasks with scarce training samples.
We present an efficient pretrain-transfer framework (PTF) baseline with no added computational cost.
We also propose an adaptive length re-scaling (ALR) strategy to alleviate the vector length inconsistency between the predicted novel weights and the pretrained base weights.
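The summary names the mechanism but not its exact form; the sketch below shows one plausible reading of length re-scaling, where the target statistic (mean L2 norm of the base-class weights) is an assumption rather than the paper's ALR definition.

```python
import torch

# Hedged sketch of the length-rescaling idea described above (names and the
# exact statistic are assumptions, not the paper's ALR definition): rescale
# each predicted novel-class weight vector so its L2 norm matches the mean
# norm of the pretrained base-class weights.

def rescale_novel_weights(novel_w: torch.Tensor, base_w: torch.Tensor):
    """novel_w: (num_novel, dim) predicted classifier weights.
    base_w:  (num_base, dim) pretrained classifier weights."""
    target_len = base_w.norm(dim=1).mean()
    novel_len = novel_w.norm(dim=1, keepdim=True).clamp_min(1e-8)
    return novel_w * (target_len / novel_len)

base = torch.randn(60, 256)              # e.g. 60 base classes
novel = 0.1 * torch.randn(5, 256)        # predicted novel weights come out too short
print(rescale_novel_weights(novel, base).norm(dim=1))  # now near the base mean norm
```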
arXiv Detail & Related papers (2022-03-23T06:24:31Z)
- Sparsity Winning Twice: Better Robust Generalization from More Efficient Training [94.92954973680914]
We introduce two alternatives for sparse adversarial training: (i) static sparsity and (ii) dynamic sparsity.
We find that both methods yield a win-win: substantially shrinking the robust generalization gap and alleviating robust overfitting.
Our approaches can be combined with existing regularizers, establishing new state-of-the-art results in adversarial training.
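As a concrete reference point for the dynamic variant, here is a hedged sketch of a generic prune-and-regrow mask update in the spirit of dynamic sparse training at large, not necessarily this paper's exact rule; static sparsity would simply fix the mask once and skip this update.

```python
import torch

# Hedged sketch of a dynamic-sparsity update: periodically drop the weakest
# surviving weights and regrow the same number of connections where the
# gradient magnitude is largest.

def update_mask(weight, grad, mask, regrow_frac=0.1):
    """weight, grad, mask: same-shape tensors; mask holds 0s and 1s."""
    flat = mask.view(-1)
    k = max(1, int(regrow_frac * int(flat.sum())))
    # Grow candidates are positions that were inactive before this update.
    dead_grad = torch.where(mask.bool(), torch.zeros_like(grad), grad.abs())
    # Drop the k weakest currently-active weights ...
    live = torch.where(mask.bool(), weight.abs(),
                       torch.full_like(weight, float("inf")))
    flat[live.view(-1).topk(k, largest=False).indices] = 0.0
    # ... and regrow the k dead positions with the largest gradients.
    flat[dead_grad.view(-1).topk(k).indices] = 1.0
    return mask

mask = (torch.rand(64, 64) < 0.1).float()        # ~90% sparse mask
w, g = torch.randn(64, 64), torch.randn(64, 64)  # weights and their gradients
mask = update_mask(w, g, mask)                   # active-weight count is preserved
```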
arXiv Detail & Related papers (2022-02-20T15:52:08Z)
- ProgFed: Effective, Communication, and Computation Efficient Federated Learning by Progressive Training [65.68511423300812]
We propose ProgFed, a progressive training framework for efficient and effective federated learning.
ProgFed inherently reduces computation and two-way communication costs while maintaining the strong performance of the final models.
Our results show that ProgFed converges at the same rate as standard training on full models.
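The summary gives the idea but no mechanics, so the following is a hypothetical single-process sketch of progressive training: early rounds update only a prefix of the network under a small auxiliary head, so compute and the number of communicated parameters grow with the schedule. The block structure, schedule, and head design are all assumptions, not ProgFed's code.

```python
import torch
import torch.nn as nn

# Hedged sketch of progressive training: the trained prefix of the network
# grows over (simulated) federated rounds, with an auxiliary head per stage.

blocks = nn.ModuleList([nn.Sequential(nn.Linear(32, 32), nn.ReLU())
                        for _ in range(4)])
heads = nn.ModuleList([nn.Linear(32, 10) for _ in range(4)])  # one per stage

def forward_prefix(x, depth):
    for block in blocks[:depth]:
        x = block(x)
    return heads[depth - 1](x)          # auxiliary head for the active prefix

for rnd in range(100):                  # federated rounds (simulated locally)
    depth = min(4, 1 + rnd // 25)       # grow the trained prefix over time
    params = [p for b in blocks[:depth] for p in b.parameters()]
    params += list(heads[depth - 1].parameters())
    opt = torch.optim.SGD(params, lr=0.1)
    x, y = torch.randn(64, 32), torch.randint(0, 10, (64,))
    loss = nn.functional.cross_entropy(forward_prefix(x, depth), y)
    opt.zero_grad(); loss.backward(); opt.step()
```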
arXiv Detail & Related papers (2021-10-11T14:45:00Z)
- A Simple Fine-tuning Is All You Need: Towards Robust Deep Learning Via Adversarial Fine-tuning [90.44219200633286]
We propose a simple yet very effective adversarial fine-tuning approach based on a "slow start, fast decay" learning rate scheduling strategy.
Experimental results show that the proposed adversarial fine-tuning approach outperforms the state-of-the-art methods on CIFAR-10, CIFAR-100 and ImageNet datasets.
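The summary names the schedule but not its shape or constants, so here is one plausible PyTorch rendering of a "slow start, fast decay" schedule; the warm-up length, peak rate, and decay factor are assumptions.

```python
import torch

# Hedged sketch: linear warm-up to the peak rate, then rapid exponential decay.

def slow_start_fast_decay(step, warmup=500, peak=0.01, decay=0.995):
    if step < warmup:
        return peak * (step + 1) / warmup   # slow start: gentle warm-up
    return peak * decay ** (step - warmup)  # fast decay: exponential drop

model = torch.nn.Linear(10, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.01)
for step in range(2000):
    for group in opt.param_groups:          # set the rate before each step
        group["lr"] = slow_start_fast_decay(step)
    # ... one adversarial fine-tuning step on a perturbed batch goes here ...
```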
arXiv Detail & Related papers (2020-12-25T20:50:15Z)
- Picking Winning Tickets Before Training by Preserving Gradient Flow [9.67608102763644]
We argue that efficient training requires preserving the gradient flow through the network.
We empirically investigate the effectiveness of the proposed method with extensive experiments on CIFAR-10, CIFAR-100, Tiny-ImageNet and ImageNet.
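For readers who want the scoring rule made concrete, here is a hedged PyTorch sketch of a gradient-flow-preserving pruning score: each weight is scored by its contribution to the squared gradient norm, estimated with a Hessian-vector product, and the lowest-scoring fraction is pruned before training. Details in the paper, such as temperature-scaled losses and the exact sign convention, may differ from this reading.

```python
import torch
import torch.nn as nn

# Hedged sketch of gradient-flow-preserving pruning at initialization.
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
x, y = torch.randn(128, 32), torch.randint(0, 10, (128,))
params = [p for p in model.parameters() if p.dim() > 1]  # weight matrices only

loss = nn.functional.cross_entropy(model(x), y)
grads = torch.autograd.grad(loss, params, create_graph=True)
gnorm = sum((g * g).sum() for g in grads)          # gradient flow g^T g
hg = torch.autograd.grad(gnorm, params)            # equals 2 * H g

# Score each weight by its contribution to gradient flow; prune the rest.
scores = torch.cat([(p * h).flatten() for p, h in zip(params, hg)])
sparsity = 0.9
threshold = scores.kthvalue(int(sparsity * scores.numel())).values
with torch.no_grad():
    for p, h in zip(params, hg):
        p.mul_(((p * h) > threshold).float())      # keep high-contribution weights
```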
arXiv Detail & Related papers (2020-02-18T05:14:47Z)