UniPTS: A Unified Framework for Proficient Post-Training Sparsity
- URL: http://arxiv.org/abs/2405.18810v1
- Date: Wed, 29 May 2024 06:53:18 GMT
- Title: UniPTS: A Unified Framework for Proficient Post-Training Sparsity
- Authors: Jingjing Xie, Yuxin Zhang, Mingbao Lin, Zhihang Lin, Liujuan Cao, Rongrong Ji
- Abstract summary: Post-training Sparsity (PTS) is a recently emerged approach that pursues efficient network sparsity with only a limited amount of data, but it still lags behind methods that retrain sparse networks on the whole dataset.
In this paper, we attempt to close this gap by transposing three cardinal factors that profoundly affect the performance of conventional sparsity into the context of PTS.
Our framework, termed UniPTS, consistently outperforms existing PTS methods across extensive benchmarks.
- Score: 67.16547529992928
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Post-training Sparsity (PTS) is a recently emerged line of work that pursues efficient network sparsity with only limited data. Existing PTS methods, however, suffer significant performance degradation compared with traditional methods that retrain the sparse networks on the whole dataset, especially at high sparsity ratios. In this paper, we attempt to reconcile this disparity by transposing three cardinal factors that profoundly affect the performance of conventional sparsity into the context of PTS. Our endeavors particularly comprise (1) a base-decayed sparsity objective that promotes efficient knowledge transfer from the dense network to its sparse counterpart; (2) a reducing-regrowing search algorithm designed to determine the optimal sparsity distribution while avoiding overfitting to the small calibration set used in PTS; and (3) dynamic sparse training built on the preceding two components, aimed at comprehensively optimizing the sparsity structure while ensuring training stability. Our proposed framework, termed UniPTS, consistently outperforms existing PTS methods across extensive benchmarks. As an illustration, it improves the performance of POT, a recently proposed PTS recipe, from 3.9% to 68.6% when pruning ResNet-50 at a 90% sparsity ratio on ImageNet. We release the code of our paper at https://github.com/xjjxmu/UniPTS.
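The abstract names three components but gives no implementation detail on this page. For background only, here is a minimal sketch of the generic drop-and-regrow mask update that dynamic sparse training builds on; the function names, the magnitude/gradient criteria, and the toy calibration loss are illustrative assumptions, not the UniPTS method itself (the authors' actual objective, search algorithm, and schedule are in the linked repository).

```python
# Minimal sketch (not the authors' implementation) of a drop-and-regrow mask
# update of the kind dynamic sparse training relies on. UniPTS additionally
# uses a base-decayed sparsity objective and a reducing-regrowing search over
# the sparsity distribution; see https://github.com/xjjxmu/UniPTS.
import torch

def magnitude_mask(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Keep the largest-magnitude (1 - sparsity) fraction of weights."""
    k = max(1, int(weight.numel() * (1.0 - sparsity)))
    idx = torch.topk(weight.abs().flatten(), k).indices
    mask = torch.zeros(weight.numel(), dtype=torch.bool, device=weight.device)
    mask[idx] = True
    return mask.view_as(weight)

def drop_and_regrow(weight, grad, mask, update_frac=0.1):
    """One mask update: drop the weakest kept weights and regrow the same
    number of pruned weights with the largest gradients, keeping sparsity fixed."""
    w, g, m = weight.flatten(), grad.flatten(), mask.flatten().clone()
    kept = m.nonzero(as_tuple=True)[0]
    pruned = (~m).nonzero(as_tuple=True)[0]
    n = min(int(kept.numel() * update_frac), pruned.numel())
    if n == 0:
        return mask
    drop = kept[torch.topk(w[kept].abs(), n, largest=False).indices]
    grow = pruned[torch.topk(g[pruned].abs(), n).indices]
    m[drop] = False
    m[grow] = True
    return m.view_as(mask)

# Usage: prune a weight tensor to 90% sparsity, then update the mask once.
w = torch.randn(512, 512, requires_grad=True)
mask = magnitude_mask(w.detach(), sparsity=0.9)
loss = ((w * mask) ** 2).sum()          # stand-in for a real calibration loss
loss.backward()
mask = drop_and_regrow(w.detach(), w.grad, mask)
```

As the abstract describes, in UniPTS such updates would be driven by the base-decayed objective and constrained to the sparsity distribution found by the reducing-regrowing search, rather than by this toy loss.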
Related papers
- Fast and Controllable Post-training Sparsity: Learning Optimal Sparsity Allocation with Global Constraint in Minutes [33.68058313321142]
We propose a fast and controllable post-training sparsity (FCPTS) framework.
Our method allows for rapid and accurate sparsity allocation learning in minutes, with the added assurance of convergence to a global sparsity rate.
arXiv Detail & Related papers (2024-05-09T14:47:15Z)
- Enhanced Sparsification via Stimulative Training [36.0559905521154]
Existing methods commonly use sparsity-inducing penalty terms to suppress the importance of dropped weights.
We propose a structured pruning framework, named expressivity, based on an enhanced sparsification paradigm.
To reduce the large capacity gap in distillation, we propose a mutating expansion technique.
arXiv Detail & Related papers (2024-03-11T04:05:17Z)
- EcoTTA: Memory-Efficient Continual Test-time Adaptation via Self-distilled Regularization [71.70414291057332]
Test-time adaptation (TTA) is primarily conducted on edge devices with limited memory.
Long-term adaptation often leads to catastrophic forgetting and error accumulation.
We present lightweight meta networks that can adapt the frozen original networks to the target domain.
arXiv Detail & Related papers (2023-03-03T13:05:30Z)
- Trainability Preserving Neural Structured Pruning [64.65659982877891]
We present trainability preserving pruning (TPP), a regularization-based structured pruning method that can effectively maintain trainability during sparsification.
TPP can compete with the ground-truth dynamical isometry recovery method on linear networks.
It delivers encouraging performance in comparison to many top-performing filter pruning methods.
arXiv Detail & Related papers (2022-07-25T21:15:47Z)
- Federated Progressive Sparsification (Purge, Merge, Tune)+ [15.08232397899507]
FedSparsify is a sparsification strategy based on progressive weight magnitude pruning.
We show experimentally that FedSparsify learns a subnetwork that is both highly sparse and high-performing.
arXiv Detail & Related papers (2022-04-26T16:45:53Z)
- Sparsity Winning Twice: Better Robust Generalization from More Efficient Training [94.92954973680914]
We introduce two alternatives for sparse adversarial training: (i) static sparsity and (ii) dynamic sparsity.
We find that both methods yield a win-win: they substantially shrink the robust generalization gap and alleviate robust overfitting.
Our approaches can be combined with existing regularizers, establishing new state-of-the-art results in adversarial training.
arXiv Detail & Related papers (2022-02-20T15:52:08Z)
- The Unreasonable Effectiveness of Random Pruning: Return of the Most Naive Baseline for Sparse Training [111.15069968583042]
Random pruning is arguably the most naive way to attain sparsity in neural networks, but it has been deemed uncompetitive compared with post-training pruning and sparse training methods.
We empirically demonstrate that sparsely training a randomly pruned network from scratch can match the performance of its dense equivalent.
Our results strongly suggest there is larger-than-expected room for sparse training at scale, and that the benefits of sparsity may extend beyond carefully designed pruning (a minimal sketch of random pruning follows this entry).
arXiv Detail & Related papers (2022-02-05T21:19:41Z)
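To make the baseline concrete, here is a minimal sketch of random pruning before sparse training; the uniform per-layer ratio and the toy MLP are simplifying assumptions (the paper studies different layer-wise sparsity allocations and much larger models).

```python
# Minimal sketch of uniform random pruning before sparse training
# (illustrative assumptions: uniform per-layer ratio, toy MLP).
import torch
import torch.nn as nn

def random_prune(model: nn.Module, sparsity: float = 0.9) -> dict:
    """Create a random binary mask per weight tensor and apply it in place.
    During sparse training, the masks are reapplied after every optimizer step."""
    masks = {}
    for name, module in model.named_modules():
        if isinstance(module, (nn.Linear, nn.Conv2d)):
            mask = (torch.rand_like(module.weight) > sparsity).float()
            module.weight.data.mul_(mask)
            masks[f"{name}.weight"] = mask
    return masks

model = nn.Sequential(nn.Linear(784, 300), nn.ReLU(),
                      nn.Linear(300, 100), nn.ReLU(),
                      nn.Linear(100, 10))
masks = random_prune(model, sparsity=0.9)
kept = sum(m.sum().item() for m in masks.values())
total = sum(m.numel() for m in masks.values())
print(f"kept weights: {kept:.0f} / {total} ({kept / total:.1%})")
```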
- Connectivity Matters: Neural Network Pruning Through the Lens of Effective Sparsity [0.0]
Neural network pruning is a fruitful area of research with surging interest in high sparsity regimes.
We show that the effective compression of a randomly pruned LeNet-300-100 can be orders of magnitude larger than its direct (nominal) counterpart.
We develop a low-cost extension to most pruning algorithms that targets effective, rather than direct, sparsity (the distinction is sketched after this entry).
arXiv Detail & Related papers (2021-07-05T22:36:57Z)
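For intuition about the direct-versus-effective distinction, here is a minimal sketch for a plain feed-forward chain of weight masks, under the simplifying assumption that a connection counts as effective only if the hidden neuron it touches keeps at least one incoming and one outgoing connection; the paper's definition (connectivity all the way from input to output) is more general.

```python
# Sketch: direct vs. effective sparsity for a chain of weight masks.
# masks[i] has shape (out_i, in_i); layer i feeds layer i + 1.
import torch

def effective_masks(masks):
    """Repeatedly remove connections attached to dead hidden neurons
    (no incoming or no outgoing connections) until the masks stabilize."""
    masks = [m.clone().bool() for m in masks]
    changed = True
    while changed:
        changed = False
        for i in range(len(masks) - 1):
            # neurons of the hidden layer sitting between masks[i] and masks[i+1]
            alive = masks[i].any(dim=1) & masks[i + 1].any(dim=0)
            new_lo = masks[i] & alive.unsqueeze(1)
            new_hi = masks[i + 1] & alive.unsqueeze(0)
            if not torch.equal(new_lo, masks[i]) or not torch.equal(new_hi, masks[i + 1]):
                masks[i], masks[i + 1] = new_lo, new_hi
                changed = True
    return masks

def sparsity(masks):
    kept = sum(m.sum().item() for m in masks)
    return 1.0 - kept / sum(m.numel() for m in masks)

# Randomly pruned LeNet-300-100-style chain at 95% direct sparsity.
torch.manual_seed(0)
direct = [torch.rand(300, 784) > 0.95,
          torch.rand(100, 300) > 0.95,
          torch.rand(10, 100) > 0.95]
print("direct sparsity:   ", sparsity(direct))
print("effective sparsity:", sparsity(effective_masks(direct)))
```

At this moderate ratio the two numbers should differ only slightly; the orders-of-magnitude gaps reported in the paper arise at far more extreme sparsities.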
- ReActNet: Towards Precise Binary Neural Network with Generalized Activation Functions [76.05981545084738]
We propose several ideas for enhancing a binary network to close its accuracy gap to real-valued networks without incurring any additional computational cost.
We first construct a baseline network by modifying and binarizing a compact real-valued network with parameter-free shortcuts.
We show that the proposed ReActNet outperforms all state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2020-03-07T02:12:02Z)
- Picking Winning Tickets Before Training by Preserving Gradient Flow [9.67608102763644]
We argue that efficient training requires preserving the gradient flow through the network.
We empirically investigate the effectiveness of the proposed method with extensive experiments on CIFAR-10, CIFAR-100, Tiny-ImageNet and ImageNet.
arXiv Detail & Related papers (2020-02-18T05:14:47Z)
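As a rough illustration of what preserving gradient flow can mean computationally, the sketch below uses double backpropagation to obtain a Hessian-gradient product and combines it with the weights into a per-parameter score; the exact sign convention and selection rule of the paper's criterion are not reproduced here, and the model and data are placeholders.

```python
# Sketch: a gradient-flow signal via double backprop (Hessian-gradient product).
# The paper's exact scoring and pruning rule may differ; this is illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
x, y = torch.randn(128, 32), torch.randint(0, 10, (128,))
params = [p for p in model.parameters() if p.requires_grad]

loss = F.cross_entropy(model(x), y)
grads = torch.autograd.grad(loss, params, create_graph=True)   # g = dL/dtheta
# <g, stop_grad(g)>: treating one copy of g as constant makes the gradient of
# this scalar w.r.t. the parameters the Hessian-gradient product H g.
gnorm = sum((g * g.detach()).sum() for g in grads)
hg = torch.autograd.grad(gnorm, params)

# Per-parameter gradient-flow score: each weight's (approximate) contribution
# to the gradient norm, which a pruning criterion could rank weights by.
scores = [p.detach() * h for p, h in zip(params, hg)]
```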
This list is automatically generated from the titles and abstracts of the papers in this site.