SS-Auto: A Single-Shot, Automatic Structured Weight Pruning Framework of
DNNs with Ultra-High Efficiency
- URL: http://arxiv.org/abs/2001.08839v1
- Date: Thu, 23 Jan 2020 22:45:02 GMT
- Title: SS-Auto: A Single-Shot, Automatic Structured Weight Pruning Framework of
DNNs with Ultra-High Efficiency
- Authors: Zhengang Li, Yifan Gong, Xiaolong Ma, Sijia Liu, Mengshu Sun, Zheng
Zhan, Zhenglun Kong, Geng Yuan, Yanzhi Wang
- Abstract summary: We propose a framework to mitigate the limitations of
prior structured weight pruning methods.
Experiments on the CIFAR-10 and CIFAR-100 datasets demonstrate that the
proposed framework can achieve ultra-high pruning rates while maintaining
accuracy.
- Score: 42.63352504047665
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Structured weight pruning is a representative model compression
technique for DNNs, targeting hardware efficiency and inference acceleration.
Previous works in this area leave considerable room for improvement, since
sparse structures that combine different structured pruning schemes are not
exploited fully and efficiently. To mitigate these limitations, we propose
SS-Auto, a single-shot, automatic structured pruning framework that can
achieve row pruning and column pruning simultaneously. We adopt a soft
constraint-based formulation to alleviate the strong non-convexity of the
l0-norm constraints used in state-of-the-art ADMM-based methods, yielding
faster convergence and fewer hyperparameters. Instead of solving the problem
directly, a Primal-Proximal solution is proposed to avoid the pitfall of
penalizing all weights equally, thereby enhancing accuracy. Extensive
experiments on the CIFAR-10 and CIFAR-100 datasets demonstrate that the
proposed framework can achieve ultra-high pruning rates while maintaining
accuracy. Furthermore, significant inference speedup has been observed from
the proposed framework through actual measurements on a smartphone.
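
As a concrete illustration of the joint row-and-column sparsity SS-Auto
targets, the sketch below derives a combined mask for one convolutional layer
from simple l2 norms. It is not the authors' code, and the keep ratios are
hypothetical placeholders; in SS-Auto the pattern comes out of the
Primal-Proximal optimization rather than a magnitude heuristic.

    import torch

    def row_col_prune(weight, row_keep=0.5, col_keep=0.5):
        # GEMM view of a conv weight: rows = filters, columns = in_ch * k * k
        w2d = weight.flatten(1)
        row_norm = w2d.norm(dim=1)           # one l2 norm per row
        col_norm = w2d.norm(dim=0)           # one l2 norm per column
        n_rows = max(1, int(row_keep * w2d.shape[0]))
        n_cols = max(1, int(col_keep * w2d.shape[1]))
        row_mask = torch.zeros(w2d.shape[0], dtype=torch.bool)
        col_mask = torch.zeros(w2d.shape[1], dtype=torch.bool)
        row_mask[row_norm.topk(n_rows).indices] = True
        col_mask[col_norm.topk(n_cols).indices] = True
        mask = row_mask[:, None] & col_mask[None, :]   # keep the intersection
        return (w2d * mask).view_as(weight), mask

The rows and columns that survive form a dense submatrix, which is what makes
this kind of combined structured sparsity hardware-friendly.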
Related papers
- Achieving Constraints in Neural Networks: A Stochastic Augmented
Lagrangian Approach [49.1574468325115]
Regularizing Deep Neural Networks (DNNs) is essential for improving generalizability and preventing overfitting.
We propose a novel approach to DNN regularization by framing the training process as a constrained optimization problem.
We employ a Stochastic Augmented Lagrangian (SAL) method to achieve a more flexible and efficient regularization mechanism.
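
A minimal sketch of one augmented-Lagrangian training step, assuming a single
scalar constraint c(w) that should be driven to zero; the function names and
the dual-ascent step are illustrative, not the paper's API:

    import torch

    def sal_step(loss_fn, constraint_fn, params, lam, rho, opt):
        # Minimize L(w) + lam * c(w) + (rho / 2) * c(w)^2 over the weights w
        opt.zero_grad()
        c = constraint_fn(params)
        (loss_fn(params) + lam * c + 0.5 * rho * c * c).backward()
        opt.step()
        # Dual ascent: raise the multiplier where the constraint is violated
        with torch.no_grad():
            lam = (lam + rho * constraint_fn(params)).item()
        return lam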
arXiv Detail & Related papers (2023-10-25T13:55:35Z)
- Automatic Mapping of the Best-Suited DNN Pruning Schemes for Real-Time
Mobile Acceleration [71.80326738527734]
We propose a general, fine-grained structured pruning scheme and corresponding compiler optimizations.
We show that our pruning scheme mapping methods, together with the general fine-grained structured pruning scheme, outperform the state-of-the-art DNN optimization framework.
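
A hypothetical sketch of the per-layer scheme-mapping idea: try several
structured schemes at the target sparsity and keep, for each layer, the one
with the smallest proxy loss. The scheme names and helper functions below are
illustrative, not the paper's interface.

    SCHEMES = ["filter", "column", "block", "pattern"]

    def map_schemes(model, layers, sparsity, apply_scheme, proxy_loss):
        # apply_scheme(...) returns a candidate pruned model;
        # proxy_loss(...) scores it (e.g., loss on a small held-out batch)
        mapping = {}
        for layer in layers:
            mapping[layer] = min(
                SCHEMES,
                key=lambda s: proxy_loss(apply_scheme(model, layer, s, sparsity)),
            )
        return mapping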
arXiv Detail & Related papers (2021-11-22T23:53:14Z)
- Only Train Once: A One-Shot Neural Network Training And Pruning
Framework [31.959625731943675]
Structured pruning is a commonly used technique in deploying deep neural networks (DNNs) onto resource-constrained devices.
We propose Only-Train-Once (OTO), a framework that trains slimmer DNNs with competitive performance and significant FLOPs reduction in a single shot.
OTO contains two keys: (i) we partition the parameters of DNNs into zero-invariant groups, enabling us to prune zero groups without affecting the output; and (ii) to promote zero groups, we formulate a structured-sparsity optimization problem and solve it with a novel algorithm, Half-Space Stochastic Projected Gradient (HSPG).
To demonstrate the effectiveness of OTO, we train and compress full models from scratch without fine-tuning.
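
A minimal sketch of a zero-invariant group for a Conv-BN pair, under the
reading that filter i, its bias, and the i-th BatchNorm scale/shift must all
be zero for the channel to be removable without changing the output:

    import torch
    import torch.nn as nn

    def zero_invariant_groups(conv: nn.Conv2d, bn: nn.BatchNorm2d):
        groups = []
        for i in range(conv.out_channels):
            g = [conv.weight[i].flatten(), bn.weight[i:i+1], bn.bias[i:i+1]]
            if conv.bias is not None:
                g.append(conv.bias[i:i+1])
            groups.append(torch.cat(g))
        return groups

    # Groups whose concatenated vector is zero produce a zero output channel,
    # so they can be pruned without affecting the network's output:
    # prunable = [i for i, g in enumerate(zero_invariant_groups(conv, bn))
    #             if g.norm() == 0]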
arXiv Detail & Related papers (2021-07-15T17:15:20Z)
- Efficient Micro-Structured Weight Unification and Pruning for Neural
Network Compression [56.83861738731913]
Deep Neural Network (DNN) models are essential for practical applications, especially on resource-limited devices.
Previous unstructured or structured weight pruning methods can hardly truly accelerate inference.
We propose a generalized weight unification framework at a hardware-compatible micro-structured level to achieve a high degree of compression and acceleration.
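
One hedged reading of micro-structured unification, sketched below: within
each small block of a row, the weights share a single magnitude (signs kept),
so a block can be stored as a sign mask plus one scalar. The block size 4 is
an illustrative choice, not the paper's setting.

    import torch

    def unify_blocks(weight: torch.Tensor, block: int = 4) -> torch.Tensor:
        w = weight.flatten(1)
        cols = w.shape[1] - w.shape[1] % block   # drop a ragged tail, for brevity
        blocks = w[:, :cols].reshape(w.shape[0], -1, block)
        scale = blocks.abs().mean(dim=2, keepdim=True)  # one magnitude per block
        out = w.clone()
        out[:, :cols] = (torch.sign(blocks) * scale).reshape(w.shape[0], cols)
        return out.view_as(weight)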
arXiv Detail & Related papers (2021-06-15T17:22:59Z)
- MLPruning: A Multilevel Structured Pruning Framework for
Transformer-based Models [78.45898846056303]
Pruning is an effective method to reduce the memory footprint and computational cost associated with large natural language processing models.
We develop a novel MultiLevel structured Pruning framework, which uses three different levels of structured pruning: head pruning, row pruning, and block-wise sparse pruning.
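
To illustrate the coarsest of the three levels, head pruning, the sketch
below zeroes whole attention heads ranked by a given importance score. The
per-head scores are assumed to be supplied, and the layout is a standard
[hidden, hidden] projection; this is not the paper's exact implementation.

    import torch

    def prune_heads(proj_weight, head_scores, num_heads, keep):
        # View the projection as num_heads slices of size head_dim
        head_dim = proj_weight.shape[0] // num_heads
        mask = torch.zeros(num_heads, dtype=torch.bool)
        mask[head_scores.topk(keep).indices] = True
        w = proj_weight.view(num_heads, head_dim, -1).clone()
        w[~mask] = 0.0                    # remove entire heads at once
        return w.view_as(proj_weight)

Row pruning and block-wise sparse pruning then act at finer granularity
inside the surviving heads.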
arXiv Detail & Related papers (2021-05-30T22:00:44Z)
- A Unified DNN Weight Compression Framework Using Reweighted Optimization
Methods [31.869228048294445]
We propose a unified DNN weight pruning framework with dynamically updated regularization terms bounded by the designated constraint.
We also extend our method to an integrated framework for the combination of different DNN compression tasks.
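
A minimal sketch of the kind of dynamically reweighted group regularizer such
frameworks use: rows that have already shrunk receive a larger multiplier and
are pushed further toward zero, and the multipliers are recomputed
periodically. The update rule and eps below are illustrative.

    import torch

    def reweighted_penalty(weights, gammas):
        # sum_i gamma_i * ||row_i||^2, summed over all layers
        total = 0.0
        for w, g in zip(weights, gammas):
            total = total + (g * w.flatten(1).norm(dim=1) ** 2).sum()
        return total

    def update_gammas(weights, eps=1e-3):
        # gamma_i ~ 1 / (||row_i||^2 + eps), held constant between updates
        return [1.0 / (w.detach().flatten(1).norm(dim=1) ** 2 + eps)
                for w in weights]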
arXiv Detail & Related papers (2020-04-12T02:59:06Z)
- A Privacy-Preserving-Oriented DNN Pruning and Mobile Acceleration
Framework [56.57225686288006]
Weight pruning of deep neural networks (DNNs) has been proposed to satisfy the limited storage and computing capability of mobile edge devices.
Previous pruning methods mainly focus on reducing the model size and/or improving performance without considering the privacy of user data.
We propose a privacy-preserving-oriented pruning and mobile acceleration framework that does not require the private training dataset.
arXiv Detail & Related papers (2020-03-13T23:52:03Z)