SS-Auto: A Single-Shot, Automatic Structured Weight Pruning Framework of
DNNs with Ultra-High Efficiency
- URL: http://arxiv.org/abs/2001.08839v1
- Date: Thu, 23 Jan 2020 22:45:02 GMT
- Title: SS-Auto: A Single-Shot, Automatic Structured Weight Pruning Framework of
DNNs with Ultra-High Efficiency
- Authors: Zhengang Li, Yifan Gong, Xiaolong Ma, Sijia Liu, Mengshu Sun, Zheng
Zhan, Zhenglun Kong, Geng Yuan, Yanzhi Wang
- Abstract summary: We propose a framework to mitigate the limitations of
prior structured weight pruning methods.
Experiments on the CIFAR-10 and CIFAR-100 datasets demonstrate that the
proposed framework can achieve ultra-high pruning rates while maintaining
accuracy.
- Score: 42.63352504047665
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Structured weight pruning is a representative model compression
technique for DNNs, targeting hardware efficiency and inference acceleration.
Previous works in this area leave considerable room for improvement, since
sparse structures that combine different structured pruning schemes are not
exploited fully and efficiently. To mitigate these limitations, we propose
SS-Auto, a single-shot, automatic structured pruning framework that can
achieve row pruning and column pruning simultaneously. We adopt a soft
constraint-based formulation to alleviate the strong non-convexity of the
l0-norm constraints used in state-of-the-art ADMM-based methods, yielding
faster convergence and fewer hyperparameters. Instead of solving the problem
directly, a Primal-Proximal solution is proposed to avoid the pitfall of
penalizing all weights equally, thereby enhancing accuracy. Extensive
experiments on the CIFAR-10 and CIFAR-100 datasets demonstrate that the
proposed framework can achieve ultra-high pruning rates while maintaining
accuracy. Furthermore, significant inference speedup has been observed from
the proposed framework through actual measurements on a smartphone.
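
As a concrete illustration of the joint row-and-column sparsity SS-Auto
targets, the sketch below derives a combined mask for one convolutional layer
from simple l2 norms. It is not the authors' code, and the keep ratios are
hypothetical placeholders; in SS-Auto the pattern comes out of the
Primal-Proximal optimization rather than a magnitude heuristic.

    import torch

    def row_col_prune(weight, row_keep=0.5, col_keep=0.5):
        # GEMM view of a conv weight: rows = filters, columns = in_ch * k * k
        w2d = weight.flatten(1)
        row_norm = w2d.norm(dim=1)           # one l2 norm per row
        col_norm = w2d.norm(dim=0)           # one l2 norm per column
        n_rows = max(1, int(row_keep * w2d.shape[0]))
        n_cols = max(1, int(col_keep * w2d.shape[1]))
        row_mask = torch.zeros(w2d.shape[0], dtype=torch.bool)
        col_mask = torch.zeros(w2d.shape[1], dtype=torch.bool)
        row_mask[row_norm.topk(n_rows).indices] = True
        col_mask[col_norm.topk(n_cols).indices] = True
        mask = row_mask[:, None] & col_mask[None, :]   # keep the intersection
        return (w2d * mask).view_as(weight), mask

The rows and columns that survive form a dense submatrix, which is what makes
this kind of combined structured sparsity hardware-friendly.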
Related papers
- Achieving Constraints in Neural Networks: A Stochastic Augmented
Lagrangian Approach [49.1574468325115]
Regularizing Deep Neural Networks (DNNs) is essential for improving generalizability and preventing overfitting.
We propose a novel approach to DNN regularization by framing the training process as a constrained optimization problem.
We employ a Stochastic Augmented Lagrangian (SAL) method to achieve a more flexible and efficient regularization mechanism.
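
A minimal sketch of one augmented-Lagrangian training step, assuming a single
scalar constraint c(w) that should be driven to zero; the function names and
the dual-ascent step are illustrative, not the paper's API:

    import torch

    def sal_step(loss_fn, constraint_fn, params, lam, rho, opt):
        # Minimize L(w) + lam * c(w) + (rho / 2) * c(w)^2 over the weights w
        opt.zero_grad()
        c = constraint_fn(params)
        (loss_fn(params) + lam * c + 0.5 * rho * c * c).backward()
        opt.step()
        # Dual ascent: raise the multiplier where the constraint is violated
        with torch.no_grad():
            lam = (lam + rho * constraint_fn(params)).item()
        return lam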
arXiv Detail & Related papers (2023-10-25T13:55:35Z)
- Automatic Mapping of the Best-Suited DNN Pruning Schemes for Real-Time
Mobile Acceleration [71.80326738527734]
We propose a general, fine-grained structured pruning scheme and corresponding compiler optimizations.
We show that our pruning scheme mapping methods, together with the general fine-grained structured pruning scheme, outperform the state-of-the-art DNN optimization framework.
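
A hypothetical sketch of the per-layer scheme-mapping idea: try several
structured schemes at the target sparsity and keep, for each layer, the one
with the smallest proxy loss. The scheme names and helper functions below are
illustrative, not the paper's interface.

    SCHEMES = ["filter", "column", "block", "pattern"]

    def map_schemes(model, layers, sparsity, apply_scheme, proxy_loss):
        # apply_scheme(...) returns a candidate pruned model;
        # proxy_loss(...) scores it (e.g., loss on a small held-out batch)
        mapping = {}
        for layer in layers:
            mapping[layer] = min(
                SCHEMES,
                key=lambda s: proxy_loss(apply_scheme(model, layer, s, sparsity)),
            )
        return mapping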
arXiv Detail & Related papers (2021-11-22T23:53:14Z)
- Only Train Once: A One-Shot Neural Network Training And Pruning
Framework [31.959625731943675]
Structured pruning is a commonly used technique in deploying deep neural networks (DNNs) onto resource-constrained devices.
We propose Only-Train-Once (OTO), a framework that trains slimmer DNNs with competitive performance and significant FLOPs reduction in a single shot.
OTO contains two keys: (i) we partition the parameters of DNNs into zero-invariant groups, enabling us to prune zero groups without affecting the output; and (ii) to promote zero groups, we formulate a structured-sparsity optimization problem and solve it with a novel algorithm, Half-Space Stochastic Projected Gradient (HSPG).
To demonstrate the effectiveness of OTO, we train and compress full models from scratch without fine-tuning.
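
A minimal sketch of a zero-invariant group for a Conv-BN pair, under the
reading that filter i, its bias, and the i-th BatchNorm scale/shift must all
be zero for the channel to be removable without changing the output:

    import torch
    import torch.nn as nn

    def zero_invariant_groups(conv: nn.Conv2d, bn: nn.BatchNorm2d):
        groups = []
        for i in range(conv.out_channels):
            g = [conv.weight[i].flatten(), bn.weight[i:i+1], bn.bias[i:i+1]]
            if conv.bias is not None:
                g.append(conv.bias[i:i+1])
            groups.append(torch.cat(g))
        return groups

    # Groups whose concatenated vector is zero produce a zero output channel,
    # so they can be pruned without affecting the network's output:
    # prunable = [i for i, g in enumerate(zero_invariant_groups(conv, bn))
    #             if g.norm() == 0]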
arXiv Detail & Related papers (2021-07-15T17:15:20Z)
- Efficient Micro-Structured Weight Unification and Pruning for Neural
Network Compression [56.83861738731913]
Deep Neural Network (DNN) models are essential for practical applications, especially on resource-limited devices.
Previous unstructured or structured weight pruning methods can hardly truly accelerate inference.
We propose a generalized weight unification framework at a hardware-compatible micro-structured level to achieve a high degree of compression and acceleration.
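
One hedged reading of micro-structured unification, sketched below: within
each small block of a row, the weights share a single magnitude (signs kept),
so a block can be stored as a sign mask plus one scalar. The block size 4 is
an illustrative choice, not the paper's setting.

    import torch

    def unify_blocks(weight: torch.Tensor, block: int = 4) -> torch.Tensor:
        w = weight.flatten(1)
        cols = w.shape[1] - w.shape[1] % block   # drop a ragged tail, for brevity
        blocks = w[:, :cols].reshape(w.shape[0], -1, block)
        scale = blocks.abs().mean(dim=2, keepdim=True)  # one magnitude per block
        out = w.clone()
        out[:, :cols] = (torch.sign(blocks) * scale).reshape(w.shape[0], cols)
        return out.view_as(weight)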
arXiv Detail & Related papers (2021-06-15T17:22:59Z)
- MLPruning: A Multilevel Structured Pruning Framework for
Transformer-based Models [78.45898846056303]
Pruning is an effective method to reduce the memory footprint and computational cost associated with large natural language processing models.
We develop a novel MultiLevel structured Pruning framework, which uses three different levels of structured pruning: head pruning, row pruning, and block-wise sparse pruning.
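
To illustrate the coarsest of the three levels, head pruning, the sketch
below zeroes whole attention heads ranked by a given importance score. The
per-head scores are assumed to be supplied, and the layout is a standard
[hidden, hidden] projection; this is not the paper's exact implementation.

    import torch

    def prune_heads(proj_weight, head_scores, num_heads, keep):
        # View the projection as num_heads slices of size head_dim
        head_dim = proj_weight.shape[0] // num_heads
        mask = torch.zeros(num_heads, dtype=torch.bool)
        mask[head_scores.topk(keep).indices] = True
        w = proj_weight.view(num_heads, head_dim, -1).clone()
        w[~mask] = 0.0                    # remove entire heads at once
        return w.view_as(proj_weight)

Row pruning and block-wise sparse pruning then act at finer granularity
inside the surviving heads.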
arXiv Detail & Related papers (2021-05-30T22:00:44Z)
- A Unified DNN Weight Compression Framework Using Reweighted Optimization
Methods [31.869228048294445]
We propose a unified DNN weight pruning framework with dynamically updated regularization terms bounded by the designated constraint.
We also extend our method to an integrated framework for the combination of different DNN compression tasks.
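
A minimal sketch of the kind of dynamically reweighted group regularizer such
frameworks use: rows that have already shrunk receive a larger multiplier and
are pushed further toward zero, and the multipliers are recomputed
periodically. The update rule and eps below are illustrative.

    import torch

    def reweighted_penalty(weights, gammas):
        # sum_i gamma_i * ||row_i||^2, summed over all layers
        total = 0.0
        for w, g in zip(weights, gammas):
            total = total + (g * w.flatten(1).norm(dim=1) ** 2).sum()
        return total

    def update_gammas(weights, eps=1e-3):
        # gamma_i ~ 1 / (||row_i||^2 + eps), held constant between updates
        return [1.0 / (w.detach().flatten(1).norm(dim=1) ** 2 + eps)
                for w in weights]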
arXiv Detail & Related papers (2020-04-12T02:59:06Z)
- A Privacy-Preserving-Oriented DNN Pruning and Mobile Acceleration
Framework [56.57225686288006]
Weight pruning of deep neural networks (DNNs) has been proposed to satisfy the limited storage and computing capability of mobile edge devices.
Previous pruning methods mainly focus on reducing the model size and/or improving performance without considering the privacy of user data.
We propose a privacy-preserving-oriented pruning and mobile acceleration framework that does not require the private training dataset.
arXiv Detail & Related papers (2020-03-13T23:52:03Z)