Iterative Activation-based Structured Pruning
- URL: http://arxiv.org/abs/2201.09881v1
- Date: Sat, 22 Jan 2022 00:48:12 GMT
- Title: Iterative Activation-based Structured Pruning
- Authors: Kaiqi Zhao, Animesh Jain, Ming Zhao
- Abstract summary: Two activation-based structured pruning methods, Iterative Activation-based Pruning (IAP) and Adaptive Iterative Activation-based Pruning (AIAP), are proposed.
We observe that, with only 1% accuracy loss, IAP and AIAP achieve 7.75X and 15.88X compression on LeNet-5, and 1.25X and 1.71X compression on ResNet-50.
- Score: 5.445935252764351
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deploying complex deep learning models on edge devices is challenging because
they have substantial compute and memory resource requirements, whereas edge
devices' resource budget is limited. To solve this problem, extensive pruning
techniques have been proposed for compressing networks. Recent advances based
on the Lottery Ticket Hypothesis (LTH) show that iterative model pruning tends
to produce smaller and more accurate models. However, LTH research focuses on
unstructured pruning, which is hardware-inefficient and difficult to accelerate
on hardware platforms.
In this paper, we investigate iterative pruning in the context of structured
pruning because structurally pruned models map well on commodity hardware. We
find that directly applying a structured weight-based pruning technique
iteratively, called iterative L1-norm based pruning (ILP), does not produce
accurate pruned models. To solve this problem, we propose two activation-based
pruning methods, Iterative Activation-based Pruning (IAP) and Adaptive
Iterative Activation-based Pruning (AIAP). We observe that, with only 1%
accuracy loss, IAP and AIAP achieve 7.75X and 15.88X compression on LeNet-5,
and 1.25X and 1.71X compression on ResNet-50, whereas ILP achieves 4.77X and
1.13X, respectively.
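The listing carries no code; as a rough, hypothetical sketch of the difference between the weight-based criterion behind ILP and the activation-based criterion behind IAP/AIAP, consider the following PyTorch-style fragment (all names are illustrative, and AIAP's adaptive pruning rate is not modeled):

```python
# Hypothetical sketch contrasting the two channel-ranking criteria described
# above; the paper's actual IAP/AIAP procedures are more involved than this.
import torch
import torch.nn as nn

def l1_filter_scores(conv: nn.Conv2d) -> torch.Tensor:
    # ILP-style criterion: rank each output filter by the L1 norm of its weights.
    return conv.weight.detach().abs().sum(dim=(1, 2, 3))

def activation_scores(conv: nn.Conv2d, calib: torch.Tensor) -> torch.Tensor:
    # IAP-style criterion: rank each output channel by its mean absolute
    # activation on a small calibration batch of real inputs.
    with torch.no_grad():
        acts = conv(calib)                    # shape (N, C_out, H, W)
    return acts.abs().mean(dim=(0, 2, 3))     # one score per output channel

def channels_to_prune(scores: torch.Tensor, ratio: float) -> torch.Tensor:
    # Select the lowest-scoring fraction of channels for removal.
    k = int(ratio * scores.numel())
    return torch.argsort(scores)[:k]

conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)
calib = torch.randn(8, 3, 32, 32)             # stand-in calibration data
print(channels_to_prune(l1_filter_scores(conv), 0.25))
print(channels_to_prune(activation_scores(conv, calib), 0.25))
```

The iterative aspect is an outer loop around either criterion: prune a small fraction of channels, fine-tune, and repeat until the target compression is reached.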
Related papers
- Comb, Prune, Distill: Towards Unified Pruning for Vision Model Compression [24.119415458653616]
We propose a novel unified pruning framework Comb, Prune, Distill (CPD) to address both model-agnostic and task-agnostic concerns simultaneously.
Our framework employs a combing step to resolve hierarchical layer-wise dependency issues, enabling architecture independence.
In image classification, we achieve a speedup of up to 4.3x with an accuracy loss of 1.8%, and in semantic segmentation up to 1.89x with a 5.1% loss in mIoU.
arXiv Detail & Related papers (2024-08-06T09:02:31Z)
- Everybody Prune Now: Structured Pruning of LLMs with only Forward Passes [72.09861461921663]
We develop Bonsai, a gradient-free, perturbative pruning method capable of delivering small, fast, and accurate pruned models (a minimal sketch of the perturbative idea appears after this list).
We also leverage Bonsai to produce a new sub-2B model using a single A6000 GPU that yields state-of-the-art performance on 4/6 tasks on the Huggingface Open LLM leaderboard.
arXiv Detail & Related papers (2024-02-08T04:48:26Z)
- Automatic Attention Pruning: Improving and Automating Model Pruning using Attentions [5.445935252764351]
Pruning is a promising approach to compress deep learning models in order to deploy them on resource-constrained edge devices.
This paper presents Automatic Attention Pruning (AAP), an adaptive, attention-based, structured pruning approach to automatically generate small, accurate, and hardware-efficient models.
arXiv Detail & Related papers (2023-03-14T02:47:57Z)
- Advancing Model Pruning via Bi-level Optimization [89.88761425199598]
Iterative magnitude pruning (IMP) is the predominant pruning method for successfully finding 'winning tickets'.
One-shot pruning methods have been developed, but these schemes are usually unable to find winning tickets as good as those found by IMP.
We show that model pruning can be formulated as a special class of bi-level optimization (BLO) problems with a bi-linear problem structure, and propose a BLO-oriented pruning method termed BiP (the generic bi-level formulation is sketched after this list).
arXiv Detail & Related papers (2022-10-08T19:19:29Z)
- CrAM: A Compression-Aware Minimizer [103.29159003723815]
We propose a new compression-aware minimizer dubbed CrAM that modifies the optimization step in a principled way.
CrAM produces dense models that can be more accurate than standard SGD/Adam-trained baselines, yet remain stable under weight pruning.
CrAM can produce sparse models which perform well for transfer learning, and it also works for semi-structured 2:4 pruning patterns supported by GPU hardware.
arXiv Detail & Related papers (2022-07-28T16:13:28Z)
- Structured Pruning is All You Need for Pruning CNNs at Initialization [38.88730369884401]
Pruning is a popular technique for reducing the model size and computational cost of convolutional neural networks (CNNs).
We propose PreCropping, a structured hardware-efficient model compression scheme.
Compared to weight pruning, the proposed scheme is regular and dense in both storage and computation without sacrificing accuracy.
arXiv Detail & Related papers (2022-03-04T19:54:31Z)
- Adaptive Activation-based Structured Pruning [5.445935252764351]
Pruning is a promising approach to compress complex deep learning models in order to deploy them on resource-constrained edge devices.
This paper presents an adaptive, activation-based, structured pruning approach to automatically and efficiently generate small, accurate, and hardware-efficient models.
A comprehensive evaluation shows that the proposed method can substantially outperform the state-of-the-art structured pruning works.
arXiv Detail & Related papers (2022-01-21T22:21:31Z)
- MLPruning: A Multilevel Structured Pruning Framework for Transformer-based Models [78.45898846056303]
Pruning is an effective method to reduce the memory footprint and computational cost associated with large natural language processing models.
We develop a novel MultiLevel structured Pruning framework, which uses three different levels of structured pruning: head pruning, row pruning, and block-wise sparse pruning (head masking is sketched after this list).
arXiv Detail & Related papers (2021-05-30T22:00:44Z)
- Network Pruning via Resource Reallocation [75.85066435085595]
We propose a simple yet effective channel pruning technique, termed network Pruning via rEsource rEalLocation (PEEL).
PEEL first constructs a predefined backbone and then conducts resource reallocation on it to shift parameters from less informative layers to more important layers in one round.
Experimental results show that structures uncovered by PEEL exhibit competitive performance with state-of-the-art pruning algorithms under various pruning settings.
arXiv Detail & Related papers (2021-03-02T16:28:10Z)
- AACP: Model Compression by Accurate and Automatic Channel Pruning [15.808153503786627]
Channel pruning has recently been formulated as a neural architecture search (NAS) problem.
Existing NAS-based methods, however, suffer from huge computational cost and inflexibility in application.
We propose a novel Accurate and Automatic Channel Pruning (AACP) method to address these problems.
arXiv Detail & Related papers (2021-01-31T06:19:29Z)
- Network Automatic Pruning: Start NAP and Take a Nap [94.14675930881366]
We propose NAP, a unified and automatic pruning framework for both fine-grained and structured pruning.
It can find out unimportant components of a network and automatically decide appropriate compression ratios for different layers.
Despite being simple to use, NAP outperforms previous pruning methods by large margins.
arXiv Detail & Related papers (2021-01-17T07:09:19Z)
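For the gradient-free, perturbative method in the "Everybody Prune Now" entry above, the following is a minimal sketch of the underlying idea, assuming the simplest possible ablation (zeroing a module's output) and forward passes only; it is not the paper's Bonsai implementation, and all names are illustrative:

```python
# Gradient-free, perturbative importance estimation: ablate one candidate
# module at a time and record how much the loss grows, using forward passes
# only -- no gradients. Illustrative only; not the Bonsai code.
import torch
import torch.nn as nn

@torch.no_grad()
def perturbative_importance(model, candidates, loss_fn, x, y):
    base = loss_fn(model(x), y)
    scores = {}
    for name, module in candidates.items():
        # A forward hook that returns a value replaces the module's output.
        handle = module.register_forward_hook(lambda m, i, out: torch.zeros_like(out))
        scores[name] = (loss_fn(model(x), y) - base).item()
        handle.remove()
    return scores  # smaller loss increase = safer to prune

# Toy usage on a hypothetical two-layer MLP.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
x, y = torch.randn(16, 10), torch.randint(0, 2, (16,))
candidates = {"fc1": model[0], "fc2": model[2]}
print(perturbative_importance(model, candidates, nn.CrossEntropyLoss(), x, y))
```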
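The bi-level view in the "Advancing Model Pruning via Bi-level Optimization" entry can be written as the generic pruning-as-BLO template below; this is the standard notation for the problem class, not necessarily the paper's exact formulation:

```latex
% Upper level: choose a binary pruning mask m under a sparsity budget k.
% Lower level: retrain the surviving weights theta under that mask.
\min_{m \in \{0,1\}^n,\ \|m\|_0 \le k} \ \ell\bigl(\theta^{*}(m) \odot m\bigr)
\quad \text{s.t.} \quad
\theta^{*}(m) \in \arg\min_{\theta} \ \ell(\theta \odot m)
```

The bi-linear structure noted in the summary comes from the elementwise product \(\theta \odot m\), which is linear in each of \(\theta\) and \(m\) separately.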
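For the head-pruning level in the MLPruning entry, here is a minimal sketch of masking whole attention heads; the mask is applied to the concatenated per-head outputs before the output projection, and all names are illustrative:

```python
# Head-level structured pruning sketch: zero out whole attention heads by
# masking the concatenated per-head outputs before the output projection.
# Illustrative only; MLPruning combines this with row and block pruning.
import torch

def mask_heads(head_concat: torch.Tensor, head_mask: torch.Tensor, n_heads: int) -> torch.Tensor:
    # head_concat: (batch, seq, d_model), per-head outputs laid out
    # contiguously; head_mask: (n_heads,) with 0/1 entries.
    b, s, d = head_concat.shape
    per_head = head_concat.view(b, s, n_heads, d // n_heads)
    return (per_head * head_mask.view(1, 1, n_heads, 1)).reshape(b, s, d)

# Toy usage: keep heads 0 and 2 of a 4-head layer.
out = torch.randn(2, 5, 64)
print(mask_heads(out, torch.tensor([1., 0., 1., 0.]), n_heads=4).shape)
```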