Automatic Attention Pruning: Improving and Automating Model Pruning using Attentions
- URL: http://arxiv.org/abs/2303.08595v1
- Date: Tue, 14 Mar 2023 02:47:57 GMT
- Title: Automatic Attention Pruning: Improving and Automating Model Pruning using Attentions
- Authors: Kaiqi Zhao, Animesh Jain, Ming Zhao
- Abstract summary: Pruning is a promising approach to compress deep learning models in order to deploy them on resource-constrained edge devices.
This paper presents Automatic Attention Pruning (AAP), an adaptive, attention-based, structured pruning approach to automatically generate small, accurate, and hardware-efficient models.
- Score: 5.445935252764351
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pruning is a promising approach to compress deep learning models in order to
deploy them on resource-constrained edge devices. However, many existing
pruning solutions are based on unstructured pruning, which yields models that
cannot efficiently run on commodity hardware; and they often require users to
manually explore and tune the pruning process, which is time-consuming and
often leads to sub-optimal results. To address these limitations, this paper
presents Automatic Attention Pruning (AAP), an adaptive, attention-based,
structured pruning approach to automatically generate small, accurate, and
hardware-efficient models that meet user objectives. First, it proposes
iterative structured pruning using activation-based attention maps to
effectively identify and prune unimportant filters. Then, it proposes adaptive
pruning policies for automatically meeting the pruning objectives of
accuracy-critical, memory-constrained, and latency-sensitive tasks. A
comprehensive evaluation shows that AAP substantially outperforms the
state-of-the-art structured pruning works for a variety of model architectures.
Our code is at: https://github.com/kaiqi123/Automatic-Attention-Pruning.git.
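The filter-selection step described in the abstract can be illustrated with a minimal sketch. It assumes, following the usual activation-based attention formulation, that a filter's attention value is the mean of |a|^p over its output activations across input samples and spatial positions (here p = 2), and that one pruning iteration removes a fixed fraction of the lowest-scoring filters. The function names, the exponent `p`, and the `fraction` parameter are illustrative assumptions, not AAP's released code.

```python
from typing import List, Sequence

# activations[n][c][k]: output of filter c for input sample n at flattened spatial position k
Activations = Sequence[Sequence[Sequence[float]]]


def attention_scores(activations: Activations, p: float = 2.0) -> List[float]:
    """Assumed attention value per filter: mean of |a|**p over samples and positions."""
    num_filters = len(activations[0])
    scores: List[float] = []
    for c in range(num_filters):
        values = [abs(a) ** p for sample in activations for a in sample[c]]
        scores.append(sum(values) / len(values))
    return scores


def filters_to_prune(scores: Sequence[float], fraction: float) -> List[int]:
    """Indices of the lowest-scoring filters; `fraction` is a hypothetical per-iteration rate."""
    k = int(len(scores) * fraction)
    ranked = sorted(range(len(scores)), key=lambda c: scores[c])
    return sorted(ranked[:k])


if __name__ == "__main__":
    # Toy data: two input samples, four filters, three spatial positions per filter.
    acts = [
        [[0.1, 0.0, 0.2], [1.0, 2.0, 1.5], [0.0, 0.0, 0.0], [0.5, 0.4, 0.6]],
        [[0.2, 0.1, 0.0], [1.2, 1.8, 1.1], [0.1, 0.0, 0.0], [0.3, 0.5, 0.7]],
    ]
    print(filters_to_prune(attention_scores(acts), fraction=0.5))
```

With these toy activations, filters 2 and 0 have the weakest attention, so the script prints `[0, 2]`. In an iterative scheme this selection would be repeated after each round of fine-tuning, with a policy (for example an accuracy, memory, or latency target) deciding when to stop; that control loop is omitted here.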
Related papers
- Sample-aware Adaptive Structured Pruning for Large Language Models [14.605017410864583]
This study introduces AdaPruner, a sample-aware adaptive structured pruning framework for large language models (LLMs).
Specifically, AdaPruner effectively removes redundant parameters from LLMs by constructing a structured pruning solution space.
At a 20% pruning ratio, the model pruned with AdaPruner maintains 97% of the performance of the unpruned model.
Date: 2025-03-08T12:00:21Z
- Instruction-Following Pruning for Large Language Models [58.329978053711024]
We move beyond the traditional static pruning approach of determining a fixed pruning mask for a model.
In our method, the pruning mask is input-dependent and adapts dynamically based on the information described in a user instruction.
Our approach, termed "instruction-following pruning", introduces a sparse mask predictor that takes the user instruction as input and dynamically selects the most relevant model parameters for the given task.
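To make the contrast with a fixed mask concrete, here is a toy sketch of an input-dependent mask. It assumes a hypothetical predictor that scores each prunable unit (for example an FFN channel) by the dot product between an instruction embedding and a per-unit key vector, then keeps the top `keep` units. The paper's predictor is a trained network; the embeddings, keys, and the `predict_mask` name are illustrative only.

```python
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def predict_mask(instruction_emb: Sequence[float],
                 unit_keys: Sequence[Sequence[float]],
                 keep: int) -> List[int]:
    """Hypothetical mask predictor: keep the `keep` units whose keys best match the instruction."""
    scores = [dot(instruction_emb, key) for key in unit_keys]
    top = set(sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:keep])
    # 1 = parameter group stays active for this instruction, 0 = skipped.
    return [1 if i in top else 0 for i in range(len(unit_keys))]


if __name__ == "__main__":
    keys = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]  # toy per-unit keys
    print(predict_mask([1.0, 0.0], keys, keep=2))  # toy instruction A
    print(predict_mask([0.0, 1.0], keys, keep=2))  # toy instruction B
```

The two toy instructions activate different subsets of the same unit pool (`[1, 0, 1]` versus `[0, 1, 1]`), whereas a static pruning mask would return the same subset for every input.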
Date: 2025-01-03T20:19:14Z
- RL-Pruner: Structured Pruning Using Reinforcement Learning for CNN Compression and Acceleration [0.0]
We propose RL-Pruner, which uses reinforcement learning to learn the optimal pruning distribution.
RL-Pruner can automatically extract dependencies between filters in the input model and perform pruning, without requiring model-specific pruning implementations.
Date: 2024-11-10T13:35:10Z
- Fluctuation-based Adaptive Structured Pruning for Large Language Models [44.217363567065]
FLAP (FLuctuation-based Adaptive Structured Pruning) is a retraining-free structured pruning framework for Large Language Models.
It is hardware-friendly, effectively reducing storage and improving inference speed.
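A retraining-free structured criterion of this kind can be sketched as follows. As we read FLAP, input channel j of a linear layer is scored by the sample variance of its input activations multiplied by the squared L2 norm of the matching weight column, and the mean contribution of removed channels is folded into the bias in place of retraining. Treat this as an approximation under that reading, not the reference implementation; the function names and the row-major `W[i][j]` (output i, input j) layout are assumptions.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def fluctuation_scores(X: Matrix, W: Matrix) -> List[float]:
    """Assumed score: sample variance of X[:, j] times ||W[:, j]||^2 (needs >= 2 samples)."""
    n = len(X)
    scores = []
    for j in range(len(X[0])):
        col = [X[s][j] for s in range(n)]
        mean = sum(col) / n
        var = sum((x - mean) ** 2 for x in col) / (n - 1)
        col_norm2 = sum(row[j] ** 2 for row in W)
        scores.append(var * col_norm2)
    return scores


def prune_with_bias_compensation(
    X: Matrix, W: Matrix, b: Sequence[float], keep: int
) -> Tuple[List[int], List[List[float]], List[float]]:
    """Drop low-fluctuation input channels and fold their mean contribution into the bias."""
    scores = fluctuation_scores(X, W)
    d_in = len(scores)
    kept = sorted(sorted(range(d_in), key=lambda j: scores[j], reverse=True)[:keep])
    removed = [j for j in range(d_in) if j not in kept]
    means = [sum(row[j] for row in X) / len(X) for j in range(d_in)]
    # y_i = sum_j W[i][j] * x_j + b_i; replace each removed x_j by its mean.
    new_b = [b[i] + sum(W[i][j] * means[j] for j in removed) for i in range(len(W))]
    new_W = [[W[i][j] for j in kept] for i in range(len(W))]
    return kept, new_W, new_b
```

For example, with `X = [[1.0, 5.0, 0.0], [3.0, 5.0, 2.0]]`, channel 1 never fluctuates, so keeping two channels drops it and moves its constant contribution into the bias; the pruned layer then reproduces the original outputs exactly on these samples.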
Date: 2023-12-19T09:23:48Z
- Structured Pruning for Multi-Task Deep Neural Networks [25.916166808223743]
Multi-task deep neural network (DNN) models have computation and storage benefits over individual single-task models.
We investigate the effectiveness of structured pruning on multi-task models.
Date: 2023-04-13T22:15:47Z
- PDSketch: Integrated Planning Domain Programming and Learning [86.07442931141637]
We present a new domain definition language, named PDSketch.
It allows users to flexibly define high-level structures in the transition models.
Details of the transition model will be filled in by trainable neural networks.
Date: 2023-03-09T18:54:12Z
- DepGraph: Towards Any Structural Pruning [68.40343338847664]
We study general structural pruning of arbitrary architectures such as CNNs, RNNs, GNNs, and Transformers.
We propose a general and fully automatic method, Dependency Graph (DepGraph), to explicitly model the dependency between layers and comprehensively group parameters for pruning.
In this work, we extensively evaluate our method on several architectures and tasks, including ResNe(X)t, DenseNet, MobileNet and Vision transformer for images, GAT for graph, DGCNN for 3D point cloud, alongside LSTM for language, and demonstrate that, even with a
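The grouping idea can be shown with a heavily simplified sketch: treat each prunable dimension as a node, add an edge whenever two dimensions must be pruned together (for example a convolution's output channels and the next layer's input channels), and take connected components as pruning groups. This union-find version captures only that grouping step; DepGraph itself builds the dependency graph automatically from the network, and the layer names used below are hypothetical.

```python
from typing import Dict, List, Tuple


class DisjointSet:
    """Union-find over node names, with path halving."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def group_coupled(nodes: List[str], deps: List[Tuple[str, str]]) -> List[List[str]]:
    """Connected components of the dependency graph = sets of dims pruned together."""
    ds = DisjointSet()
    for n in nodes:
        ds.find(n)
    for a, b in deps:
        ds.union(a, b)
    groups: Dict[str, List[str]] = {}
    for n in nodes:
        groups.setdefault(ds.find(n), []).append(n)
    return sorted(groups.values())


if __name__ == "__main__":
    nodes = ["conv1.out", "bn1", "conv2.in", "conv2.out", "conv3.in", "fc.in"]
    deps = [("conv1.out", "bn1"), ("bn1", "conv2.in"), ("conv2.out", "conv3.in")]
    print(group_coupled(nodes, deps))
```

Removing channel k from `conv1.out` then implies removing the same index from `bn1` and `conv2.in`, which is why the three end up in one group.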
Date: 2023-01-30T14:02:33Z
- Iterative Activation-based Structured Pruning [5.445935252764351]
Iterative Activation-based Pruning (IAP) and Adaptive Iterative Activation-based Pruning (AIAP) are proposed.
We observe that, with only 1% accuracy loss, IAP and AIAP achieve 7.75X and 15.88X compression on LeNet-5, and 1.25X and 1.71X compression on ResNet-50.
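The reported ratios are easier to compare as parameter fractions. Assuming the compression ratio is defined as original parameter count divided by remaining parameter count (the usual convention, though the abstract does not state it), the arithmetic is:

```python
def kept_fraction(compression_ratio: float) -> float:
    """Fraction of parameters remaining, assuming ratio = original / remaining."""
    return 1.0 / compression_ratio


# Compression ratios quoted in the abstract (1% accuracy-loss budget).
reported = {
    "IAP, LeNet-5": 7.75,
    "AIAP, LeNet-5": 15.88,
    "IAP, ResNet-50": 1.25,
    "AIAP, ResNet-50": 1.71,
}

for setting, ratio in reported.items():
    kept = kept_fraction(ratio)
    print(f"{setting}: {ratio}X keeps {kept:.1%} of parameters, removes {1 - kept:.1%}")
```

Under that assumption, 7.75X on LeNet-5 means keeping about 12.9% of the parameters, while 1.25X on ResNet-50 removes 20% of them.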
Date: 2022-01-22T00:48:12Z
- Adaptive Activation-based Structured Pruning [5.445935252764351]
Pruning is a promising approach to compress complex deep learning models in order to deploy them on resource-constrained edge devices.
This paper presents an adaptive, activation-based, structured pruning approach to automatically and efficiently generate small, accurate, and hardware-efficient models.
A comprehensive evaluation shows that the proposed method can substantially outperform the state-of-the-art structured pruning works.
Date: 2022-01-21T22:21:31Z
- Layer Pruning on Demand with Intermediate CTC [50.509073206630994]
We present a training and pruning method for ASR based on connectionist temporal classification (CTC).
We show that a Transformer-CTC model can be pruned to various depths on demand, improving the real-time factor from 0.005 to 0.002 on GPU.
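Depth pruning on demand can be sketched as follows: because CTC heads are trained at intermediate depths, inference can stop at one of those exits and decode directly from it. The layer callables, the exit depths, and the linear per-layer cost model are hypothetical assumptions; only the greedy CTC collapse rule (merge repeated labels, then drop blanks) is the standard one.

```python
from typing import Callable, List, Optional, Sequence

Layer = Callable[[List[float]], List[float]]


def choose_exit(exits: Sequence[int], cost_per_layer: float, budget: float) -> int:
    """Deepest exit whose (assumed linear) cost fits the budget; else the shallowest exit."""
    feasible = [d for d in exits if d * cost_per_layer <= budget]
    return max(feasible) if feasible else min(exits)


def run_truncated(layers: Sequence[Layer], x: List[float], depth: int) -> List[float]:
    """Run only the first `depth` encoder layers; later layers are pruned away."""
    for layer in layers[:depth]:
        x = layer(x)
    return x


def ctc_greedy_decode(frame_labels: Sequence[int], blank: int = 0) -> List[int]:
    """Standard CTC best-path collapse: merge repeated labels, then drop blanks."""
    out: List[int] = []
    prev: Optional[int] = None
    for label in frame_labels:
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return out


if __name__ == "__main__":
    print(choose_exit([3, 6, 9, 12], cost_per_layer=1.0, budget=7.0))
    print(ctc_greedy_decode([0, 3, 3, 0, 3, 5, 5, 0]))
```

Here a budget of 7 cost units selects the exit at depth 6, and the frame sequence decodes to `[3, 3, 5]` (the blank between the two 3s keeps them distinct). For scale, the reported change in real-time factor from 0.005 to 0.002 is a 2.5x speed-up.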
Date: 2021-06-17T02:40:18Z
- MLPruning: A Multilevel Structured Pruning Framework for Transformer-based Models [78.45898846056303]
Pruning is an effective method to reduce the memory footprint and computational cost associated with large natural language processing models.
We develop a novel MultiLevel structured Pruning framework, which uses three different levels of structured pruning: head pruning, row pruning, and block-wise sparse pruning.
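Of the three levels, block-wise sparse pruning is the easiest to illustrate in isolation. The sketch below zeroes whole `block` x `block` tiles of a weight matrix, keeping the tiles with the largest squared Frobenius norm. That magnitude criterion is a stand-in chosen for illustration; MLPruning's own importance scores may be computed differently.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def block_prune(W: Sequence[Sequence[float]], block: int, keep_ratio: float) -> Matrix:
    """Zero whole block x block tiles with the smallest squared Frobenius norm."""
    rows, cols = len(W), len(W[0])
    if rows % block or cols % block:
        raise ValueError("matrix dimensions must be divisible by the block size")
    tiles = []
    for bi in range(0, rows, block):
        for bj in range(0, cols, block):
            norm2 = sum(W[i][j] ** 2
                        for i in range(bi, bi + block)
                        for j in range(bj, bj + block))
            tiles.append((norm2, bi, bj))
    n_keep = round(len(tiles) * keep_ratio)
    kept = {(bi, bj) for _, bi, bj in sorted(tiles, reverse=True)[:n_keep]}
    # A weight survives only if the tile containing it was kept.
    return [[W[i][j] if (i - i % block, j - j % block) in kept else 0.0
             for j in range(cols)]
            for i in range(rows)]


if __name__ == "__main__":
    W = [[1.0, 1.0, 0.1, 0.0],
         [1.0, 1.0, 0.0, 0.1],
         [2.0, 0.0, 0.0, 0.0],
         [0.0, 2.0, 0.0, 0.05]]
    for row in block_prune(W, block=2, keep_ratio=0.5):
        print(row)
```

Pruning whole tiles rather than individual weights keeps the surviving nonzeros in dense blocks, which block-sparse kernels can exploit; the same structure-aware motivation appears in the AAP abstract above.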
Date: 2021-05-30T22:00:44Z
- Dynamic Probabilistic Pruning: A general framework for hardware-constrained pruning at different granularities [80.06422693778141]
We propose a flexible new pruning mechanism that facilitates pruning at different granularities (weights, kernels, filters/feature maps).
We refer to this algorithm as Dynamic Probabilistic Pruning (DPP).
We show that DPP achieves competitive compression rates and classification accuracy when pruning common deep learning models trained on different benchmark datasets for image classification.
Date: 2021-05-26T17:01:52Z
- DAIS: Automatic Channel Pruning via Differentiable Annealing Indicator Search [55.164053971213576]
Convolutional neural networks have achieved great success in computer vision tasks despite their large computational overhead.
Structured (channel) pruning is usually applied to reduce the model redundancy while preserving the network structure.
Existing structured pruning methods require hand-crafted rules, which may lead to a tremendous pruning space.
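The phrase "differentiable annealing indicator" suggests a continuous relaxation of binary channel indicators whose sharpness is annealed during search. The sketch below shows that general idea with a temperature-scaled sigmoid gate per channel: as the temperature shrinks, each gate approaches 0 or 1, and the final mask is read off from the sign of its logit. The logits `alphas`, the geometric schedule, and the sigmoid parameterization are assumptions for illustration; DAIS's actual indicator and annealing schedule may differ.

```python
import math
from typing import List, Sequence


def stable_sigmoid(z: float) -> float:
    """Logistic function written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_indicator(alphas: Sequence[float], temperature: float) -> List[float]:
    """Relaxed channel indicators in (0, 1); sharper as the temperature decreases."""
    return [stable_sigmoid(a / temperature) for a in alphas]


def anneal(t0: float, decay: float, steps: int) -> List[float]:
    """Hypothetical geometric temperature schedule: t0, t0*decay, t0*decay**2, ..."""
    return [t0 * decay ** k for k in range(steps)]


def hard_mask(alphas: Sequence[float]) -> List[int]:
    """Limit of the indicator as temperature -> 0: keep channels with positive logits."""
    return [1 if a > 0 else 0 for a in alphas]


if __name__ == "__main__":
    alphas = [1.5, 0.2, -0.3, -2.0]  # toy per-channel logits
    for t in anneal(1.0, 0.5, 5):
        print(round(t, 4), [round(v, 3) for v in soft_indicator(alphas, t)])
    print(hard_mask(alphas))
```

During search, gradients flow through the soft indicators at every temperature; only after annealing is the mask binarized, so the choice of channels is learned rather than fixed by a hand-crafted rule.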
Date: 2020-11-04T07:43:01Z
This list is automatically generated from the titles and abstracts of the papers in this site.