Adaptive Activation-based Structured Pruning
- URL: http://arxiv.org/abs/2201.10520v1
- Date: Fri, 21 Jan 2022 22:21:31 GMT
- Title: Adaptive Activation-based Structured Pruning
- Authors: Kaiqi Zhao, Animesh Jain, Ming Zhao
- Abstract summary: Pruning is a promising approach to compress complex deep learning models in order to deploy them on resource-constrained edge devices.
This paper presents an adaptive, activation-based, structured pruning approach to automatically and efficiently generate small, accurate, and hardware-efficient models.
A comprehensive evaluation shows that the proposed method can substantially outperform the state-of-the-art structured pruning works.
- Score: 5.445935252764351
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pruning is a promising approach to compress complex deep learning models in
order to deploy them on resource-constrained edge devices. However, many
existing pruning solutions rely on unstructured pruning, which yields models
that cannot run efficiently on commodity hardware, and they require users to
manually explore and tune the pruning process, which is time-consuming and
often leads to sub-optimal results. To address these limitations, this paper
presents an adaptive, activation-based, structured pruning approach to
automatically and efficiently generate small, accurate, and hardware-efficient
models that meet user requirements. First, it proposes iterative structured
pruning using activation-based attention feature maps to effectively identify
and prune unimportant filters. Then, it proposes adaptive pruning policies for
automatically meeting the pruning objectives of accuracy-critical,
memory-constrained, and latency-sensitive tasks. A comprehensive evaluation
shows that the proposed method can substantially outperform the
state-of-the-art structured pruning works on CIFAR-10 and ImageNet datasets.
For example, on ResNet-56 with CIFAR-10, without any accuracy drop, our method
achieves the largest parameter reduction (79.11%), outperforming the related
works by 22.81% to 66.07%, and the largest FLOPs reduction (70.13%),
outperforming the related works by 14.13% to 26.53%.
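To make the method concrete, below is a minimal sketch of activation-based structured (filter) pruning with an adaptive stopping rule. It is illustrative only: the importance statistic (mean squared activation, standing in for the paper's activation-based attention feature maps), the helper names (activation_importance, prune_filters, iterative_prune), and the accuracy-floor policy are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

def activation_importance(feature_map: torch.Tensor) -> torch.Tensor:
    # feature_map: (batch, channels, H, W) output of a conv layer on calibration data.
    # Score each filter by the energy of its activation map -- an assumed proxy
    # for the paper's activation-based attention feature maps.
    return feature_map.pow(2).mean(dim=(0, 2, 3))

def prune_filters(conv: nn.Conv2d, importance: torch.Tensor, ratio: float) -> nn.Conv2d:
    # Structured pruning: drop whole output filters with the lowest importance.
    n_keep = max(1, int(conv.out_channels * (1.0 - ratio)))
    keep = torch.argsort(importance, descending=True)[:n_keep]
    pruned = nn.Conv2d(conv.in_channels, n_keep, conv.kernel_size,
                       stride=conv.stride, padding=conv.padding,
                       bias=conv.bias is not None)
    pruned.weight.data = conv.weight.data[keep].clone()
    if conv.bias is not None:
        pruned.bias.data = conv.bias.data[keep].clone()
    return pruned

def iterative_prune(conv, layer_input, evaluate, fine_tune,
                    min_accuracy=0.93, step_ratio=0.1, max_rounds=10):
    # Hypothetical adaptive policy for an accuracy-critical objective: prune in
    # small steps, fine-tune, and stop before accuracy drops below a user-set
    # floor. Re-wiring downstream layers to the new channel count is omitted.
    for _ in range(max_rounds):
        scores = activation_importance(conv(layer_input))
        candidate = prune_filters(conv, scores, step_ratio)
        fine_tune(candidate)                     # user-supplied fine-tuning routine
        if evaluate(candidate) < min_accuracy:   # user-supplied accuracy check
            break
        conv = candidate
    return conv
```

Because whole filters are removed, the pruned layer stays dense and simply smaller, which is what makes the result hardware-efficient on commodity devices; unstructured pruning would instead leave scattered zeros that such hardware cannot exploit.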
Related papers
- Sample-aware Adaptive Structured Pruning for Large Language Models [14.605017410864583]
This study introduces AdaPruner, a sample-aware adaptive structured pruning framework for large language models (LLMs).
Specifically, AdaPruner effectively removes redundant parameters from LLMs by constructing a structured pruning solution space.
At a 20% pruning ratio, the model pruned with AdaPruner maintains 97% of the performance of the unpruned model.
arXiv Detail & Related papers (2025-03-08T12:00:21Z) - Lightweight and Post-Training Structured Pruning for On-Device Large Language Models [11.93284417365518]
We introduce COMP, a lightweight post-training structured pruning method that employs a hybrid-granularity pruning strategy.
COMP improves performance by 6.13% on the LLaMA-2-7B model with a 20% pruning ratio compared to LLM-Pruner.
arXiv Detail & Related papers (2025-01-25T16:03:58Z) - Instruction-Following Pruning for Large Language Models [58.329978053711024]
We move beyond the traditional static pruning approach of determining a fixed pruning mask for a model.
In our method, the pruning mask is input-dependent and adapts dynamically based on the information described in a user instruction.
Our approach, termed "instruction-following pruning", introduces a sparse mask predictor that takes the user instruction as input and dynamically selects the most relevant model parameters for the given task.
arXiv Detail & Related papers (2025-01-03T20:19:14Z) - ALPS: Improved Optimization for Highly Sparse One-Shot Pruning for Large Language Models [14.310720048047136]
ALPS is an optimization-based framework that tackles the pruning problem using the operator splitting technique and a preconditioned conjugate gradient-based post-processing step.
Our approach incorporates novel techniques to accelerate and theoretically guarantee convergence while leveraging vectorization and GPU parallelism for efficiency.
On the OPT-30B model with 70% sparsity, ALPS achieves a 13% reduction in test perplexity on the WikiText dataset and a 19% improvement in zero-shot benchmark performance compared to existing methods.
arXiv Detail & Related papers (2024-06-12T02:57:41Z) - Efficient Pruning of Large Language Model with Adaptive Estimation Fusion [45.423001839959156]
We introduce a simple yet efficient method that adaptively models the importance of each substructure.
It can adaptively fuse coarse-grained and fine-grained estimations based on the results from complex and multilayer structures.
Our experimental results, compared with state-of-the-art methods on mainstream datasets, demonstrate average accuracy improvements of 1.1%, 1.02%, 2.0%, and 1.2% for LLaMa-7B, Vicuna-7B, Baichuan-7B, and Bloom-7b1, respectively.
arXiv Detail & Related papers (2024-03-16T04:12:50Z) - Automatic Attention Pruning: Improving and Automating Model Pruning
using Attentions [5.445935252764351]
Pruning is a promising approach to compress deep learning models in order to deploy them on resource-constrained edge devices.
This paper presents Automatic Attention Pruning (AAP), an adaptive, attention-based, structured pruning approach to automatically generate small, accurate, and hardware-efficient models.
arXiv Detail & Related papers (2023-03-14T02:47:57Z) - Controlled Sparsity via Constrained Optimization or: How I Learned to
Stop Tuning Penalties and Love Constraints [81.46143788046892]
We focus on the task of controlling the level of sparsity when performing sparse learning.
Existing methods based on sparsity-inducing penalties involve expensive trial-and-error tuning of the penalty factor.
We propose a constrained formulation where sparsification is guided by the training objective and the desired sparsity target in an end-to-end fashion.
arXiv Detail & Related papers (2022-08-08T21:24:20Z) - Attentive Fine-Grained Structured Sparsity for Image Restoration [63.35887911506264]
N:M structured pruning has emerged as an effective and practical pruning approach for making models efficient under an accuracy constraint (an illustrative sketch of the N:M pattern appears after this list).
We propose a novel pruning method that determines the pruning ratio for N:M structured sparsity at each layer.
arXiv Detail & Related papers (2022-04-26T12:44:55Z) - Iterative Activation-based Structured Pruning [5.445935252764351]
Iterative Activation-based Pruning (IAP) and Adaptive Iterative Activation-based Pruning (AIAP) are proposed.
We observe that, with only 1% accuracy loss, IAP and AIAP achieve 7.75X and 15.88X compression on LeNet-5, and 1.25X and 1.71X compression on ResNet-50.
arXiv Detail & Related papers (2022-01-22T00:48:12Z) - Manifold Regularized Dynamic Network Pruning [102.24146031250034]
This paper proposes a new paradigm that dynamically removes redundant filters by embedding the manifold information of all instances into the space of pruned networks.
The effectiveness of the proposed method is verified on several benchmarks, which shows better performance in terms of both accuracy and computational cost.
arXiv Detail & Related papers (2021-03-10T03:59:03Z) - Non-Parametric Adaptive Network Pruning [125.4414216272874]
We introduce non-parametric modeling to simplify the algorithm design.
Inspired by the face recognition community, we use a message passing algorithm to obtain an adaptive number of exemplars.
EPruner breaks the dependency on the training data in determining the "important" filters.
arXiv Detail & Related papers (2021-01-20T06:18:38Z) - DAIS: Automatic Channel Pruning via Differentiable Annealing Indicator
Search [55.164053971213576]
Convolutional neural networks have achieved great success in computer vision tasks, despite their large computation overhead.
Structured (channel) pruning is usually applied to reduce the model redundancy while preserving the network structure.
Existing structured pruning methods require hand-crafted rules which may lead to tremendous pruning space.
arXiv Detail & Related papers (2020-11-04T07:43:01Z) - Rapid Structural Pruning of Neural Networks with Set-based Task-Adaptive
Meta-Pruning [83.59005356327103]
A common limitation of most existing pruning techniques is that they require pre-training of the network at least once before pruning.
We propose STAMP, which task-adaptively prunes a network pretrained on a large reference dataset by generating a pruning mask on it as a function of the target dataset.
We validate STAMP against recent advanced pruning methods on benchmark datasets.
arXiv Detail & Related papers (2020-06-22T10:57:43Z)
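The N:M structured sparsity referenced in the "Attentive Fine-Grained Structured Sparsity" entry above keeps at most N nonzero weights in every group of M consecutive weights (for example, 2:4). The snippet below is a generic illustration of that pattern under an assumed magnitude criterion; it is not the method of any paper listed here.

```python
import torch

def apply_n_m_sparsity(weight: torch.Tensor, n: int = 2, m: int = 4) -> torch.Tensor:
    # Within every group of m consecutive weights along the input dimension,
    # keep the n largest-magnitude weights and zero out the rest.
    out_features, in_features = weight.shape
    assert in_features % m == 0, "input dimension must be divisible by m"
    groups = weight.reshape(out_features, in_features // m, m)
    topk = groups.abs().topk(n, dim=-1).indices
    mask = torch.zeros_like(groups)
    mask.scatter_(-1, topk, 1.0)
    return (groups * mask).reshape(out_features, in_features)

w = torch.randn(8, 16)
w_24 = apply_n_m_sparsity(w, n=2, m=4)  # each group of 4 keeps its 2 largest-magnitude entries
```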