ADMM Based Semi-Structured Pattern Pruning Framework For Transformer
- URL: http://arxiv.org/abs/2407.08334v4
- Date: Fri, 23 Aug 2024 08:36:41 GMT
- Title: ADMM Based Semi-Structured Pattern Pruning Framework For Transformer
- Authors: TianChen Wang,
- Abstract summary: This paper introduces an Alternating Direction Method of Multipliers (ADMM) based pattern pruning framework to reshape the distribution of the activation map.
We conduct extensive experiments on classification tasks over the GLUE benchmark.
We achieve a 50% compression ratio while maintaining an overall score of 80.1 on GLUE.
- Score: 4.02487511510606
- License:
- Abstract: NLP (natural language processing) has achieved great success through the transformer model. However, such models have hundreds of millions or even billions of parameters, which is a heavy burden for deployment on a personal computer or a small-scale server. To deal with this, we either make the model's weight matrices sparser or compress the attention layers. Pattern pruning, one of the most important pruning methods, selects a fixed number of parameters in each divided pattern block and prunes them. However, the effect of pattern pruning is strictly limited by the sparsity within a region of weights in each layer. In this paper, we first introduce an Alternating Direction Method of Multipliers (ADMM) based pattern pruning framework to reshape the distribution of the activation map. Specifically, we propose to formulate pattern pruning on the transformer as a constrained optimization problem and use ADMM to solve it. In this way, the initially dense feature maps are transformed into regionally sparsified ones, so we can achieve a higher compression ratio with better performance using the pattern pruning method. Additionally, this paper provides a theoretical derivation of ADMM with local sparsity. Finally, we extend the proposed ADMM-based framework with SR-STE to demonstrate its generalization and to avoid the gradient vanishing problem. We conduct extensive experiments on classification tasks over the GLUE benchmark. Significantly, we achieve a 50% compression ratio while maintaining an overall score of 80.1 on GLUE.
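For a concrete picture of the formulation sketched in the abstract, the standard way to set up ADMM for a sparsity constraint is to split the weights into a task variable W and an auxiliary variable Z that must lie in the pattern-sparse set S, then alternate between minimizing the task loss and projecting onto S. The block below shows this generic, textbook form of the splitting and its scaled-form updates; the paper's exact notation, penalty terms, and local-sparsity derivation may differ.

```latex
% Generic ADMM splitting for pattern-constrained pruning (standard textbook
% form; the paper's exact notation and local-sparsity terms may differ).
\[
  \min_{W}\ f(W)\ \ \text{s.t.}\ \ W \in S
  \quad\Longleftrightarrow\quad
  \min_{W,Z}\ f(W) + g(Z)\ \ \text{s.t.}\ \ W = Z,
  \qquad
  g(Z) =
  \begin{cases}
    0       & \text{if } Z \in S,\\
    +\infty & \text{otherwise,}
  \end{cases}
\]
% with the scaled-form ADMM iterations
\begin{align*}
  W^{k+1} &= \arg\min_{W}\; f(W) + \tfrac{\rho}{2}\lVert W - Z^{k} + U^{k}\rVert_2^2,\\
  Z^{k+1} &= \Pi_S\!\left(W^{k+1} + U^{k}\right),\\
  U^{k+1} &= U^{k} + W^{k+1} - Z^{k+1}.
\end{align*}
```

A minimal, hypothetical code sketch of the same loop follows, assuming an N:M-style pattern (keep the `keep` largest-magnitude weights in every block of `block` weights per row), SGD as the inner solver for the W-update, and illustrative names such as `project_pattern` and `admm_pattern_prune` that are not from the paper.

```python
# Illustrative ADMM pattern-pruning sketch (not the paper's code).
from itertools import cycle

import torch


def project_pattern(w: torch.Tensor, block: int = 4, keep: int = 2) -> torch.Tensor:
    """Project onto the pattern-sparse set: keep the top-`keep` magnitudes in each block of `block`."""
    rows, cols = w.shape                        # assumes cols is divisible by `block`
    blocks = w.reshape(rows, cols // block, block)
    idx = blocks.abs().topk(keep, dim=-1).indices
    mask = torch.zeros_like(blocks).scatter_(-1, idx, 1.0)
    return (blocks * mask).reshape(rows, cols)


def admm_pattern_prune(model, layer, loss_fn, data_loader,
                       rho=1e-3, outer_steps=10, inner_steps=100, lr=1e-4):
    """Alternate between a loss-driven W-update, a projection Z-update, and a dual U-update."""
    W = layer.weight
    Z = project_pattern(W.detach().clone())     # auxiliary variable, always pattern-sparse
    U = torch.zeros_like(W)                     # scaled dual variable
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    batches = cycle(data_loader)
    for _ in range(outer_steps):
        # W-update: a few SGD steps on f(W) + (rho/2)||W - Z + U||^2
        for _ in range(inner_steps):
            x, y = next(batches)
            loss = loss_fn(model(x), y) + 0.5 * rho * (W - Z + U).pow(2).sum()
            opt.zero_grad()
            loss.backward()
            opt.step()
        # Z-update: Euclidean projection of W + U onto the pattern-sparse set
        Z = project_pattern((W + U).detach())
        # U-update: dual ascent on the consensus constraint W = Z
        U = U + W.detach() - Z
    with torch.no_grad():                       # final hard projection so W exactly satisfies the pattern
        W.copy_(project_pattern(W))
```

In practice the inner W-update would run inside the transformer's fine-tuning loop and the projection would be applied per linear layer; roughly speaking, the SR-STE extension mentioned in the abstract would instead apply the mask in the forward pass and use a straight-through gradient with a decay term on the pruned weights, avoiding the vanishing-gradient issue of a hard projection.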
Related papers
- Zeroth-Order Adaptive Neuron Alignment Based Pruning without Re-Training [3.195234044113248]
We exploit functional information from dense pre-trained models to obtain sparse models that maximize the activations' alignment w.r.t. the dense model.
We propose NeuroAl, a top-up algorithm that modifies the block-wise and row-wise sparsity ratios to maximize the neuron alignment among activations.
We test our method on 4 different LLM families and 3 different sparsity ratios, showing how it consistently outperforms the latest state-of-the-art techniques.
arXiv Detail & Related papers (2024-11-11T15:30:16Z) - Bypass Back-propagation: Optimization-based Structural Pruning for Large Language Models via Policy Gradient [57.9629676017527]
We propose an optimization-based structural pruning on Large-Language Models.
We learn the pruning masks in a probabilistic space directly by optimizing the loss of the pruned model.
Our method operates for 2.7 hours with around 35GB memory for the 13B models on a single A100 GPU.
arXiv Detail & Related papers (2024-06-15T09:31:03Z) - BESA: Pruning Large Language Models with Blockwise Parameter-Efficient Sparsity Allocation [54.28841287750586]
Large language models (LLMs) have demonstrated outstanding performance in various tasks, such as text summarization, text question-answering, and so on.
Existing solutions such as SparseGPT and Wanda attempt to alleviate this issue through weight pruning.
This paper introduces a novel LLM pruning technique dubbed blockwise parameter-efficient sparsity allocation (BESA) by applying a blockwise reconstruction loss.
arXiv Detail & Related papers (2024-02-18T12:44:15Z) - Pruning Self-attentions into Convolutional Layers in Single Path [89.55361659622305]
Vision Transformers (ViTs) have achieved impressive performance over various computer vision tasks.
We propose Single-Path Vision Transformer pruning (SPViT) to efficiently and automatically compress the pre-trained ViTs.
Our SPViT can trim 52.0% FLOPs for DeiT-B and get an impressive 0.6% top-1 accuracy gain simultaneously.
arXiv Detail & Related papers (2021-11-23T11:35:54Z) - Improving the Accuracy-Memory Trade-Off of Random Forests Via Leaf-Refinement [6.967385165474138]
Random Forests (RF) are among the state-of-the-art in many machine learning applications.
We show that the improvement effects of pruning diminish for ensembles of large trees but that pruning has an overall better accuracy-memory trade-off than RF.
We present a simple, yet surprisingly effective algorithm that refines the predictions in the leaf nodes in the forest via gradient descent.
arXiv Detail & Related papers (2021-10-19T16:06:43Z) - MLPruning: A Multilevel Structured Pruning Framework for Transformer-based Models [78.45898846056303]
Pruning is an effective method to reduce the memory footprint and computational cost associated with large natural language processing models.
We develop a novel MultiLevel structured Pruning framework, which uses three different levels of structured pruning: head pruning, row pruning, and block-wise sparse pruning.
arXiv Detail & Related papers (2021-05-30T22:00:44Z) - Dynamic Probabilistic Pruning: A general framework for hardware-constrained pruning at different granularities [80.06422693778141]
We propose a flexible new pruning mechanism that facilitates pruning at different granularities (weights, kernels, filters/feature maps).
We refer to this algorithm as Dynamic Probabilistic Pruning (DPP).
We show that DPP achieves competitive compression rates and classification accuracy when pruning common deep learning models trained on different benchmark datasets for image classification.
arXiv Detail & Related papers (2021-05-26T17:01:52Z) - Pruning Redundant Mappings in Transformer Models via Spectral-Normalized Identity Prior [54.629850694790036]
Spectral-normalized identity priors (SNIP) is a structured pruning approach that penalizes an entire residual module in a Transformer model toward an identity mapping.
We conduct experiments with BERT on 5 GLUE benchmark tasks to demonstrate that SNIP achieves effective pruning results while maintaining comparable performance.
arXiv Detail & Related papers (2020-10-05T05:40:56Z)