Pruning Pre-trained Language Models Without Fine-Tuning
- URL: http://arxiv.org/abs/2210.06210v2
- Date: Tue, 16 May 2023 06:24:43 GMT
- Title: Pruning Pre-trained Language Models Without Fine-Tuning
- Authors: Ting Jiang, Deqing Wang, Fuzhen Zhuang, Ruobing Xie, Feng Xia
- Abstract summary: We argue that fine-tuning is redundant for first-order pruning, since first-order pruning alone is sufficient to adapt PLMs to downstream tasks.
Under this motivation, we propose Static Model Pruning (SMP), which only uses first-order pruning to adapt PLMs to downstream tasks while achieving the target sparsity level.
- Score: 42.54071630668426
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To overcome the overparameterization problem in Pre-trained Language Models
(PLMs), pruning is widely used as a simple and straightforward compression
method by directly removing unimportant weights. Previous first-order methods
successfully compress PLMs to extremely high sparsity with little performance
drop. These methods, such as movement pruning, use first-order information to
prune PLMs while fine-tuning the remaining weights. In this work, we argue that
fine-tuning is redundant for first-order pruning, since first-order pruning
alone is sufficient to adapt PLMs to downstream tasks. Under this
motivation, we propose Static Model Pruning (SMP), which only uses first-order
pruning to adapt PLMs to downstream tasks while achieving the target sparsity
level. In addition, we design a new masking function and training
objective to further improve SMP. Extensive experiments at various sparsity
levels show SMP has significant improvements over first-order and zero-order
methods. Unlike previous first-order methods, SMP is also applicable at low
sparsity, where it outperforms zero-order methods. Meanwhile, SMP is more
parameter-efficient than other methods because it does not require fine-tuning.
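The mechanism described in the abstract can be illustrated with a minimal sketch: keep the pre-trained weights frozen and learn only per-weight importance scores, binarized into a top-k mask with a straight-through estimator so the scores receive first-order updates. The following PyTorch sketch is illustrative only, not the authors' released implementation; the names (`TopKBinarizer`, `MaskedLinear`, `keep_ratio`) and the score initialization are assumptions made for the example.

```python
import torch
import torch.nn as nn


class TopKBinarizer(torch.autograd.Function):
    """Keep the top-k fraction of scores as a 0/1 mask; pass gradients
    straight through so the scores receive first-order updates."""

    @staticmethod
    def forward(ctx, scores: torch.Tensor, keep_ratio: float) -> torch.Tensor:
        k = max(1, int(keep_ratio * scores.numel()))
        threshold = torch.topk(scores.flatten(), k).values[-1]
        return (scores >= threshold).to(scores.dtype)

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through estimator: the gradient flows to the scores unchanged.
        return grad_output, None


class MaskedLinear(nn.Module):
    """Linear layer whose pre-trained weight stays frozen; only the
    importance scores (and the mask derived from them) are learned."""

    def __init__(self, linear: nn.Linear, keep_ratio: float = 0.1):
        super().__init__()
        self.weight = nn.Parameter(linear.weight.detach().clone(), requires_grad=False)
        self.bias = (
            nn.Parameter(linear.bias.detach().clone(), requires_grad=False)
            if linear.bias is not None else None
        )
        # Illustrative initialization; the paper's exact scheme may differ.
        self.scores = nn.Parameter(0.01 * torch.randn_like(self.weight))
        self.keep_ratio = keep_ratio

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        mask = TopKBinarizer.apply(self.scores, self.keep_ratio)
        return nn.functional.linear(x, self.weight * mask, self.bias)


if __name__ == "__main__":
    layer = MaskedLinear(nn.Linear(768, 768), keep_ratio=0.1)
    x = torch.randn(4, 768)
    loss = layer(x).pow(2).mean()
    loss.backward()
    # Only the scores carry gradients; the frozen weight does not.
    print(layer.scores.grad is not None, layer.weight.grad)
```

Under this setup only the score tensors would be optimized during adaptation; the frozen weights never change, which is where the parameter-efficiency claim above comes from.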
Related papers
- Bypass Back-propagation: Optimization-based Structural Pruning for Large Language Models via Policy Gradient [57.9629676017527]
We propose an optimization-based structural pruning method for Large Language Models.
We learn the pruning masks in a probabilistic space directly by optimizing the loss of the pruned model.
Our method runs in 2.7 hours with around 35GB of memory for 13B models on a single A100 GPU.
arXiv Detail & Related papers (2024-06-15T09:31:03Z)
- SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models [53.638791265113625]
SPP is a sparsity-preserved, parameter-efficient fine-tuning method for large language models.
Code will be made available at https://github.com/Lucky-Lance/SPP.
arXiv Detail & Related papers (2024-05-25T04:55:27Z)
- Just CHOP: Embarrassingly Simple LLM Compression [27.64461490974072]
Large language models (LLMs) enable unparalleled few- and zero-shot reasoning capabilities but at a high computational footprint.
We show that simple layer pruning coupled with an extended language model pretraining produces state-of-the-art results against structured and even semi-structured compression of models at a 7B scale.
We also show how distillation, which has been highly effective for task-agnostic compression of smaller BERT-style models, becomes inefficient compared with our simple pruning technique.
arXiv Detail & Related papers (2023-05-24T08:18:35Z)
- NoisyTune: A Little Noise Can Help You Finetune Pretrained Language Models Better [98.5705258907774]
Finetuning pretrained language models (PLMs) is critical for their success in downstream tasks.
PLMs risk overfitting pretraining signals, and there are gaps between downstream tasks and the pretraining tasks.
NoisyTune helps finetune PLMs better on downstream tasks by adding some noise to the PLM parameters before finetuning.
arXiv Detail & Related papers (2022-02-24T11:08:02Z)
- Know What You Don't Need: Single-Shot Meta-Pruning for Attention Heads [114.77890059625162]
We propose a method, called Single-Shot Meta-Pruning, to compress deep pre-trained Transformers before fine-tuning.
We focus on pruning unnecessary attention heads adaptively for different downstream tasks.
Compared with existing compression methods for pre-trained models, our method can reduce the overhead of both fine-tuning and inference.
arXiv Detail & Related papers (2020-11-07T12:58:37Z)
- Movement Pruning: Adaptive Sparsity by Fine-Tuning [115.91907953454034]
Magnitude pruning is a widely used strategy for reducing model size in pure supervised learning.
We propose the use of movement pruning, a simple, deterministic first-order weight pruning method.
Experiments show that when pruning large pretrained language models, movement pruning yields significant improvements in high-sparsity regimes; its first-order score update is sketched after this list.
arXiv Detail & Related papers (2020-05-15T17:54:15Z)
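For contrast with the zero-order magnitude criterion, the first-order criterion that movement pruning relies on can be summarized in a few lines. This is a hedged sketch of the published score update (under a straight-through top-k mask, the gradient of the loss with respect to a score is (∂L/∂W)·W), using hypothetical tensors rather than a real training loop; the function name and learning rate are illustrative.

```python
import torch

def movement_score_update(weight: torch.Tensor,
                          weight_grad: torch.Tensor,
                          score: torch.Tensor,
                          lr: float = 1e-2) -> torch.Tensor:
    """One SGD-style update of movement-pruning importance scores.

    The accumulated score is proportional to -sum_t (dL/dW_ij) * W_ij:
    weights that keep moving away from zero during training receive
    high scores and are retained.
    """
    return score - lr * weight_grad * weight

# Toy illustration with stand-in tensors (no real backward pass).
w = torch.randn(4, 4)
g = torch.randn(4, 4)       # stand-in for dL/dW from a backward pass
s = torch.zeros(4, 4)
s = movement_score_update(w, g, s)
keep = s >= s.flatten().topk(8).values[-1]   # keep the top 50% of weights
print(keep.float().mean())                   # ~0.5
```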
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences.