What Matters In The Structured Pruning of Generative Language Models?
- URL: http://arxiv.org/abs/2302.03773v1
- Date: Tue, 7 Feb 2023 22:05:55 GMT
- Title: What Matters In The Structured Pruning of Generative Language Models?
- Authors: Michael Santacroce, Zixin Wen, Yelong Shen, Yuanzhi Li
- Abstract summary: Auto-regressive large language models such as GPT-3 require enormous computational resources to use.
Traditionally, structured pruning methods are employed to reduce resource usage.
We introduce Globally Unique Movement (GUM) to improve the uniqueness of neurons in pruned models.
- Score: 44.86217321428518
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Auto-regressive large language models such as GPT-3 require enormous
computational resources to use. Traditionally, structured pruning methods are
employed to reduce resource usage. However, their application to and efficacy
for generative language models are heavily under-explored. In this paper we
conduct a comprehensive evaluation of common structured pruning methods,
including magnitude, random, and movement pruning on the feed-forward layers in
GPT-type models. Unexpectedly, random pruning results in performance that is
comparable to the best established methods, across multiple natural language
generation tasks. To understand these results, we provide a framework for
measuring neuron-level redundancy of models pruned by different methods, and
discover that established structured pruning methods do not take into account
the distinctiveness of neurons, leaving behind excess redundancies. In view of
this, we introduce Globally Unique Movement (GUM) to improve the uniqueness of
neurons in pruned models. We then discuss the effects of our techniques on
different redundancy metrics to explain the improved performance.
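As a rough, hedged illustration of the neuron-level structured pruning criteria evaluated in the paper, the sketch below prunes the intermediate neurons of a GPT-style feed-forward block using either a magnitude score or the random baseline, and then reports a simple pairwise-similarity notion of neuron redundancy. The layer sizes, the `keep_ratio`, and the redundancy definition are illustrative assumptions rather than the authors' implementation (GUM itself is not reproduced here).
```python
# Minimal sketch (not the paper's code) of structured FFN pruning in a
# GPT-style block: score intermediate neurons, keep the top-scoring ones,
# and measure how redundant the surviving neurons are.
import torch
import torch.nn as nn

d_model, d_ff, keep_ratio = 768, 3072, 0.5   # illustrative sizes
fc_in = nn.Linear(d_model, d_ff)             # rows of fc_in.weight are neurons
fc_out = nn.Linear(d_ff, d_model)            # columns of fc_out.weight are neurons

def neuron_scores(method: str) -> torch.Tensor:
    """Score each of the d_ff intermediate neurons."""
    if method == "magnitude":
        # L2 norm of a neuron's incoming and outgoing weights
        return fc_in.weight.norm(dim=1) + fc_out.weight.norm(dim=0)
    if method == "random":
        return torch.rand(d_ff)              # the surprisingly strong baseline
    raise ValueError(method)

def prune_ffn(method: str):
    """Return smaller Linear layers containing only the kept neurons."""
    k = int(keep_ratio * d_ff)
    keep = torch.topk(neuron_scores(method), k).indices.sort().values
    new_in, new_out = nn.Linear(d_model, k), nn.Linear(k, d_model)
    with torch.no_grad():
        new_in.weight.copy_(fc_in.weight[keep])
        new_in.bias.copy_(fc_in.bias[keep])
        new_out.weight.copy_(fc_out.weight[:, keep])
        new_out.bias.copy_(fc_out.bias)
    return new_in, new_out

def redundancy(layer: nn.Linear) -> float:
    """Mean pairwise cosine similarity of neuron weight vectors:
    higher values suggest more redundant (less distinctive) neurons."""
    w = nn.functional.normalize(layer.weight, dim=1)
    sim = w @ w.T
    n = w.shape[0]
    return ((sim.sum() - n) / (n * (n - 1))).item()

pruned_in, _ = prune_ffn("magnitude")
print("redundancy after magnitude pruning:", redundancy(pruned_in))
```
Swapping `"magnitude"` for `"random"` gives the random baseline the abstract highlights; comparing the resulting redundancy values mirrors, in spirit, the neuron-level redundancy analysis described above.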
Related papers
- Revisiting Large Language Model Pruning using Neuron Semantic Attribution [63.62836612864512]
We conduct evaluations on 24 datasets and 4 tasks using popular pruning methods.
We surprisingly find a significant performance drop of existing pruning methods in sentiment classification tasks.
We propose Neuron Semantic Attribution, which learns to associate each neuron with specific semantics.
arXiv Detail & Related papers (2025-03-03T13:52:17Z)
- DBR: Divergence-Based Regularization for Debiasing Natural Language Understanding Models [50.54264918467997]
Pre-trained language models (PLMs) have achieved impressive results on various natural language processing tasks.
Recent research has revealed that these models often rely on superficial features and shortcuts instead of developing a genuine understanding of language.
We propose Divergence Based Regularization (DBR) to mitigate this shortcut learning behavior.
arXiv Detail & Related papers (2025-02-25T16:44:10Z)
- Instruction-Following Pruning for Large Language Models [58.329978053711024]
We move beyond the traditional static pruning approach of determining a fixed pruning mask for a model.
In our method, the pruning mask is input-dependent and adapts dynamically based on the information described in a user instruction.
Our approach, termed "instruction-following pruning", introduces a sparse mask predictor that takes the user instruction as input and dynamically selects the most relevant model parameters for the given task.
arXiv Detail & Related papers (2025-01-03T20:19:14Z)
- DRPruning: Efficient Large Language Model Pruning through Distributionally Robust Optimization [61.492590008258986]
Large language models (LLMs) deliver impressive results but face challenges from increasing model sizes and computational costs.
We propose DRPruning, which incorporates distributionally robust optimization to restore balanced performance across domains.
arXiv Detail & Related papers (2024-11-21T12:02:39Z)
- Enhancing adversarial robustness in Natural Language Inference using explanations [41.46494686136601]
We cast the spotlight on the under-explored task of Natural Language Inference (NLI).
We validate the use of natural language explanations as a model-agnostic defence strategy through extensive experimentation.
We study the correlation of widely used language generation metrics with human perception, so that they can serve as a proxy for robust NLI models.
arXiv Detail & Related papers (2024-09-11T17:09:49Z)
- Diversifying the Expert Knowledge for Task-Agnostic Pruning in Sparse Mixture-of-Experts [75.85448576746373]
We propose a method of grouping and pruning similar experts to improve the model's parameter efficiency.
We validate the effectiveness of our method by pruning three state-of-the-art MoE architectures.
The evaluation shows that our method outperforms other model pruning methods on a range of natural language tasks.
arXiv Detail & Related papers (2024-07-12T17:25:02Z)
- NeuroPrune: A Neuro-inspired Topological Sparse Training Algorithm for Large Language Models [35.10729451729596]
Transformer-based Language Models have become ubiquitous in Natural Language Processing (NLP).
However, expensive training as well as inference remains a significant impediment to their widespread applicability.
Inspired by brain neuronal networks, we explore sparsity approaches through the lens of network topology.
arXiv Detail & Related papers (2024-02-28T22:21:47Z)
- Pruning Pre-trained Language Models with Principled Importance and Self-regularization [18.088550230146247]
Iterative pruning is one of the most effective compression methods for pre-trained language models.
We propose a self-regularization scheme where model prediction is regularized by the latest checkpoint with increasing sparsity throughout pruning.
Our experiments on natural language understanding, question-answering, named entity recognition, and data-to-text generation with various Transformer-based PLMs show the effectiveness of the approach at various sparsity levels.
arXiv Detail & Related papers (2023-05-21T08:15:12Z)
- Regularization-based Pruning of Irrelevant Weights in Deep Neural Architectures [0.0]
We propose a method for learning sparse neural topologies via a regularization technique which identifies irrelevant weights and selectively shrinks their norm.
We tested the proposed technique on different image classification and natural language generation tasks, obtaining results on par with or better than competitors in terms of sparsity and metrics.
arXiv Detail & Related papers (2022-04-11T09:44:16Z)
- Probing Structured Pruning on Multilingual Pre-trained Models: Settings, Algorithms, and Efficiency [62.0887259003594]
This work investigates three aspects of structured pruning on multilingual pre-trained language models: settings, algorithms, and efficiency.
Experiments on nine downstream tasks show several counter-intuitive phenomena.
We present Dynamic Sparsification, a simple approach that allows training the model once and adapting to different model sizes at inference.
arXiv Detail & Related papers (2022-04-06T06:29:52Z)
- Structured Pattern Pruning Using Regularization [0.0]
Iterative Magnitude Pruning (IMP) is a network pruning method that repeats the process of removing weights with the least magnitudes and retraining the model.
Previous research has shown that a structured pattern emerges, wherein the resulting surviving weights tend to prominently cluster in a select few rows and columns of the matrix.
We propose SPUR, a novel pruning mechanism that preemptively induces structured patterns in compression by adding a regularization term to the objective function used in IMP.
arXiv Detail & Related papers (2021-09-18T03:01:29Z)
- Sparse Flows: Pruning Continuous-depth Models [107.98191032466544]
We show that pruning improves generalization for neural ODEs in generative modeling.
We also show that pruning finds minimal and efficient neural ODE representations with up to 98% fewer parameters than the original network, without loss of accuracy.
arXiv Detail & Related papers (2021-06-24T01:40:17Z)
- Movement Pruning: Adaptive Sparsity by Fine-Tuning [115.91907953454034]
Magnitude pruning is a widely used strategy for reducing model size in pure supervised learning.
We propose the use of movement pruning, a simple, deterministic first-order weight pruning method.
Experiments show that when pruning large pretrained language models, movement pruning shows significant improvements in high-sparsity regimes.
arXiv Detail & Related papers (2020-05-15T17:54:15Z)
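For context on the movement pruning entry above, here is a minimal, hedged sketch of the first-order idea it builds on: weights whose fine-tuning gradients push them away from zero accumulate high scores and survive pruning. The toy layer, data, and sparsity level are illustrative assumptions, not the original experimental setup.
```python
# Toy sketch of first-order movement scores accumulated during fine-tuning,
# followed by unstructured top-k pruning of the lowest-scored weights.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(32, 2)                       # stand-in for a pretrained layer
opt = torch.optim.SGD(model.parameters(), lr=0.1)
scores = torch.zeros_like(model.weight)        # accumulated movement scores

for _ in range(100):                           # toy fine-tuning loop on random data
    x, y = torch.randn(64, 32), torch.randint(0, 2, (64,))
    loss = nn.functional.cross_entropy(model(x), y)
    opt.zero_grad()
    loss.backward()
    # First-order movement score contribution: -grad * weight
    # (positive when the SGD update moves the weight away from zero).
    scores += -model.weight.grad * model.weight.detach()
    opt.step()

sparsity = 0.9                                 # keep only the top 10% of weights
k = int((1 - sparsity) * scores.numel())
threshold = torch.topk(scores.flatten(), k).values.min()
mask = (scores >= threshold).float()
with torch.no_grad():
    model.weight.mul_(mask)                    # apply the (unstructured) mask
print(f"kept {int(mask.sum())} of {mask.numel()} weights")
```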
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.