Structured Pattern Pruning Using Regularization
- URL: http://arxiv.org/abs/2109.08814v1
- Date: Sat, 18 Sep 2021 03:01:29 GMT
- Title: Structured Pattern Pruning Using Regularization
- Authors: Dongjun Park, Geung-Hee Lee
- Abstract summary: Iterative Magnitude Pruning (IMP) is a network pruning method that repeats the process of removing weights with the least magnitudes and retraining the model.
Previous research has shown that a structured pattern emerges, wherein the surviving weights tend to cluster prominently in a select few rows and columns of the weight matrix.
We propose SPUR, a novel pruning mechanism that preemptively induces structured patterns during compression by adding a regularization term to the IMP objective function.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Iterative Magnitude Pruning (IMP) is a network pruning method that repeats
the process of removing weights with the least magnitudes and retraining the
model. When visualizing the weight matrices of language models pruned by IMP,
previous research has shown that a structured pattern emerges, wherein the
resulting surviving weights tend to prominently cluster in a select few rows
and columns of the matrix. Though the need for further research in utilizing
these structured patterns for potential performance gains has previously been
indicated, it has yet to be thoroughly studied. We propose SPUR (Structured
Pattern pruning Using Regularization), a novel pruning mechanism that
preemptively induces structured patterns in compression by adding a
regularization term to the objective function in the IMP. Our results show that
SPUR can significantly preserve model performance under high sparsity settings
regardless of the language or the task. Our contributions are as follows: (i)
We propose SPUR, a network pruning mechanism that improves upon IMP regardless
of the language or the task. (ii) We are the first to empirically verify the
efficacy of "structured patterns" observed previously in pruning research.
(iii) SPUR is a resource-efficient mechanism in that it does not require
significant additional computations.
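The abstract describes SPUR only at a high level: run IMP, but add a regularization term to the training objective so that surviving weights concentrate in a few rows and columns. The sketch below is a minimal illustration of that recipe, not the authors' implementation; the row/column group penalty, the `lambda_reg` weight, the 20% per-round pruning rate, and the helper names (`row_col_group_penalty`, `imp_round`) are assumptions made for the example.

```python
import torch
import torch.nn as nn

def row_col_group_penalty(weight: torch.Tensor) -> torch.Tensor:
    """L2,1-style group penalty over the rows and columns of a 2-D weight matrix.

    Shrinking entire rows/columns toward zero encourages the surviving weights
    to cluster in a few rows and columns, i.e. a structured pattern.
    """
    row_norms = weight.norm(p=2, dim=1)   # one norm per row
    col_norms = weight.norm(p=2, dim=0)   # one norm per column
    return row_norms.sum() + col_norms.sum()

def imp_round(model, loss_fn, data_loader, sparsity=0.2, lambda_reg=1e-4,
              lr=1e-3, steps=1000):
    """One iteration of magnitude pruning, preceded by retraining with the
    structure-inducing regularizer added to the task loss."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    batches = iter(data_loader)
    for _ in range(steps):
        try:
            x, y = next(batches)
        except StopIteration:
            batches = iter(data_loader)
            x, y = next(batches)
        loss = loss_fn(model(x), y)
        for module in model.modules():
            if isinstance(module, nn.Linear):
                loss = loss + lambda_reg * row_col_group_penalty(module.weight)
        opt.zero_grad()
        loss.backward()
        opt.step()
    # magnitude pruning: zero out the smallest `sparsity` fraction of each matrix
    with torch.no_grad():
        for module in model.modules():
            if isinstance(module, nn.Linear):
                k = int(sparsity * module.weight.numel())
                if k > 0:
                    threshold = module.weight.abs().flatten().kthvalue(k).values
                    module.weight[module.weight.abs() <= threshold] = 0.0
```

A full IMP pipeline would also keep a persistent mask per layer so that weights pruned in earlier rounds stay zero during later retraining; that bookkeeping is omitted here for brevity.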
Related papers
- Model Hemorrhage and the Robustness Limits of Large Language Models [119.46442117681147]
Large language models (LLMs) demonstrate strong performance across natural language processing tasks, yet undergo significant performance degradation when modified for deployment.
We define this phenomenon as model hemorrhage - performance decline caused by parameter alterations and architectural changes.
arXiv Detail & Related papers (2025-03-31T10:16:03Z)
- RoSTE: An Efficient Quantization-Aware Supervised Fine-Tuning Approach for Large Language Models [53.571195477043496]
We propose an algorithm named Rotated Straight-Through-Estimator (RoSTE).
RoSTE combines quantization-aware supervised fine-tuning (QA-SFT) with an adaptive rotation strategy to reduce activation outliers.
Our findings reveal that the prediction error is directly proportional to the quantization error of the converged weights, which can be effectively managed through an optimized rotation configuration.
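As a rough schematic of the two ingredients named above, the sketch below pairs a straight-through estimator for uniform fake quantization with a fixed random orthogonal rotation applied to a weight matrix before quantizing. RoSTE's rotation strategy is adaptive and its training recipe is more involved, so the static rotation, the 4-bit setting, and the function names here are illustrative assumptions only.

```python
import torch

class STEQuantize(torch.autograd.Function):
    """Uniform fake-quantization with a straight-through gradient."""
    @staticmethod
    def forward(ctx, x, num_bits=4):
        qmax = 2 ** (num_bits - 1) - 1
        scale = x.abs().max().clamp(min=1e-8) / qmax
        return torch.round(x / scale).clamp(-qmax - 1, qmax) * scale

    @staticmethod
    def backward(ctx, grad_output):
        # straight-through: pass the gradient as if quantization were the identity
        return grad_output, None

def random_orthogonal(n: int, seed: int = 0) -> torch.Tensor:
    g = torch.Generator().manual_seed(seed)
    q, _ = torch.linalg.qr(torch.randn(n, n, generator=g))
    return q

def rotated_fake_quant_weight(weight: torch.Tensor, rotation: torch.Tensor,
                              num_bits: int = 4) -> torch.Tensor:
    """Rotate, quantize with STE, rotate back; the rotation spreads outlier
    directions so the uniform quantizer wastes less range on them."""
    w_rot = weight @ rotation
    w_q = STEQuantize.apply(w_rot, num_bits)
    return w_q @ rotation.T
```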
arXiv Detail & Related papers (2025-02-13T06:44:33Z)
- State-space models can learn in-context by gradient descent [1.3087858009942543]
This study demonstrates that state-space model architectures can perform gradient-based learning and use it for in-context learning.
We prove that a single structured state-space model layer, augmented with local self-attention, can reproduce the outputs of an implicit linear model.
The theoretical construction elucidates the role of local self-attention and multiplicative interactions in recurrent architectures as the key ingredients for enabling the expressive power typical of foundation models.
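The general construction behind such results can be checked numerically in a few lines: a single linear-attention-style readout over in-context examples reproduces the prediction of one gradient-descent step on least-squares regression. The demo below shows that equivalence for plain (unnormalized) linear attention; it is not the paper's state-space construction with local self-attention, only the underlying idea.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, eta = 4, 32, 0.05

# in-context regression examples (x_i, y_i) and a query x_q
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true
x_q = rng.normal(size=d)

# one gradient-descent step on 0.5 * sum_i (w^T x_i - y_i)^2, starting from w = 0
w_gd = eta * (X.T @ y)
pred_gd = w_gd @ x_q

# unnormalized linear attention: query x_q, keys x_i, values eta * y_i
pred_attn = sum(eta * y[i] * (X[i] @ x_q) for i in range(n))

print(np.isclose(pred_gd, pred_attn))   # True: the attention readout performs the GD step
```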
arXiv Detail & Related papers (2024-10-15T15:22:38Z)
- Autoregressive Moving-average Attention Mechanism for Time Series Forecasting [9.114664059026767]
In this paper, we first demonstrate that, for the time series forecasting (TSF) task, the previously overlooked decoder-only autoregressive Transformer model can achieve results comparable to the best baselines.
We then propose an Autoregressive (AR) Moving-average (MA) attention structure that can adapt to various linear attention mechanisms.
arXiv Detail & Related papers (2024-10-04T05:45:50Z)
- Revisiting SMoE Language Models by Evaluating Inefficiencies with Task Specific Expert Pruning [78.72226641279863]
Sparse Mixture of Experts (SMoE) models have emerged as a scalable alternative to dense models in language modeling.
Our research explores task-specific model pruning to inform decisions about designing SMoE architectures.
We introduce an adaptive task-aware pruning technique UNCURL to reduce the number of experts per MoE layer in an offline manner post-training.
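A minimal sketch of offline, task-aware expert pruning in general, under assumptions about the scoring rule: route a task's calibration activations, accumulate how much routing mass each expert receives, and keep only the top-k experts in the MoE layer. UNCURL's actual criterion and budget selection may differ, and the function `prune_experts_for_task` and its interface are hypothetical.

```python
import torch

@torch.no_grad()
def prune_experts_for_task(router, experts, calib_inputs, keep_k):
    """Offline, task-aware expert pruning for one MoE layer.

    router:        maps hidden states (N, d) -> routing logits (N, num_experts)
    experts:       list of expert modules in this layer
    calib_inputs:  hidden states sampled from the target task, shape (N, d)
    keep_k:        number of experts to retain
    """
    logits = router(calib_inputs)                 # (N, num_experts)
    probs = torch.softmax(logits, dim=-1)
    usage = probs.sum(dim=0)                      # accumulated routing mass per expert
    keep = torch.topk(usage, keep_k).indices.sort().values
    pruned_experts = [experts[i] for i in keep.tolist()]
    return pruned_experts, keep                   # smaller layer + index map for the router
```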
arXiv Detail & Related papers (2024-09-02T22:35:03Z)
- Isomorphic Pruning for Vision Models [56.286064975443026]
Structured pruning reduces the computational overhead of deep neural networks by removing redundant sub-structures.
We present Isomorphic Pruning, a simple approach that demonstrates effectiveness across a range of network architectures.
arXiv Detail & Related papers (2024-07-05T16:14:53Z)
- TRAWL: Tensor Reduced and Approximated Weights for Large Language Models [11.064868044313855]
We introduce TRAWL (Tensor Reduced and Approximated Weights for Large Language Models), a technique that applies tensor decomposition across multiple weight matrices to effectively denoise LLMs by capturing global structural patterns.
Our experiments show that TRAWL improves model performance by up to 16% over baseline models on benchmark datasets, without requiring additional data, training, or fine-tuning.
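As a simplified stand-in for the idea of decomposing across multiple weight matrices at once, the sketch below stacks same-shaped matrices into a 3-D tensor, truncates an SVD of its mode-1 unfolding, and folds the result back. TRAWL's actual tensor decomposition and rank selection are not specified here, so treat this as an illustration of joint low-rank denoising rather than the paper's method.

```python
import numpy as np

def low_rank_denoise_stack(weight_mats, rank):
    """Jointly low-rank-approximate a stack of same-shaped weight matrices.

    weight_mats: list of L arrays, each of shape (m, n)
    rank:        number of singular components kept on the unfolded tensor
    """
    T = np.stack(weight_mats, axis=0)          # (L, m, n) weight tensor
    L, m, n = T.shape
    unfolded = T.reshape(L, m * n)             # mode-1 unfolding
    U, S, Vt = np.linalg.svd(unfolded, full_matrices=False)
    approx = U[:, :rank] @ np.diag(S[:rank]) @ Vt[:rank]   # truncated reconstruction
    return [approx[i].reshape(m, n) for i in range(L)]

# example: "denoise" four random 8x16 matrices with a rank-2 joint approximation
mats = [np.random.randn(8, 16) for _ in range(4)]
denoised = low_rank_denoise_stack(mats, rank=2)
```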
arXiv Detail & Related papers (2024-06-25T04:01:32Z)
- LoRAP: Transformer Sub-Layers Deserve Differentiated Structured Compression for Large Language Models [9.244526043014098]
Large language models (LLMs) show excellent performance in difficult tasks, but they often require massive memories and computational resources.
In this study, we make an important observation that the multi-head self-attention (MHA) sub-layer of the Transformer exhibits a noticeable low-rank structure.
We propose a mixed compression model, which organically combines Low-Rank matrix approximation And structured Pruning (LoRAP).
arXiv Detail & Related papers (2024-04-15T11:53:22Z)
- Efficient Compression of Overparameterized Deep Models through Low-Dimensional Learning Dynamics [10.673414267895355]
We present a novel approach for compressing overparameterized models.
Our algorithm improves the training efficiency by more than 2x, without compromising generalization.
arXiv Detail & Related papers (2023-11-08T23:57:03Z)
- Scaling Pre-trained Language Models to Deeper via Parameter-efficient Architecture [68.13678918660872]
We design a more capable parameter-sharing architecture based on matrix product operator (MPO) decomposition.
MPO decomposition can reorganize and factorize the information of a parameter matrix into two parts.
Our architecture shares the central tensor across all layers for reducing the model size.
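To make the MPO idea concrete, the sketch below factorizes a weight matrix into three tensor-train (MPO-style) cores via sequential SVDs after regrouping its row and column indices; the middle core plays the role of the "central tensor" that the summary says is shared across layers. The index grouping, the rank choice, and the three-core setup are illustrative assumptions, not the paper's exact architecture.

```python
import numpy as np

def mpo3_decompose(W, row_dims, col_dims, rank):
    """Factor W (prod(row_dims) x prod(col_dims)) into three MPO cores via TT-SVD."""
    m1, m2, m3 = row_dims
    n1, n2, n3 = col_dims
    # regroup the matrix indices as (i1 j1), (i2 j2), (i3 j3)
    T = W.reshape(m1, m2, m3, n1, n2, n3).transpose(0, 3, 1, 4, 2, 5)
    T = T.reshape(m1 * n1, m2 * n2, m3 * n3)

    # first split: (i1 j1) | (i2 j2, i3 j3)
    U, S, Vt = np.linalg.svd(T.reshape(m1 * n1, -1), full_matrices=False)
    r1 = min(rank, len(S))
    core1 = U[:, :r1]                                       # left core, (m1*n1, r1)
    rest = (np.diag(S[:r1]) @ Vt[:r1]).reshape(r1 * m2 * n2, m3 * n3)

    # second split: (r1, i2 j2) | (i3 j3); the middle factor is the "central tensor"
    U, S, Vt = np.linalg.svd(rest, full_matrices=False)
    r2 = min(rank, len(S))
    central = U[:, :r2].reshape(r1, m2 * n2, r2)            # candidate for cross-layer sharing
    core3 = np.diag(S[:r2]) @ Vt[:r2]                       # right core, (r2, m3*n3)
    return core1, central, core3

def mpo3_reconstruct(core1, central, core3, row_dims, col_dims):
    m1, m2, m3 = row_dims
    n1, n2, n3 = col_dims
    T = np.einsum('ar,rbs,sc->abc', core1, central, core3)
    T = T.reshape(m1, n1, m2, n2, m3, n3).transpose(0, 2, 4, 1, 3, 5)
    return T.reshape(m1 * m2 * m3, n1 * n2 * n3)

W = np.random.randn(2 * 3 * 4, 3 * 2 * 5)                   # toy 24 x 30 weight matrix
cores = mpo3_decompose(W, (2, 3, 4), (3, 2, 5), rank=20)    # ranks not truncated here
W_hat = mpo3_reconstruct(*cores, (2, 3, 4), (3, 2, 5))
print(np.allclose(W, W_hat))                                # True; smaller ranks trade accuracy for size
```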
arXiv Detail & Related papers (2023-03-27T02:34:09Z)
- What Matters In The Structured Pruning of Generative Language Models? [44.86217321428518]
Auto-regressive large language models such as GPT-3 require enormous computational resources to use.
Traditionally, structured pruning methods are employed to reduce resource usage.
We introduce Globally Unique Movement (GUM) to improve the uniqueness of neurons in pruned models.
arXiv Detail & Related papers (2023-02-07T22:05:55Z)
- Autoregressive Structured Prediction with Language Models [73.11519625765301]
We describe an approach to model structures as sequences of actions in an autoregressive manner with PLMs.
Our approach achieves the new state-of-the-art on all the structured prediction tasks we looked at.
arXiv Detail & Related papers (2022-10-26T13:27:26Z)
- Extended Unconstrained Features Model for Exploring Deep Neural Collapse [59.59039125375527]
Recently, a phenomenon termed "neural collapse" (NC) has been empirically observed in deep neural networks.
Recent papers have shown that minimizers with this structure emerge when optimizing a simplified "unconstrained features model" (UFM).
In this paper, we study the UFM for the regularized MSE loss, and show that the minimizers' features can be more structured than in the cross-entropy case.
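For reference, one common way to write the regularized-MSE unconstrained features model treats the feature matrix H as a free optimization variable alongside the classifier W; the exact normalization and the optional bias term vary across papers, so the form below is a representative sketch rather than the formulation used in this particular work.

```latex
\min_{W,\,H,\,b}\;\; \frac{1}{2N}\,\bigl\lVert W H + b\mathbf{1}^{\top} - Y \bigr\rVert_F^2
\;+\; \frac{\lambda_W}{2}\lVert W\rVert_F^2
\;+\; \frac{\lambda_H}{2}\lVert H\rVert_F^2
```

Here H = [h_1, ..., h_N] collects the per-sample features as free variables, W is the classifier, Y is the target (e.g. one-hot label) matrix, and lambda_W, lambda_H > 0 are weight-decay coefficients.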
arXiv Detail & Related papers (2022-02-16T14:17:37Z)
- Target-Embedding Autoencoders for Supervised Representation Learning [111.07204912245841]
This paper analyzes a framework for improving generalization in a purely supervised setting, where the target space is high-dimensional.
We motivate and formalize the general framework of target-embedding autoencoders (TEA) for supervised prediction, learning intermediate latent representations jointly optimized to be both predictable from features as well as predictive of targets.
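A minimal sketch of the target-embedding-autoencoder recipe under illustrative assumptions about the architecture and loss weighting: encode the high-dimensional target y into a latent z, decode z back to y (so z is predictive of the target), and train a feature branch to map x to the same z (so z is predictable from the features); at test time, targets are estimated by decoding the predicted latent.

```python
import torch
import torch.nn as nn

class TargetEmbeddingAutoencoder(nn.Module):
    """Latent z is jointly trained to reconstruct the target y and to be
    predictable from the features x (target-embedding autoencoder framework)."""
    def __init__(self, x_dim, y_dim, z_dim):
        super().__init__()
        self.encode = nn.Sequential(nn.Linear(y_dim, z_dim), nn.ReLU(), nn.Linear(z_dim, z_dim))
        self.decode = nn.Sequential(nn.Linear(z_dim, z_dim), nn.ReLU(), nn.Linear(z_dim, y_dim))
        self.predict = nn.Sequential(nn.Linear(x_dim, z_dim), nn.ReLU(), nn.Linear(z_dim, z_dim))

    def loss(self, x, y, alpha=1.0):
        z = self.encode(y)
        recon = nn.functional.mse_loss(self.decode(z), y)    # z must be predictive of the target
        align = nn.functional.mse_loss(self.predict(x), z)   # z must be predictable from features
        return recon + alpha * align

# inference: predict the latent from x, then decode it into a target estimate
model = TargetEmbeddingAutoencoder(x_dim=32, y_dim=256, z_dim=16)
x = torch.randn(8, 32)
y_hat = model.decode(model.predict(x))
```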
arXiv Detail & Related papers (2020-01-23T02:37:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.