Symmetric Pruning of Large Language Models
- URL: http://arxiv.org/abs/2501.18980v1
- Date: Fri, 31 Jan 2025 09:23:06 GMT
- Title: Symmetric Pruning of Large Language Models
- Authors: Kai Yi, Peter Richtárik
- Abstract summary: Popular post-training pruning methods such as Wanda and RIA are known for their simple, yet effective, designs.
This paper introduces new theoretical insights that redefine the standard minimization objective for pruning.
We propose complementary strategies that consider both input activations and weight significance.
- Score: 61.309982086292756
- License:
- Abstract: Popular post-training pruning methods such as Wanda and RIA are known for their simple, yet effective, designs that have shown exceptional empirical performance. Wanda optimizes performance through calibrated activations during pruning, while RIA emphasizes the relative, rather than absolute, importance of weight elements. Despite their practical success, a thorough theoretical foundation explaining these outcomes has been lacking. This paper introduces new theoretical insights that redefine the standard minimization objective for pruning, offering a deeper understanding of the factors contributing to their success. Our study extends beyond these insights by proposing complementary strategies that consider both input activations and weight significance. We validate these approaches through rigorous experiments, demonstrating substantial enhancements over existing methods. Furthermore, we introduce a novel training-free fine-tuning approach $R^2$-DSnoT that incorporates relative weight importance and a regularized decision boundary within a dynamic pruning-and-growing framework, significantly outperforming strong baselines and establishing a new state of the art.
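For context, the two scoring rules the abstract refers to are usually stated as follows: Wanda ranks each weight by its magnitude times the norm of the matching calibration activation, while RIA replaces the raw magnitude with a row- and column-relative importance scaled by the activation norm raised to a power. The sketch below is a minimal illustration under those common formulations; the function names, the per-row pruning groups, and the exponent `a=0.5` are assumptions, and the paper's redefined objective and the $R^2$-DSnoT procedure are not reproduced here.

```python
import numpy as np

def wanda_score(W, X):
    """Wanda-style importance: |W_ij| * ||X_j||_2, where X holds
    calibration activations of shape (n_tokens, in_features)."""
    act_norm = np.linalg.norm(X, axis=0)           # per-input-feature L2 norm
    return np.abs(W) * act_norm[None, :]           # (out_features, in_features)

def ria_score(W, X, a=0.5):
    """RIA-style importance: weight magnitude relative to its row and column
    sums, scaled by the activation norm raised to a power `a`."""
    absW = np.abs(W)
    rel = absW / absW.sum(axis=1, keepdims=True) + absW / absW.sum(axis=0, keepdims=True)
    act_norm = np.linalg.norm(X, axis=0)
    return rel * act_norm[None, :] ** a

def prune_rowwise(W, scores, sparsity=0.5):
    """Zero the lowest-scoring fraction of weights within each output row."""
    W = W.copy()
    k = int(W.shape[1] * sparsity)
    idx = np.argsort(scores, axis=1)[:, :k]        # k smallest scores per row
    np.put_along_axis(W, idx, 0.0, axis=1)
    return W

# Toy usage: prune a linear layer to 50% unstructured sparsity.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))                       # (out_features, in_features)
X = rng.normal(size=(128, 16))                     # calibration activations
W_wanda = prune_rowwise(W, wanda_score(W, X), sparsity=0.5)
W_ria = prune_rowwise(W, ria_score(W, X), sparsity=0.5)
```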
Related papers
- First Activations Matter: Training-Free Methods for Dynamic Activation in Large Language Models [25.15698344467722]
This paper introduces a training-free Threshold-based Dynamic Activation method that leverages sequence information to exploit the inherent sparsity of models across various architectures.
We theoretically analyze two of its critical features: history-related activation uncertainty and semantic-irrelevant activation inertia.
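As a rough illustration of what a training-free dynamic-activation step can look like, the sketch below simply zeroes hidden-state entries whose magnitude falls under a threshold so that the matching columns of downstream projections can be skipped; the threshold `tau`, its granularity, and the function name are assumptions and do not reproduce the cited paper's history- and sequence-aware criterion.

```python
import numpy as np

def threshold_dynamic_activation(h, tau=0.1):
    """Zero hidden-state entries below a magnitude threshold; downstream
    matrix multiplies can then skip the masked columns entirely.
    `tau` is a hypothetical per-layer constant, chosen here for illustration."""
    mask = np.abs(h) >= tau
    return h * mask, mask

rng = np.random.default_rng(0)
h = rng.normal(scale=0.2, size=(4, 16))            # (tokens, hidden_dim)
h_sparse, mask = threshold_dynamic_activation(h, tau=0.1)
print(f"fraction of activations kept: {mask.mean():.2f}")
```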
arXiv Detail & Related papers (2024-08-21T07:38:51Z) - See Further for Parameter Efficient Fine-tuning by Standing on the Shoulders of Decomposition [56.87609859444084]
Parameter-efficient fine-tuning (PEFT) focuses on optimizing a select subset of parameters while keeping the rest fixed, significantly lowering computational and storage overheads.
We take the first step to unify all approaches by dissecting them from a decomposition perspective.
We introduce two novel PEFT methods alongside a simple yet effective framework designed to enhance the performance of PEFT techniques across various applications.
arXiv Detail & Related papers (2024-07-07T15:44:42Z) - Bias Mitigation in Fine-tuning Pre-trained Models for Enhanced Fairness and Efficiency [26.86557244460215]
We introduce an efficient and robust fine-tuning framework specifically designed to mitigate biases in new tasks.
Our empirical analysis shows that different parameters of the pre-trained model drive predictions for different demographic groups.
We employ a transfer learning strategy that neutralizes the importance of these influential weights, determined using Fisher information across demographic groups.
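For orientation, per-parameter importance of the kind mentioned here is often estimated with the empirical diagonal Fisher information, i.e. the mean squared gradient of the log-likelihood; comparing it across groups flags the weights whose influence differs most. The toy logistic-regression version below, and every variable in it, is an illustrative assumption rather than the cited framework.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def diagonal_fisher(w, X, y):
    """Empirical diagonal Fisher information for a logistic-regression weight
    vector: the mean of the squared per-example gradient of the log-likelihood."""
    p = sigmoid(X @ w)                       # predicted probabilities
    per_example_grad = (y - p)[:, None] * X  # d log-likelihood / dw, one row per example
    return (per_example_grad ** 2).mean(axis=0)

# Compare parameter importance between two hypothetical demographic groups.
rng = np.random.default_rng(0)
w = rng.normal(size=5)
X_a, y_a = rng.normal(size=(200, 5)), rng.integers(0, 2, 200)
X_b, y_b = rng.normal(size=(200, 5)), rng.integers(0, 2, 200)
gap = np.abs(diagonal_fisher(w, X_a, y_a) - diagonal_fisher(w, X_b, y_b))
influential = np.argsort(gap)[::-1]          # parameters whose importance differs most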
arXiv Detail & Related papers (2024-03-01T16:01:28Z) - Uplift vs. predictive modeling: a theoretical analysis [1.2412255325209152]
This paper presents a comprehensive treatment of the subject, starting from firm theoretical foundations and highlighting the parameters that influence the performance of the uplift and predictive approaches.
The paper focuses on the case of a binary outcome and a binary action, presenting a theoretical analysis of uplift modeling and comparing it with the classical predictive approach.
arXiv Detail & Related papers (2023-09-21T12:59:17Z) - Doubly Robust Instance-Reweighted Adversarial Training [107.40683655362285]
We propose a novel doubly-robust instance reweighted adversarial framework.
Our importance weights are obtained by optimizing the KL-divergence regularized loss function.
Our proposed approach outperforms related state-of-the-art baseline methods in terms of average robust performance.
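As a generic illustration of how KL-divergence regularized importance weights can arise: maximizing the weighted loss minus a KL penalty to the uniform distribution over the simplex has a closed-form softmax solution, so harder examples receive larger weights. The sketch below shows only that textbook form; the temperature `lam` and the toy losses are assumptions, and the cited doubly robust reweighting scheme is not reproduced.

```python
import numpy as np

def kl_regularized_weights(losses, lam=1.0):
    """Instance weights from  max_w  sum_i w_i * loss_i - lam * KL(w || uniform)
    over the probability simplex; the maximizer is a softmax of the losses."""
    z = losses / lam
    z = z - z.max()                  # subtract the max for numerical stability
    w = np.exp(z)
    return w / w.sum()

losses = np.array([0.2, 1.5, 0.7, 3.0])          # hypothetical per-example adversarial losses
print(kl_regularized_weights(losses, lam=1.0))   # the hardest example gets the largest weight
```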
arXiv Detail & Related papers (2023-08-01T06:16:18Z) - Learn to Accumulate Evidence from All Training Samples: Theory and Practice [7.257751371276488]
Evidential deep learning offers a principled and computationally efficient way to make a deterministic neural network uncertainty-aware.
Existing evidential activation functions create zero-evidence regions, which prevent the model from learning from training samples that fall into such regions.
A deeper analysis of evidential activation functions based on our theoretical underpinning inspires the design of a novel regularizer.
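To make the zero-evidence issue concrete: in evidential classifiers the class evidence is a non-negative transform of the logits and the Dirichlet concentration is evidence plus one, so a ReLU-style transform maps every negative logit to zero evidence and zero gradient, which is the failure mode described above. The contrast with a softplus transform below is a minimal sketch; the paper's actual regularizer is not shown.

```python
import numpy as np

def relu_evidence(logits):
    """Common evidential activation: clipping at zero creates zero-evidence
    regions where the gradient vanishes and such samples stop contributing."""
    return np.maximum(logits, 0.0)

def softplus_evidence(logits):
    """Strictly positive alternative: evidence (and its gradient) never reaches
    zero, so every training sample keeps providing a learning signal."""
    return np.log1p(np.exp(logits))

logits = np.array([-3.0, -0.5, 0.0, 2.0])
alpha_relu = relu_evidence(logits) + 1.0      # Dirichlet concentration parameters
alpha_soft = softplus_evidence(logits) + 1.0
print(alpha_relu)   # [1. 1. 1. 3.] -> all negative logits collapse to flat evidence
print(alpha_soft)   # strictly greater than 1 everywhere
```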
arXiv Detail & Related papers (2023-06-19T18:27:12Z) - Provable Reward-Agnostic Preference-Based Reinforcement Learning [61.39541986848391]
Preference-based Reinforcement Learning (PbRL) is a paradigm in which an RL agent learns to optimize a task using pair-wise preference-based feedback over trajectories.
We propose a theoretical reward-agnostic PbRL framework that acquires exploratory trajectories enabling accurate learning of the hidden reward functions.
arXiv Detail & Related papers (2023-05-29T15:00:09Z) - ReMP: Rectified Metric Propagation for Few-Shot Learning [67.96021109377809]
A rectified metric space is learned to maintain the metric consistency from training to testing.
Numerous analyses indicate that a simple modification of the objective can yield substantial performance gains.
The proposed ReMP is effective and efficient, and outperforms the state of the art on various standard few-shot learning datasets.
arXiv Detail & Related papers (2020-12-02T00:07:53Z) - Reintroducing Straight-Through Estimators as Principled Methods for Stochastic Binary Networks [85.94999581306827]
Training neural networks with binary weights and activations is a challenging problem due to the lack of gradients and difficulty of optimization over discrete weights.
Many successful experimental results have been achieved with empirical straight-through (ST) approaches.
At the same time, ST methods can be truly derived as estimators in the stochastic binary network (SBN) model with Bernoulli weights.
arXiv Detail & Related papers (2020-06-11T23:58:18Z)
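For readers unfamiliar with the straight-through trick discussed in that last entry: the forward pass uses binarized weights while the backward pass routes the gradient to the latent real-valued weights as if the binarization were a (clipped) identity. The deterministic sketch below is a minimal illustration; the stochastic Bernoulli-weight derivation the paper develops is not shown.

```python
import numpy as np

def binarize_forward(w):
    """Forward pass: deterministic sign binarization of the latent real weights."""
    return np.where(w >= 0, 1.0, -1.0)

def binarize_backward_ste(grad_out, w, clip=1.0):
    """Straight-through backward pass: pass the gradient w.r.t. the binary
    weights through to the latent real weights as if binarization were the
    identity, masked where |w| exceeds a clipping threshold."""
    return grad_out * (np.abs(w) <= clip)

# One SGD step on the latent real-valued weights of a binary layer.
rng = np.random.default_rng(0)
w_real = rng.normal(size=5)                  # latent weights kept in full precision
w_bin = binarize_forward(w_real)             # binary weights used in the forward pass
grad_wrt_bin = rng.normal(size=5)            # stand-in for the backpropagated gradient
w_real -= 0.1 * binarize_backward_ste(grad_wrt_bin, w_real)
```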