GIFT-SW: Gaussian noise Injected Fine-Tuning of Salient Weights for LLMs
- URL: http://arxiv.org/abs/2408.15300v1
- Date: Tue, 27 Aug 2024 14:41:14 GMT
- Title: GIFT-SW: Gaussian noise Injected Fine-Tuning of Salient Weights for LLMs
- Authors: Maxim Zhelnin, Viktor Moskvoretskii, Egor Shvetsov, Egor Venediktov, Mariya Krylova, Aleksandr Zuev, Evgeny Burnaev
- Abstract summary: We introduce a novel PEFT method, called Gaussian noise Injected Fine Tuning of Salient Weights (GIFT-SW).
Our method updates only salient columns, while injecting Gaussian noise into non-salient ones.
Experiments with LLaMA models demonstrate that GIFT-SW outperforms full fine-tuning and modern PEFT methods under the same computational budget.
- Score: 51.02233412547456
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Parameter Efficient Fine-Tuning (PEFT) methods have gained popularity and democratized the usage of Large Language Models (LLMs). Recent studies have shown that a small subset of weights significantly impacts performance. Based on this observation, we introduce a novel PEFT method, called Gaussian noise Injected Fine Tuning of Salient Weights (GIFT-SW). Our method updates only salient columns, while injecting Gaussian noise into non-salient ones. To identify these columns, we developed a generalized sensitivity metric that extends and unifies metrics from previous studies. Experiments with LLaMA models demonstrate that GIFT-SW outperforms full fine-tuning and modern PEFT methods under the same computational budget. Moreover, GIFT-SW offers practical advantages to recover performance of models subjected to mixed-precision quantization while keeping salient weights in full precision.
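Below is a minimal PyTorch sketch of the update rule described in the abstract: only a set of salient weight columns stays trainable, while Gaussian noise is injected into the frozen non-salient columns during training. The column-norm saliency proxy, the noise scale `sigma`, and the `GiftSWLinear` wrapper are illustrative assumptions, not the paper's exact formulation.

```python
import torch


def salient_columns(weight: torch.Tensor, k: int) -> torch.Tensor:
    """Pick k salient input columns of a linear weight of shape (out, in).
    Column-wise L2 norm is only a stand-in for the paper's generalized
    sensitivity metric."""
    return torch.topk(weight.norm(dim=0), k).indices


class GiftSWLinear(torch.nn.Module):
    """Linear layer with trainable salient columns and frozen non-salient
    columns that receive Gaussian noise while in training mode."""

    def __init__(self, weight: torch.Tensor, salient_idx: torch.Tensor, sigma: float = 1e-3):
        super().__init__()
        self.sigma = sigma  # noise scale: an assumed hyperparameter
        mask = torch.zeros(weight.shape[1], dtype=torch.bool)
        mask[salient_idx] = True
        self.register_buffer("mask", mask)
        self.w_salient = torch.nn.Parameter(weight[:, mask].clone())   # trainable slice
        self.register_buffer("w_frozen", weight[:, ~mask].clone())     # frozen slice

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        frozen = self.w_frozen
        if self.training:  # inject noise into non-salient weights only during fine-tuning
            frozen = frozen + self.sigma * torch.randn_like(frozen)
        return x[..., self.mask] @ self.w_salient.T + x[..., ~self.mask] @ frozen.T


# Hypothetical usage: wrap an existing nn.Linear and fine-tune only w_salient.
# w = layer.weight.data
# gift_layer = GiftSWLinear(w, salient_columns(w, k=128))
```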
Related papers
- IntLoRA: Integral Low-rank Adaptation of Quantized Diffusion Models [68.55148272295916]
We propose IntLoRA, to push the efficiency limits by using integer type (INT) low-rank parameters to adapt the quantized diffusion models.
IntLoRA offers three key advantages: (i) for fine-tuning, the pre-trained weights are quantized, reducing memory usage; (ii) for storage, both pre-trained and low-rank weights are in INT which consumes less disk space; (iii) for inference, IntLoRA weights can be naturally merged into quantized pre-trained weights through efficient integer multiplication or bit-shifting.
arXiv Detail & Related papers (2024-10-29T05:50:17Z)
- NEAT: Nonlinear Parameter-efficient Adaptation of Pre-trained Models [26.808251361020066]
Fine-tuning pre-trained models is resource-intensive and laborious.
One widely adopted PEFT technique, Low-Rank Adaptation (LoRA), freezes the pre-trained model weights and learns additive low-rank update matrices.
NEAT introduces a lightweight neural network that takes pre-trained weights as input and learns a nonlinear transformation to approximate cumulative weight updates.
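For contrast, here is a hedged PyTorch sketch of the two adaptation styles: LoRA's additive low-rank update versus a NEAT-style module that feeds the frozen pre-trained weight through a small nonlinear network. The tiny row-wise MLP used for NEAT is an assumed architecture for illustration, not the paper's exact design.

```python
import torch


class LoRALinear(torch.nn.Module):
    """Standard LoRA: frozen weight plus a trainable low-rank update B @ A."""

    def __init__(self, weight: torch.Tensor, rank: int = 8):
        super().__init__()
        out_f, in_f = weight.shape
        self.register_buffer("weight", weight)                  # frozen
        self.A = torch.nn.Parameter(0.01 * torch.randn(rank, in_f))
        self.B = torch.nn.Parameter(torch.zeros(out_f, rank))   # zero init: no change at start

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x @ (self.weight + self.B @ self.A).T


class NeatStyleLinear(torch.nn.Module):
    """NEAT-style idea (illustrative): a small nonlinear network maps the
    frozen pre-trained weight itself to a cumulative weight update."""

    def __init__(self, weight: torch.Tensor, hidden: int = 8):
        super().__init__()
        out_f, in_f = weight.shape
        self.register_buffer("weight", weight)                  # frozen
        self.f = torch.nn.Sequential(                           # tiny row-wise MLP (assumption)
            torch.nn.Linear(in_f, hidden),
            torch.nn.ReLU(),
            torch.nn.Linear(hidden, in_f),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        delta = self.f(self.weight)          # nonlinear function of the frozen weight
        return x @ (self.weight + delta).T
```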
arXiv Detail & Related papers (2024-10-02T17:29:23Z) - TRAWL: Tensor Reduced and Approximated Weights for Large Language Models [11.064868044313855]
We introduce TRAWL (Tensor Reduced and Approximated Weights for Large Language Models), a technique that applies tensor decomposition across multiple weight matrices to effectively denoise LLMs by capturing global structural patterns.
Our experiments show that TRAWL improves model performance by up to 16% over baseline models on benchmark datasets, without requiring additional data, training, or fine-tuning.
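A rough sketch of that cross-matrix denoising idea, assuming a simple mode-1 unfolding plus truncated SVD in place of TRAWL's actual tensor decomposition:

```python
import torch


def denoise_stacked_weights(weights: list, rank: int) -> list:
    """Illustrative low-rank denoising across several same-shaped weight
    matrices (e.g., the query projections of every layer): stack them into a
    3-D tensor, unfold, and keep only the top singular directions shared
    across layers. TRAWL's exact decomposition may differ."""
    stacked = torch.stack(weights)                   # (layers, out, in)
    n_layers, out_f, in_f = stacked.shape
    unfolded = stacked.reshape(n_layers, out_f * in_f)
    U, S, Vh = torch.linalg.svd(unfolded, full_matrices=False)
    rank = min(rank, S.numel())
    approx = U[:, :rank] @ torch.diag(S[:rank]) @ Vh[:rank]
    return list(approx.reshape(n_layers, out_f, in_f).unbind(0))
```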
arXiv Detail & Related papers (2024-06-25T04:01:32Z) - Gaussian Stochastic Weight Averaging for Bayesian Low-Rank Adaptation of Large Language Models [5.352221132808875]
Fine-tuned Large Language Models (LLMs) often suffer from overconfidence and poor calibration.
We propose a simple combination of Low-Rank Adaptation (LoRA) with Gaussian Stochastic Weight Averaging (SWAG).
We show that our method exhibits greater robustness against distribution shift, as reflected in its improved performance on out-of-distribution tasks.
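A minimal sketch of how such a Gaussian posterior could be collected over the trainable LoRA parameters, using the diagonal-covariance variant of SWAG; the snapshot schedule and the omission of SWAG's low-rank covariance term are simplifications.

```python
import torch


class DiagonalSwag:
    """Running first and second moments of parameter snapshots define a
    diagonal Gaussian posterior; sampling from it yields an ensemble of
    adapters for better-calibrated predictions."""

    def __init__(self, params):
        self.n = 0
        self.mean = [p.detach().clone().zero_() for p in params]
        self.sq_mean = [p.detach().clone().zero_() for p in params]

    def collect(self, params) -> None:
        """Call every few steps late in fine-tuning."""
        self.n += 1
        for m, s, p in zip(self.mean, self.sq_mean, params):
            m += (p.detach() - m) / self.n        # running mean
            s += (p.detach() ** 2 - s) / self.n   # running second moment

    def sample(self):
        """Draw one set of parameters from the fitted Gaussian."""
        out = []
        for m, s in zip(self.mean, self.sq_mean):
            var = (s - m ** 2).clamp_min(1e-12)
            out.append(m + var.sqrt() * torch.randn_like(m))
        return out
```

At test time, several sampled adapters would be evaluated and their predictions averaged (Bayesian model averaging).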
arXiv Detail & Related papers (2024-05-06T12:44:37Z) - Data-freeWeight Compress and Denoise for Large Language Models [101.53420111286952]
We propose a novel approach termed Data-free Joint Rank-k Approximation for compressing the parameter matrices.
We prune 80% of the parameters while retaining 93.43% of the original performance, without any calibration data.
arXiv Detail & Related papers (2024-02-26T05:51:47Z) - LoRETTA: Low-Rank Economic Tensor-Train Adaptation for
Ultra-Low-Parameter Fine-Tuning of Large Language Models [20.5908375260123]
Various parameter-efficient fine-tuning (PEFT) techniques have been proposed to enable computationally efficient fine-tuning while maintaining model performance.
We present LoRETTA, a framework that significantly reduces trainable parameters through tensor-train decomposition.
LoRETTA achieves comparable or better performance than the most widely used PEFT methods with up to $100\times$ fewer parameters on the LLaMA-2-7B model.
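A brief sketch of tensor-train (TT) factorization via sequential truncated SVDs, the kind of decomposition LoRETTA uses so that only small cores need to be trained and stored; the reshape into a higher-order tensor and the uniform rank cap below are illustrative choices.

```python
import torch


def tt_decompose(t: torch.Tensor, max_rank: int):
    """TT-SVD: factor a tensor into cores G_k of shape (r_{k-1}, n_k, r_k)."""
    dims = list(t.shape)
    cores, r_prev, mat = [], 1, t
    for n in dims[:-1]:
        mat = mat.reshape(r_prev * n, -1)
        U, S, Vh = torch.linalg.svd(mat, full_matrices=False)
        r = min(max_rank, S.numel())
        cores.append(U[:, :r].reshape(r_prev, n, r))
        mat = torch.diag(S[:r]) @ Vh[:r]              # carry the remainder forward
        r_prev = r
    cores.append(mat.reshape(r_prev, dims[-1], 1))
    return cores


# Example: a 4096x4096 update reshaped to (64, 64, 64, 64) becomes four cores
# with roughly 9K parameters instead of roughly 16.8M.
# cores = tt_decompose(torch.randn(4096, 4096).reshape(64, 64, 64, 64), max_rank=8)
```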
arXiv Detail & Related papers (2024-02-18T01:20:00Z) - Retrieval-based Knowledge Transfer: An Effective Approach for Extreme
Large Language Model Compression [64.07696663255155]
Large-scale pre-trained language models (LLMs) have demonstrated exceptional performance in various natural language processing (NLP) tasks.
However, the massive size of these models poses huge challenges for their deployment in real-world applications.
We introduce a novel compression paradigm called Retrieval-based Knowledge Transfer (RetriKT) which effectively transfers the knowledge of LLMs to extremely small-scale models.
arXiv Detail & Related papers (2023-10-24T07:58:20Z) - FineQuant: Unlocking Efficiency with Fine-Grained Weight-Only
Quantization for LLMs [9.072821427818557]
Large Language Models (LLMs) have achieved state-of-the-art performance across various language tasks but pose challenges for practical deployment.
We propose an efficient weight-only quantization method that reduces memory consumption and accelerates inference for LLMs.
We evaluate our approach on large-scale open source models such as OPT-175B and internal MoE models, showcasing minimal accuracy loss while achieving up to 3.65 times higher throughput.
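A hedged sketch of the general fine-grained (group-wise) weight-only quantization idea; the group size and symmetric rounding scheme are assumptions, not FineQuant's exact recipe.

```python
import torch


def quantize_groupwise(weight: torch.Tensor, group_size: int = 128, bits: int = 4):
    """Split each row into groups of `group_size` values and give every group
    its own scale, so an outlier in one group does not inflate the
    quantization error elsewhere. Assumes in_features % group_size == 0."""
    out_f, in_f = weight.shape
    qmax = 2 ** (bits - 1) - 1
    groups = weight.reshape(out_f, in_f // group_size, group_size)
    scales = (groups.abs().amax(dim=-1, keepdim=True) / qmax).clamp_min(1e-8)
    q = torch.clamp(torch.round(groups / scales), -qmax - 1, qmax).to(torch.int8)
    return q, scales


def dequantize(q: torch.Tensor, scales: torch.Tensor) -> torch.Tensor:
    """Recover an approximate full-precision weight for matmul at inference."""
    out_f, n_groups, group_size = q.shape
    return (q.float() * scales).reshape(out_f, n_groups * group_size)
```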
arXiv Detail & Related papers (2023-08-16T23:57:41Z)
- Neural networks with late-phase weights [66.72777753269658]
We show that the solutions found by SGD can be further improved by ensembling a subset of the weights in late stages of learning.
At the end of learning, we obtain back a single model by taking a spatial average in weight space.
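A simplified sketch of that weight-space averaging step; the paper ensembles only a subset of late-phase weights, whereas this averages whole checkpoints for brevity.

```python
import torch


def average_checkpoints(state_dicts):
    """Average several late-training checkpoints parameter-by-parameter
    (assumes all tensors are floating point)."""
    return {
        key: torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
        for key in state_dicts[0]
    }
```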
arXiv Detail & Related papers (2020-07-25T13:23:37Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)