Related papers: Bayesian Parameter-Efficient Fine-Tuning for Overcoming Catastrophic Forgetting

Bayesian Parameter-Efficient Fine-Tuning for Overcoming Catastrophic Forgetting

URL: http://arxiv.org/abs/2402.12220v3
Date: Fri, 06 Dec 2024 23:18:31 GMT
Title: Bayesian Parameter-Efficient Fine-Tuning for Overcoming Catastrophic Forgetting
Authors: Haolin Chen, Philip N. Garner,
Abstract summary: We show that catastrophic forgetting can be overcome by our methods without degrading the fine-tuning performance.<n>Our results demonstrate that using the Kronecker-factored approximation produces a better preservation of the pre-training knowledge than the diagonal ones.
Score: 10.559392015748989
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We are motivated primarily by the adaptation of text-to-speech synthesis models; however we argue that more generic parameter-efficient fine-tuning (PEFT) is an appropriate framework to do such adaptation. Nevertheless, catastrophic forgetting remains an issue with PEFT, damaging the pre-trained model's inherent capabilities. We demonstrate that existing Bayesian learning techniques can be applied to PEFT to prevent catastrophic forgetting as long as the parameter shift of the fine-tuned layers can be calculated differentiably. In a principled series of experiments on language modeling and speech synthesis tasks, we utilize established Laplace approximations, including diagonal and Kronecker-factored approaches, to regularize PEFT with the low-rank adaptation (LoRA) and compare their performance in pre-training knowledge preservation. Our results demonstrate that catastrophic forgetting can be overcome by our methods without degrading the fine-tuning performance, and using the Kronecker-factored approximation produces a better preservation of the pre-training knowledge than the diagonal ones.

Related papers

High-Rank Structured Modulation for Parameter-Efficient Fine-Tuning [57.85676271833619]
Low-rank Adaptation (LoRA) uses a low-rank update method to simulate full parameter fine-tuning.<n>We present textbfSMoA, a high-rank textbfStructured textbfMOdulation textbfAdapter that uses fewer trainable parameters while maintaining a higher rank.
arXiv Detail & Related papers (2026-01-12T13:06:17Z)
Bias-Restrained Prefix Representation Finetuning for Mathematical Reasoning [33.28699044085956]
Representation finetuning (ReFT) methods improve efficiency by freezing model weights and optimizing internal representations with fewer parameters than PEFT.<n>ReFT exhibits a significant performance decline on mathematical reasoning tasks.<n>This paper proposes Bias-REstrained Prefix Representation FineTuning (BREP ReFT), which enhances ReFT's mathematical reasoning capability.
arXiv Detail & Related papers (2025-11-13T05:15:36Z)
Thompson Sampling via Fine-Tuning of LLMs [68.1722422968855]
We propose an alternative based on Thompson sampling that eliminates the need for scalable large acquisition functions.<n>Our approach Thompson Sampling via Finening (ToSFiT) leverages the prior knowledge embedded in prompt-conditioned language models, and adapts incrementally toward the posterior.<n>Our analysis reveals the critical role of careful adaptation to the posterior probability of maximality-a principle that underpins our ToSFiT algorithm.
arXiv Detail & Related papers (2025-10-15T09:13:59Z)
EKPC: Elastic Knowledge Preservation and Compensation for Class-Incremental Learning [53.88000987041739]
Class-Incremental Learning (CIL) aims to enable AI models to continuously learn from sequentially arriving data of different classes over time.<n>We propose the Elastic Knowledge Preservation and Compensation (EKPC) method, integrating Importance-aware importance Regularization (IPR) and Trainable Semantic Drift Compensation (TSDC) for CIL.
arXiv Detail & Related papers (2025-06-14T05:19:58Z)
Fine Tuning without Catastrophic Forgetting via Selective Low Rank Adaptation [13.084333776247743]
Fine-tuning can reduce robustness to distribution shifts, impacting out-of-distribution (OOD) performance. We propose a parameter-efficient fine-tuning (PEFT) method, using an indicator function to selectively activate Low-Rank Adaptation (LoRA) blocks. We demonstrate that effective fine-tuning can be achieved with as few as 5% of active blocks, substantially improving efficiency.
arXiv Detail & Related papers (2025-01-26T03:22:22Z)
Transformed Low-rank Adaptation via Tensor Decomposition and Its Applications to Text-to-image Models [32.68721299475496]
Low-Rank Adaptation (LoRA) and its variants have gained significant attention due to their effectiveness. We propose a new PEFT method that combines two classes of adaptations, namely, transform and residual adaptations. Experiments are conducted on fine-tuning Stable Diffusion models in subject-driven and controllable generation.
arXiv Detail & Related papers (2025-01-15T11:10:37Z)
ALoRE: Efficient Visual Adaptation via Aggregating Low Rank Experts [71.91042186338163]
ALoRE is a novel PETL method that reuses the hypercomplex parameterized space constructed by Kronecker product to Aggregate Low Rank Experts. Thanks to the artful design, ALoRE maintains negligible extra parameters and can be effortlessly merged into the frozen backbone.
arXiv Detail & Related papers (2024-12-11T12:31:30Z)
Meta-Learning Adaptable Foundation Models [37.458141335750696]
We introduce a meta-learning framework infused with PEFT in this intermediate retraining stage to learn a model that can be easily adapted to unseen tasks. In this setting, we demonstrate the suboptimality of standard retraining for finding an adaptable set of parameters. We then apply these theoretical insights to retraining the RoBERTa model to predict the continuation of conversations within the ConvAI2 dataset.
arXiv Detail & Related papers (2024-10-29T17:24:18Z)
Efficient Source-Free Time-Series Adaptation via Parameter Subspace Disentanglement [0.7558576228782637]
We propose a framework for efficient Source-Free Domain Adaptation (SFDA) Our approach introduces an improved paradigm for source-model preparation and target-side adaptation. We demonstrate that our framework is compatible with various SFDA methods and achieves significant computational efficiency.
arXiv Detail & Related papers (2024-10-03T02:12:03Z)
SaRA: High-Efficient Diffusion Model Fine-tuning with Progressive Sparse Low-Rank Adaptation [52.6922833948127]
In this work, we investigate the importance of parameters in pre-trained diffusion models. We propose a novel model fine-tuning method to make full use of these ineffective parameters. Our method enhances the generative capabilities of pre-trained models in downstream applications.
arXiv Detail & Related papers (2024-09-10T16:44:47Z)
Spectrum-Aware Parameter Efficient Fine-Tuning for Diffusion Models [73.88009808326387]
We propose a novel spectrum-aware adaptation framework for generative models. Our method adjusts both singular values and their basis vectors of pretrained weights. We introduce Spectral Ortho Decomposition Adaptation (SODA), which balances computational efficiency and representation capacity.
arXiv Detail & Related papers (2024-05-31T17:43:35Z)
SPAFIT: Stratified Progressive Adaptation Fine-tuning for Pre-trained Large Language Models [1.2263658159556594]
Full fine-tuning is a popular approach to adapt Transformer-based pre-trained large language models to a specific downstream task. We propose Stratified Progressive Adaptation Fine-tuning (SPAFIT) based on the localization of different types of linguistic knowledge. Our experiments, conducted on nine tasks from the GLUE benchmark, show that our proposed SPAFIT method outperforms other PEFT methods.
arXiv Detail & Related papers (2024-04-30T21:07:32Z)
Sparse is Enough in Fine-tuning Pre-trained Large Language Models [98.46493578509039]
We propose a gradient-based sparse fine-tuning algorithm, named Sparse Increment Fine-Tuning (SIFT) We validate its effectiveness on a range of tasks including the GLUE Benchmark and Instruction-tuning.
arXiv Detail & Related papers (2023-12-19T06:06:30Z)
Boosting Inference Efficiency: Unleashing the Power of Parameter-Shared Pre-trained Language Models [109.06052781040916]
We introduce a technique to enhance the inference efficiency of parameter-shared language models. We also propose a simple pre-training technique that leads to fully or partially shared models. Results demonstrate the effectiveness of our methods on both autoregressive and autoencoding PLMs.
arXiv Detail & Related papers (2023-10-19T15:13:58Z)
Parameter-Efficient Learning for Text-to-Speech Accent Adaptation [58.356667204518985]
This paper presents a parameter-efficient learning (PEL) to develop a low-resource accent adaptation for text-to-speech (TTS) A resource-efficient adaptation from a frozen pre-trained TTS model is developed by using only 1.2% to 0.8% of original trainable parameters. Experiment results show that the proposed methods can achieve competitive naturalness with parameter-efficient decoder fine-tuning.
arXiv Detail & Related papers (2023-05-18T22:02:59Z)
Strong Baselines for Parameter Efficient Few-Shot Fine-tuning [50.83426196335385]
Few-shot classification (FSC) entails learning novel classes given only a few examples per class after a pre-training (or meta-training) phase. Recent works have shown that simply fine-tuning a pre-trained Vision Transformer (ViT) on new test classes is a strong approach for FSC. Fine-tuning ViTs, however, is expensive in time, compute and storage. This has motivated the design of parameter efficient fine-tuning (PEFT) methods which fine-tune only a fraction of the Transformer's parameters.
arXiv Detail & Related papers (2023-04-04T16:14:39Z)
Rethinking Efficient Tuning Methods from a Unified Perspective [34.67645496324432]
We revisit the design paradigm of PETL and derive a unified framework U-Tuning for parameter-efficient transfer learning. The U-Tuning framework can simultaneously encompass existing methods and derive new approaches for parameter-efficient transfer learning.
arXiv Detail & Related papers (2023-03-01T17:38:03Z)

This list is automatically generated from the titles and abstracts of the papers in this site.