Bayesian Parameter-Efficient Fine-Tuning for Overcoming Catastrophic
Forgetting
- URL: http://arxiv.org/abs/2402.12220v1
- Date: Mon, 19 Feb 2024 15:26:19 GMT
- Title: Bayesian Parameter-Efficient Fine-Tuning for Overcoming Catastrophic
Forgetting
- Authors: Haolin Chen, Philip N. Garner
- Abstract summary: We show that existing Bayesian learning techniques can be applied to PEFT to prevent catastrophic forgetting.
Our results demonstrate that catastrophic forgetting can be overcome by our methods without degrading the fine-tuning performance.
- Score: 12.474522847102207
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Although motivated by the adaptation of text-to-speech synthesis models, we
argue that more generic parameter-efficient fine-tuning (PEFT) is an
appropriate framework to do such adaptation. However, catastrophic forgetting
remains an issue with PEFT, damaging the pre-trained model's inherent
capabilities. We demonstrate that existing Bayesian learning techniques can be
applied to PEFT to prevent catastrophic forgetting as long as the parameter
shift of the fine-tuned layers can be calculated differentiably. In a
principled series of experiments on language modeling and speech synthesis
tasks, we utilize established Laplace approximations, including diagonal and
Kronecker factored approaches, to regularize PEFT with the low-rank adaptation
(LoRA) and compare their performance in pre-training knowledge preservation.
Our results demonstrate that catastrophic forgetting can be overcome by our
methods without degrading the fine-tuning performance, and using the Kronecker
factored approximations produces a better preservation of the pre-training
knowledge than the diagonal ones.
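A minimal sketch of the diagonal-Laplace case may help make the abstract concrete. The only requirement stated above is that the parameter shift of the fine-tuned layers be differentiable; for LoRA that shift is just the scaled low-rank product of the adapter matrices, so the diagonal Laplace approximation reduces to an importance-weighted quadratic penalty on it. The sketch below assumes PyTorch, and names such as `LoRALinear`, `fisher_diag` and `lam` are illustrative, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LoRALinear(nn.Module):
    """A frozen pre-trained linear layer plus a trainable low-rank update."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)
        out_features, in_features = base.weight.shape
        self.A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_features, rank))
        self.scale = alpha / rank

    def delta_w(self) -> torch.Tensor:
        # Parameter shift of the adapted layer; differentiable in A and B.
        return self.scale * (self.B @ self.A)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + F.linear(x, self.delta_w())


def diagonal_laplace_penalty(lora_layers: dict, fisher_diag: dict, lam: float = 1.0):
    """0.5 * lam * sum_i F_ii * delta_w_i**2, where F_ii is a diagonal Fisher
    estimate for the corresponding pre-trained weight (illustrative only)."""
    penalty = 0.0
    for name, layer in lora_layers.items():
        penalty = penalty + 0.5 * lam * (fisher_diag[name] * layer.delta_w() ** 2).sum()
    return penalty

# Illustrative use: the penalty pulls the adapted weights back toward the
# pre-trained ones, weighted by how important each weight was at pre-training.
#   loss = task_loss + diagonal_laplace_penalty(lora_layers, fisher_diag)
#   loss.backward(); optimizer.step()
```

Under the same caveats, a Kronecker-factored variant would score the shift with per-layer factor matrices instead of an elementwise Fisher, giving a penalty proportional to trace(G ΔW A ΔWᵀ) for input and output factors A and G.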
Related papers
- Spectrum-Aware Parameter Efficient Fine-Tuning for Diffusion Models [73.88009808326387]
We propose a novel spectrum-aware adaptation framework for generative models.
Our method adjusts both the singular values and the corresponding basis vectors of the pretrained weights.
We introduce Spectral Ortho Decomposition Adaptation (SODA), which balances computational efficiency and representation capacity.
arXiv Detail & Related papers (2024-05-31T17:43:35Z)
- ETHER: Efficient Finetuning of Large-Scale Models with Hyperplane Reflections [59.839926875976225]
We propose the ETHER transformation family, which performs Efficient fineTuning via HypErplane Reflections.
In particular, we introduce ETHER and its relaxation ETHER+, which match or outperform existing PEFT methods with significantly fewer parameters.
arXiv Detail & Related papers (2024-05-30T17:26:02Z)
- SPAFIT: Stratified Progressive Adaptation Fine-tuning for Pre-trained Large Language Models [1.2263658159556594]
Full fine-tuning is a popular approach to adapt Transformer-based pre-trained large language models to a specific downstream task.
We propose Stratified Progressive Adaptation Fine-tuning (SPAFIT) based on the localization of different types of linguistic knowledge.
Our experiments, conducted on nine tasks from the GLUE benchmark, show that our proposed SPAFIT method outperforms other PEFT methods.
arXiv Detail & Related papers (2024-04-30T21:07:32Z)
- Parameter Efficient Finetuning for Speech Emotion Recognition and Domain Adaptation [13.774287532165019]
This paper investigates parameter-efficient finetuning (PEFT) for speech emotion recognition (SER).
Various PEFT adaptors are systematically studied for both classification of discrete emotion categories and prediction of dimensional emotional attributes.
A two-stage adaptation strategy is proposed to adapt models trained on acted emotion data, which is more readily available, so that they better capture natural emotional expressions.
arXiv Detail & Related papers (2024-02-19T00:21:07Z)
- Sparse is Enough in Fine-tuning Pre-trained Large Language Models [98.46493578509039]
We propose a gradient-based sparse fine-tuning algorithm, named Sparse Increment Fine-Tuning (SIFT).
We validate its effectiveness on a range of tasks including the GLUE Benchmark and Instruction-tuning.
arXiv Detail & Related papers (2023-12-19T06:06:30Z)
- Boosting Inference Efficiency: Unleashing the Power of Parameter-Shared Pre-trained Language Models [109.06052781040916]
We introduce a technique to enhance the inference efficiency of parameter-shared language models.
We also propose a simple pre-training technique that leads to fully or partially shared models.
Results demonstrate the effectiveness of our methods on both autoregressive and autoencoding PLMs.
arXiv Detail & Related papers (2023-10-19T15:13:58Z)
- Parameter-Efficient Learning for Text-to-Speech Accent Adaptation [58.356667204518985]
This paper presents a parameter-efficient learning (PEL) approach to low-resource accent adaptation for text-to-speech (TTS).
A resource-efficient adaptation from a frozen pre-trained TTS model is developed using only 1.2% to 0.8% of the original trainable parameters.
Experiment results show that the proposed methods can achieve competitive naturalness with parameter-efficient decoder fine-tuning.
arXiv Detail & Related papers (2023-05-18T22:02:59Z)
- Rethinking Efficient Tuning Methods from a Unified Perspective [34.67645496324432]
We revisit the design paradigm of PETL and derive a unified framework U-Tuning for parameter-efficient transfer learning.
The U-Tuning framework can simultaneously encompass existing methods and derive new approaches for parameter-efficient transfer learning.
arXiv Detail & Related papers (2023-03-01T17:38:03Z)
- Learnable Bernoulli Dropout for Bayesian Deep Learning [53.79615543862426]
Learnable Bernoulli dropout (LBD) is a new model-agnostic dropout scheme that considers the dropout rates as parameters jointly optimized with other model parameters.
LBD leads to improved accuracy and uncertainty estimates in image classification and semantic segmentation.
arXiv Detail & Related papers (2020-02-12T18:57:14Z)
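For the last entry, a hedged sketch of the basic idea of learnable dropout rates, assuming PyTorch; the relaxed (concrete) Bernoulli mask below is a stand-in for the paper's own gradient estimator, and the class name and temperature are made up.

```python
import torch
import torch.nn as nn

class LearnableBernoulliDropout(nn.Module):
    """Dropout whose drop rate is a parameter trained jointly with the model.
    Uses a relaxed (concrete) Bernoulli keep-mask so the rate receives gradients."""
    def __init__(self, init_drop: float = 0.1, temperature: float = 0.1):
        super().__init__()
        # Parameterize the drop rate through a logit so it stays in (0, 1).
        self.logit_p = nn.Parameter(torch.logit(torch.tensor(init_drop)))
        self.temperature = temperature

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        p = torch.sigmoid(self.logit_p)          # current drop probability
        if not self.training:
            return x                             # inverted dropout: identity at test time
        u = torch.rand_like(x).clamp(1e-6, 1 - 1e-6)
        # Relaxed sample of a keep-mask ~ Bernoulli(1 - p), differentiable in p.
        keep = torch.sigmoid(
            (torch.log(1 - p) - torch.log(p) + torch.log(u) - torch.log(1 - u))
            / self.temperature
        )
        return x * keep / (1 - p + 1e-6)         # rescale to preserve the expectation
```

Usage would mirror nn.Dropout: place the module between layers and let the optimizer update logit_p together with the model weights.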
This list is automatically generated from the titles and abstracts of the papers on this site.