An Efficient Rehearsal Scheme for Catastrophic Forgetting Mitigation during Multi-stage Fine-tuning
- URL: http://arxiv.org/abs/2402.08096v3
- Date: Tue, 11 Feb 2025 18:25:07 GMT
- Title: An Efficient Rehearsal Scheme for Catastrophic Forgetting Mitigation during Multi-stage Fine-tuning
- Authors: Andrew Bai, Chih-Kuan Yeh, Cho-Jui Hsieh, Ankur Taly
- Abstract summary: A common approach to alleviate such forgetting is to rehearse samples from prior tasks during fine-tuning.
We propose a sampling scheme, mix-cd, that prioritizes rehearsal of "collateral damage" samples.
Our approach is computationally efficient, easy to implement, and outperforms several leading continual learning methods in compute-constrained settings.
- Score: 55.467047686093025
- Abstract: Incrementally fine-tuning foundational models on new tasks or domains is now the de facto approach in NLP. A known pitfall of this approach is the catastrophic forgetting of prior knowledge that occurs during fine-tuning. A common way to alleviate such forgetting is to rehearse samples from prior tasks during fine-tuning. Several existing works assume a fixed memory buffer for storing prior-task examples, while relying on inferences (forward passes) with the model at hand to choose examples for rehearsal from the buffer. However, given the increasing computational cost of model inference and the decreasing cost of data storage, we focus on the setting of rehearsing samples under a fixed computational budget rather than a fixed memory budget. We propose a sampling scheme, mix-cd, that prioritizes rehearsal of "collateral damage" samples, i.e., samples predicted correctly by the prior model but forgotten by the incrementally tuned one. The crux of our scheme is a procedure that efficiently estimates the density of collateral damage samples without incurring additional model inferences. Our approach is computationally efficient, easy to implement, and outperforms several leading continual learning methods in compute-constrained settings. All the code will be publicly available at https://github.com/jybai/mix-cd-rehearsal.
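As a concrete illustration of the idea in the abstract, the sketch below is a minimal, hypothetical Python implementation of collateral-damage-prioritized rehearsal sampling. It is not the authors' exact mix-cd procedure: the class name CollateralDamageSampler, the bucket and prior_correct fields, and the Laplace-smoothed per-bucket damage estimate are illustrative assumptions. What it mirrors is the key property that forgetting statistics are updated from predictions already computed in the training forward pass, so the estimation itself adds no extra inference cost.
```python
import random
from collections import defaultdict

class CollateralDamageSampler:
    """Hypothetical sketch of rehearsal sampling that prioritizes
    'collateral damage' samples: correct under the prior model,
    forgotten by the incrementally tuned one."""

    def __init__(self, buffer, smoothing=1.0):
        # Only samples the prior model predicted correctly can become
        # collateral damage, so restrict the rehearsal pool to them.
        self.pool = [s for s in buffer if s["prior_correct"]]
        self.smoothing = smoothing
        self.damaged = defaultdict(float)  # per-bucket forgetting counts
        self.seen = defaultdict(float)     # per-bucket rehearsal counts

    def sample(self, k):
        # Weight each candidate by the estimated collateral-damage rate of
        # its bucket (Laplace-smoothed so unseen buckets still get mass).
        def rate(s):
            b = s["bucket"]
            return (self.damaged[b] + self.smoothing) / (self.seen[b] + 2.0 * self.smoothing)
        weights = [rate(s) for s in self.pool]
        return random.choices(self.pool, weights=weights, k=k)

    def update(self, sample, current_correct):
        # Called with the prediction already computed in the training
        # forward pass, so no additional inference is spent on estimation.
        b = sample["bucket"]
        self.seen[b] += 1.0
        if not current_correct:
            self.damaged[b] += 1.0
```
In use, each training step would mix freshly drawn fine-tuning data with sampler.sample(k) rehearsal examples and call sampler.update(...) on whatever rehearsal predictions that step already produced.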
Related papers
- Upweighting Easy Samples in Fine-Tuning Mitigates Forgetting [15.251425165987987]
Fine-tuning a pre-trained model on a downstream task often degrades its original capabilities.
We propose a sample weighting scheme for the fine-tuning data based on the pre-trained model's losses.
We empirically demonstrate the efficacy of our method on both language and vision tasks.
arXiv Detail & Related papers (2025-02-05T00:49:59Z) - Robust Representation Consistency Model via Contrastive Denoising [83.47584074390842]
Randomized smoothing provides theoretical guarantees for certifying robustness against adversarial perturbations.
Diffusion models have been successfully employed for randomized smoothing to purify noise-perturbed samples.
We reformulate the generative modeling task along the diffusion trajectories in pixel space as a discriminative task in the latent space.
arXiv Detail & Related papers (2025-01-22T18:52:06Z) - One-step Noisy Label Mitigation [86.57572253460125]
Mitigating the detrimental effects of noisy labels on the training process has become increasingly critical.
We propose One-step Anti-Noise (OSA), a model-agnostic noisy label mitigation paradigm.
We empirically demonstrate the superiority of OSA, highlighting its enhanced training robustness, improved task transferability, ease of deployment, and reduced computational costs.
arXiv Detail & Related papers (2024-10-02T18:42:56Z) - DistPred: A Distribution-Free Probabilistic Inference Method for Regression and Forecasting [14.390842560217743]
We propose a novel approach called DistPred for regression and forecasting tasks.
We transform proper scoring rules that measure the discrepancy between the predicted distribution and the target distribution into a differentiable discrete form.
This allows the model to sample numerous samples in a single forward pass to estimate the potential distribution of the response variable.
arXiv Detail & Related papers (2024-06-17T10:33:00Z) - Decoupled Prototype Learning for Reliable Test-Time Adaptation [50.779896759106784]
Test-time adaptation (TTA) is a task that continually adapts a pre-trained source model to the target domain during inference.
One popular approach involves fine-tuning the model with a cross-entropy loss according to estimated pseudo-labels.
This study reveals that minimizing the classification error of each sample makes the cross-entropy loss vulnerable to label noise.
We propose a novel Decoupled Prototype Learning (DPL) method that features prototype-centric loss computation.
arXiv Detail & Related papers (2024-01-15T03:33:39Z) - Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious to collect in practice.
We introduce a novel noisy correspondence learning framework, namely Self-Reinforcing Errors Mitigation (SREM).
arXiv Detail & Related papers (2023-12-27T09:03:43Z) - Collapsed Inference for Bayesian Deep Learning [36.1725075097107]
We introduce a novel collapsed inference scheme that performs Bayesian model averaging using collapsed samples.
A collapsed sample represents uncountably many models drawn from the approximate posterior.
Our proposed use of collapsed samples achieves a balance between scalability and accuracy.
arXiv Detail & Related papers (2023-06-16T08:34:42Z) - Plug-and-Play split Gibbs sampler: embedding deep generative priors in Bayesian inference [12.91637880428221]
This paper introduces a plug-and-play sampling algorithm that leverages variable splitting to efficiently sample from a posterior distribution.
It divides the challenging task of posterior sampling into two simpler sampling problems.
Its performance is compared to recent state-of-the-art optimization and sampling methods.
arXiv Detail & Related papers (2023-04-21T17:17:51Z) - Pre-training Is (Almost) All You Need: An Application to Commonsense Reasoning [61.32992639292889]
Fine-tuning of pre-trained transformer models has become the standard approach for solving common NLP tasks.
We introduce a new scoring method that casts a plausibility ranking task in a full-text format.
We show that our method provides a much more stable training phase across random restarts.
arXiv Detail & Related papers (2020-04-29T10:54:40Z)