Which Pretrain Samples to Rehearse when Finetuning Pretrained Models?
- URL: http://arxiv.org/abs/2402.08096v1
- Date: Mon, 12 Feb 2024 22:32:12 GMT
- Title: Which Pretrain Samples to Rehearse when Finetuning Pretrained Models?
- Authors: Andrew Bai, Chih-Kuan Yeh, Cho-Jui Hsieh, Ankur Taly
- Abstract summary: Fine-tuning pretrained models on specific tasks is now the de facto approach for text and vision tasks.
A known pitfall of this approach is the forgetting of pretraining knowledge that happens during finetuning.
We propose a novel sampling scheme, mix-cd, that identifies and prioritizes samples that actually face forgetting.
- Score: 60.59376487151964
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Fine-tuning pretrained foundational models on specific tasks is now the de
facto approach for text and vision tasks. A known pitfall of this approach is
the forgetting of pretraining knowledge that happens during finetuning.
Rehearsing samples randomly from the pretrain dataset is a common approach to
alleviate such forgetting. However, we find that random mixing unintentionally
includes samples which are not (yet) forgotten or unlearnable by the model. We
propose a novel sampling scheme, mix-cd, that identifies and prioritizes
samples that actually face forgetting, which we call collateral damage. Since
directly identifying collateral damage samples is computationally expensive, we
propose a procedure to estimate the distribution of such samples by tracking
the statistics of finetuned samples. Our approach is lightweight, easy to
implement, and can be seamlessly integrated into existing models, offering an
effective means to retain pretrain performance without additional computational
costs.
Related papers
- The First Few Tokens Are All You Need: An Efficient and Effective Unsupervised Prefix Fine-Tuning Method for Reasoning Models [69.798277882245]
We introduce Unsupervised Prefix Fine-Tuning (UPFT) to enhance large language models' reasoning efficiency.
UPFT removes the need for labeled data or exhaustive sampling.
Experiments show that UPFT matches the performance of supervised methods.
arXiv Detail & Related papers (2025-03-04T18:56:03Z) - AMUN: Adversarial Machine UNlearning [13.776549741449557]
Adversarial Machine UNlearning (AMUN) outperforms prior state-of-the-art (SOTA) methods for image classification.
AMUN lowers the confidence of the model on the forget samples by fine-tuning the model on their corresponding adversarial examples.
arXiv Detail & Related papers (2025-03-02T14:36:31Z) - Upweighting Easy Samples in Fine-Tuning Mitigates Forgetting [15.251425165987987]
Fine-tuning a pre-trained model on a downstream task often degrades its original capabilities.
We propose a sample weighting scheme for the fine-tuning data based on the pre-trained model's losses.
We empirically demonstrate the efficacy of our method on both language and vision tasks.
arXiv Detail & Related papers (2025-02-05T00:49:59Z) - Robust Representation Consistency Model via Contrastive Denoising [83.47584074390842]
randomized smoothing provides theoretical guarantees for certifying robustness against adversarial perturbations.
diffusion models have been successfully employed for randomized smoothing to purify noise-perturbed samples.
We reformulate the generative modeling task along the diffusion trajectories in pixel space as a discriminative task in the latent space.
arXiv Detail & Related papers (2025-01-22T18:52:06Z) - One-step Noisy Label Mitigation [86.57572253460125]
Mitigating the detrimental effects of noisy labels on the training process has become increasingly critical.
We propose One-step Anti-Noise (OSA), a model-agnostic noisy label mitigation paradigm.
We empirically demonstrate the superiority of OSA, highlighting its enhanced training robustness, improved task transferability, ease of deployment, and reduced computational costs.
arXiv Detail & Related papers (2024-10-02T18:42:56Z) - Learning Augmentation Policies from A Model Zoo for Time Series Forecasting [58.66211334969299]
We introduce AutoTSAug, a learnable data augmentation method based on reinforcement learning.
By augmenting the marginal samples with a learnable policy, AutoTSAug substantially improves forecasting performance.
arXiv Detail & Related papers (2024-09-10T07:34:19Z) - DistPred: A Distribution-Free Probabilistic Inference Method for Regression and Forecasting [14.390842560217743]
We propose a novel approach called DistPred for regression and forecasting tasks.
We transform proper scoring rules that measure the discrepancy between the predicted distribution and the target distribution into a differentiable discrete form.
This allows the model to sample numerous samples in a single forward pass to estimate the potential distribution of the response variable.
arXiv Detail & Related papers (2024-06-17T10:33:00Z) - Decoupled Prototype Learning for Reliable Test-Time Adaptation [50.779896759106784]
Test-time adaptation (TTA) is a task that continually adapts a pre-trained source model to the target domain during inference.
One popular approach involves fine-tuning model with cross-entropy loss according to estimated pseudo-labels.
This study reveals that minimizing the classification error of each sample causes the cross-entropy loss's vulnerability to label noise.
We propose a novel Decoupled Prototype Learning (DPL) method that features prototype-centric loss computation.
arXiv Detail & Related papers (2024-01-15T03:33:39Z) - Collapsed Inference for Bayesian Deep Learning [36.1725075097107]
We introduce a novel collapsed inference scheme that performs Bayesian model averaging using collapsed samples.
A collapsed sample represents uncountably many models drawn from the approximate posterior.
Our proposed use of collapsed samples achieves a balance between scalability and accuracy.
arXiv Detail & Related papers (2023-06-16T08:34:42Z) - Plug-and-Play split Gibbs sampler: embedding deep generative priors in
Bayesian inference [12.91637880428221]
This paper introduces a plug-and-play sampling algorithm that leverages variable splitting to efficiently sample from a posterior distribution.
It divides the challenging task of posterior sampling into two simpler sampling problems.
Its performance is compared to recent state-of-the-art optimization and sampling methods.
arXiv Detail & Related papers (2023-04-21T17:17:51Z) - DE-CROP: Data-efficient Certified Robustness for Pretrained Classifiers [21.741026088202126]
We propose a novel way to certify the robustness of pretrained models using only a few training samples.
Our proposed approach generates class-boundary and interpolated samples corresponding to each training sample.
We obtain significant improvements over the baseline on multiple benchmark datasets and also report similar performance under the challenging black box setup.
arXiv Detail & Related papers (2022-10-17T10:41:18Z) - Forgetting Data from Pre-trained GANs [28.326418377665345]
We investigate how to post-edit a model after training so that it forgets certain kinds of samples.
We provide three different algorithms for GANs that differ on how the samples to be forgotten are described.
Our algorithms are capable of forgetting data while retaining high generation quality at a fraction of the cost of full re-training.
arXiv Detail & Related papers (2022-06-29T03:46:16Z) - Boost Test-Time Performance with Closed-Loop Inference [85.43516360332646]
We propose to predict hard-classified test samples in a looped manner to boost the model performance.
We first devise a filtering criterion to identify those hard-classified test samples that need additional inference loops.
For each hard sample, we construct an additional auxiliary learning task based on its original top-$K$ predictions to calibrate the model.
arXiv Detail & Related papers (2022-03-21T10:20:21Z) - Saliency Grafting: Innocuous Attribution-Guided Mixup with Calibrated
Label Mixing [104.630875328668]
Mixup scheme suggests mixing a pair of samples to create an augmented training sample.
We present a novel, yet simple Mixup-variant that captures the best of both worlds.
arXiv Detail & Related papers (2021-12-16T11:27:48Z) - Jo-SRC: A Contrastive Approach for Combating Noisy Labels [58.867237220886885]
We propose a noise-robust approach named Jo-SRC (Joint Sample Selection and Model Regularization based on Consistency)
Specifically, we train the network in a contrastive learning manner. Predictions from two different views of each sample are used to estimate its "likelihood" of being clean or out-of-distribution.
arXiv Detail & Related papers (2021-03-24T07:26:07Z) - One for More: Selecting Generalizable Samples for Generalizable ReID
Model [92.40951770273972]
This paper proposes a one-for-more training objective that takes the generalization ability of selected samples as a loss function.
Our proposed one-for-more based sampler can be seamlessly integrated into the ReID training framework.
arXiv Detail & Related papers (2020-12-10T06:37:09Z) - Pre-training Is (Almost) All You Need: An Application to Commonsense
Reasoning [61.32992639292889]
Fine-tuning of pre-trained transformer models has become the standard approach for solving common NLP tasks.
We introduce a new scoring method that casts a plausibility ranking task in a full-text format.
We show that our method provides a much more stable training phase across random restarts.
arXiv Detail & Related papers (2020-04-29T10:54:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.