SAFT: Towards Out-of-Distribution Generalization in Fine-Tuning
- URL: http://arxiv.org/abs/2407.03036v1
- Date: Wed, 3 Jul 2024 11:56:55 GMT
- Title: SAFT: Towards Out-of-Distribution Generalization in Fine-Tuning
- Authors: Bac Nguyen, Stefan Uhlich, Fabien Cardinaux, Lukas Mauch, Marzieh Edraki, Aaron Courville
- Abstract summary: We introduce Sparse Adaptation for Fine-Tuning (SAFT), a method that prevents fine-tuning from forgetting the general knowledge in the pre-trained model.
With only 0.1% of the model parameters, SAFT can significantly improve the performance of CLIP.
On the few-shot learning benchmark of ImageNet, SAFT gives a gain of 5.15% on average over the conventional fine-tuning method in OOD settings.
- Score: 8.566687323389264
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Handling distribution shifts from training data, known as out-of-distribution (OOD) generalization, poses a significant challenge in the field of machine learning. While a pre-trained vision-language model like CLIP has demonstrated remarkable zero-shot performance, further adaptation of the model to downstream tasks leads to undesirable degradation for OOD data. In this work, we introduce Sparse Adaptation for Fine-Tuning (SAFT), a method that prevents fine-tuning from forgetting the general knowledge in the pre-trained model. SAFT only updates a small subset of important parameters whose gradient magnitude is large, while keeping the other parameters frozen. SAFT is straightforward to implement and conceptually simple. Extensive experiments show that with only 0.1% of the model parameters, SAFT can significantly improve the performance of CLIP. It consistently outperforms baseline methods across several benchmarks. On the few-shot learning benchmark of ImageNet and its variants, SAFT gives a gain of 5.15% on average over the conventional fine-tuning method in OOD settings.
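The selection rule described in the abstract (update only the parameters whose gradient magnitude is large, keep the rest frozen) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names `select_mask` and `masked_sgd_step`, the flat-list representation of parameters, the plain SGD update, and the choice to compute the mask once from a single gradient estimate are all assumptions made for illustration. The toy example keeps 25% of the entries so the effect is visible; the abstract reports results with 0.1% of the model parameters.

```python
import heapq


def select_mask(grads, keep_fraction=0.001):
    """Mark the keep_fraction of entries with the largest |gradient| as trainable.

    Hypothetical helper: returns a list of 0.0/1.0 flags, one per parameter.
    """
    k = max(1, int(len(grads) * keep_fraction))
    top = heapq.nlargest(k, range(len(grads)), key=lambda i: abs(grads[i]))
    mask = [0.0] * len(grads)
    for i in top:
        mask[i] = 1.0
    return mask


def masked_sgd_step(params, grads, mask, lr=0.1):
    """Plain SGD step (an assumption) in which frozen parameters stay unchanged."""
    return [p - lr * m * g for p, g, m in zip(params, grads, mask)]


# Toy example: 8 "pre-trained" parameters and one gradient estimate.
params = [0.5, -1.0, 0.25, 2.0, -0.75, 1.5, 0.0, -2.0]
grads = [0.01, -0.9, 0.02, 0.03, 0.8, -0.04, 0.05, 0.0]

mask = select_mask(grads, keep_fraction=0.25)  # keeps 2 of the 8 entries
updated = masked_sgd_step(params, grads, mask, lr=0.1)
```

In this toy run only the two entries with the largest gradient magnitude (indices 1 and 4) move; every other parameter keeps its pre-trained value, which is the mechanism the abstract credits with preserving general knowledge.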
Related papers
- FOZO: Forward-Only Zeroth-Order Prompt Optimization for Test-Time Adaptation [9.28697795097814]
Test-Time Adaptation is essential for enabling deep learning models to handle real-world data distribution shifts. Backpropagation-based methods are not suitable for low-end deployment devices. We propose Forward-Only Zeroth-Order Optimization (FOZO), a novel and practical backpropagation-free paradigm for TTA.
arXiv Detail & Related papers (2026-03-05T02:12:48Z) - Benchmarking Few-shot Transferability of Pre-trained Models with Improved Evaluation Protocols [123.73663884421272]
Few-shot transfer has been revolutionized by stronger pre-trained models and improved adaptation algorithms. We establish FEWTRANS, a comprehensive benchmark containing 10 diverse datasets. By releasing FEWTRANS, we aim to provide a rigorous "ruler" to streamline reproducible advances in few-shot transfer learning research.
arXiv Detail & Related papers (2026-02-28T05:41:57Z) - Exploring Fine-Tuning for Tabular Foundation Models [3.884856136722027]
This work presents the first comprehensive study of fine-tuning in Tabular Foundation Models (TFMs). We compare Zero-Shot, Meta-Learning, Supervised (SFT), and parameter-efficient (PEFT) approaches, analyzing how dataset factors such as imbalance, size, and dimensionality affect outcomes. Our findings cover performance, calibration, and fairness, offering practical guidelines on when fine-tuning is most beneficial and its limitations.
arXiv Detail & Related papers (2026-01-14T17:40:46Z) - LIFT+: Lightweight Fine-Tuning for Long-Tail Learning [45.187004699024435]
LIFT+ is an innovative lightweight fine-tuning framework to optimize consistent class conditions.
Our framework provides an efficient and accurate pipeline that facilitates fast convergence and model compactness.
arXiv Detail & Related papers (2025-04-17T18:50:47Z) - SaRA: High-Efficient Diffusion Model Fine-tuning with Progressive Sparse Low-Rank Adaptation [52.6922833948127]
In this work, we investigate the importance of parameters in pre-trained diffusion models.
We propose a novel model fine-tuning method to make full use of these ineffective parameters.
Our method enhances the generative capabilities of pre-trained models in downstream applications.
arXiv Detail & Related papers (2024-09-10T16:44:47Z) - Forecast-PEFT: Parameter-Efficient Fine-Tuning for Pre-trained Motion Forecasting Models [68.23649978697027]
Forecast-PEFT is a fine-tuning strategy that freezes the majority of the model's parameters, focusing adjustments on newly introduced prompts and adapters.
Our experiments show that Forecast-PEFT outperforms traditional full fine-tuning methods in motion prediction tasks.
Forecast-FT further improves prediction performance, evidencing up to a 9.6% enhancement over conventional baseline methods.
arXiv Detail & Related papers (2024-07-28T19:18:59Z) - Pre-trained Model Guided Fine-Tuning for Zero-Shot Adversarial Robustness [52.9493817508055]
We propose Pre-trained Model Guided Adversarial Fine-Tuning (PMG-AFT) to enhance the model's zero-shot adversarial robustness.
Our approach consistently improves clean accuracy by an average of 8.72%.
arXiv Detail & Related papers (2024-01-09T04:33:03Z) - FD-Align: Feature Discrimination Alignment for Fine-tuning Pre-Trained
Models in Few-Shot Learning [21.693779973263172]
In this paper, we introduce a fine-tuning approach termed Feature Discrimination Alignment (FD-Align)
Our method aims to bolster the model's generalizability by preserving the consistency of spurious features.
Once fine-tuned, the model can seamlessly integrate with existing methods, leading to performance improvements.
arXiv Detail & Related papers (2023-10-23T17:12:01Z) - CLIPood: Generalizing CLIP to Out-of-Distributions [73.86353105017076]
Contrastive language-image pre-training (CLIP) models have shown impressive zero-shot ability, but the further adaptation of CLIP on downstream tasks undesirably degrades OOD performances.
We propose CLIPood, a fine-tuning method that can adapt CLIP models to OOD situations where both domain shifts and open classes may occur on unseen test data.
Experiments on diverse datasets with different OOD scenarios show that CLIPood consistently outperforms existing generalization techniques.
arXiv Detail & Related papers (2023-02-02T04:27:54Z) - Scaling & Shifting Your Features: A New Baseline for Efficient Model Tuning [126.84770886628833]
Existing finetuning methods either tune all parameters of the pretrained model (full finetuning) or only tune the last linear layer (linear probing).
We propose a new parameter-efficient finetuning method termed SSF: researchers only need to Scale and Shift the deep Features extracted by a pre-trained model to catch up with the performance of full finetuning.
arXiv Detail & Related papers (2022-10-17T08:14:49Z) - Towards Inadequately Pre-trained Models in Transfer Learning [37.66278189011681]
Better ImageNet pre-trained models have been demonstrated to have better transferability to downstream tasks.
In this paper, we found that during the same pre-training process, models at middle epochs, which are inadequately pre-trained, can outperform fully trained models.
Our discoveries suggest that, during pre-training, models tend to first learn spectral components corresponding to large singular values.
arXiv Detail & Related papers (2022-03-09T12:15:55Z) - Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)