AutoFT: Learning an Objective for Robust Fine-Tuning
- URL: http://arxiv.org/abs/2401.10220v2
- Date: Thu, 7 Mar 2024 08:49:26 GMT
- Title: AutoFT: Learning an Objective for Robust Fine-Tuning
- Authors: Caroline Choi, Yoonho Lee, Annie Chen, Allan Zhou, Aditi Raghunathan,
Chelsea Finn
- Abstract summary: Foundation models encode rich representations that can be adapted to downstream tasks by fine-tuning.
Current approaches to robust fine-tuning use hand-crafted regularization techniques.
We propose AutoFT, a data-driven approach for robust fine-tuning.
- Score: 60.641186718253735
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Foundation models encode rich representations that can be adapted to
downstream tasks by fine-tuning. However, fine-tuning a model on one data
distribution often degrades performance under distribution shifts. Current
approaches to robust fine-tuning use hand-crafted regularization techniques to
constrain the fine-tuning process towards the pretrained model. Yet, it is hard
to specify how to adapt relevant characteristics of the foundation model during
fine-tuning, as this depends on how the pre-training, fine-tuning, and test
data distributions relate to each other. We propose AutoFT, a data-driven
approach for robust fine-tuning. Given a task, AutoFT searches for a
fine-tuning procedure that enhances out-of-distribution (OOD) generalization.
Specifically, AutoFT uses bi-level optimization to search for an objective
function and hyperparameters that maximize post-adaptation performance on a
small OOD validation set. We evaluate AutoFT on nine natural distribution
shifts. Our experiments show that AutoFT significantly improves generalization
to OOD inputs, outperforming existing robust fine-tuning methods. Notably,
AutoFT achieves a new state-of-the-art on the WILDS iWildCam and FMoW
benchmarks, outperforming the previous best methods by $6.0\%$ and $1.5\%$,
respectively.
Related papers
- Visual Fourier Prompt Tuning [63.66866445034855]
We propose the Visual Fourier Prompt Tuning (VFPT) method as a general and effective solution for adapting large-scale transformer-based models.
Our approach incorporates the Fast Fourier Transform into prompt embeddings and harmoniously considers both spatial and frequency domain information.
Our results demonstrate that our approach outperforms current state-of-the-art baselines on two benchmarks.
arXiv Detail & Related papers (2024-11-02T18:18:35Z) - Unlearning as multi-task optimization: A normalized gradient difference approach with an adaptive learning rate [105.86576388991713]
We introduce a normalized gradient difference (NGDiff) algorithm, enabling us to have better control over the trade-off between the objectives.
We provide a theoretical analysis and empirically demonstrate the superior performance of NGDiff among state-of-the-art unlearning methods on the TOFU and MUSE datasets.
arXiv Detail & Related papers (2024-10-29T14:41:44Z) - Forecast-PEFT: Parameter-Efficient Fine-Tuning for Pre-trained Motion Forecasting Models [68.23649978697027]
Forecast-PEFT is a fine-tuning strategy that freezes the majority of the model's parameters, focusing adjustments on newly introduced prompts and adapters.
Our experiments show that Forecast-PEFT outperforms traditional full fine-tuning methods in motion prediction tasks.
Forecast-FT further improves prediction performance, evidencing up to a 9.6% enhancement over conventional baseline methods.
arXiv Detail & Related papers (2024-07-28T19:18:59Z) - A Semantic-based Layer Freezing Approach to Efficient Fine-Tuning of Language Models [32.178931149612644]
Finetuning language models (LMs) is crucial for adapting the models to downstream data and tasks.
Existing work, such as parameter-efficient finetuning (PEFT), often focuses on textithow to finetune but neglects the issue of textitwhere to finetune
arXiv Detail & Related papers (2024-06-17T17:13:08Z) - Comparative Analysis of Different Efficient Fine Tuning Methods of Large Language Models (LLMs) in Low-Resource Setting [0.0]
We try to push the understanding of different fine-tuning strategies for large language models (LLMs)
We compare state-of-the-art methods like vanilla fine-tuning and Pattern-Based Fine-Tuning (PBFT) on pre-trained models across two datasets, COLA and MNLI.
Our findings suggest that these alternative strategies can exhibit out-of-domain generalization comparable to that of vanilla FT and PBFT.
arXiv Detail & Related papers (2024-05-21T20:08:52Z) - Empirical Analysis of Efficient Fine-Tuning Methods for Large
Pre-Trained Language Models [4.096453902709292]
BitFit and adapter modules are compared to standard full model fine-tuning.
The BitFit approach matches full fine-tuning performance across varying amounts of training data.
adapter modules exhibit high variability, with inconsistent gains over default models.
arXiv Detail & Related papers (2024-01-08T17:44:43Z) - FTFT: Efficient and Robust Fine-Tuning by Transferring Training Dynamics [7.58472343957521]
We show that training dynamics are highly transferable across model sizes and pre-training methods.
We propose a novel fine-tuning approach: Fine-Tuning by transFerring Training dynamics (FTFT)
arXiv Detail & Related papers (2023-10-10T12:53:48Z) - Trainable Projected Gradient Method for Robust Fine-tuning [36.470333094917436]
We propose Trainable Projected Gradient Method (TPGM) to automatically learn the constraint imposed for each layer for a fine-grained fine-tuning regularization.
This is motivated by formulating fine-tuning as a bi-level constrained optimization problem.
We show that TPGM outperforms existing fine-tuning methods in OOD performance while matching the best in-distribution (ID) performance.
arXiv Detail & Related papers (2023-03-19T17:30:44Z) - Scaling & Shifting Your Features: A New Baseline for Efficient Model
Tuning [126.84770886628833]
Existing finetuning methods either tune all parameters of the pretrained model (full finetuning) or only tune the last linear layer (linear probing)
We propose a new parameter-efficient finetuning method termed as SSF, representing that researchers only need to Scale and Shift the deep Features extracted by a pre-trained model to catch up with the performance full finetuning.
arXiv Detail & Related papers (2022-10-17T08:14:49Z) - AutoFlow: Learning a Better Training Set for Optical Flow [62.40293188964933]
AutoFlow is a method to render training data for optical flow.
AutoFlow achieves state-of-the-art accuracy in pre-training both PWC-Net and RAFT.
arXiv Detail & Related papers (2021-04-29T17:55:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.