AutoFT: Learning an Objective for Robust Fine-Tuning
- URL: http://arxiv.org/abs/2401.10220v2
- Date: Thu, 7 Mar 2024 08:49:26 GMT
- Title: AutoFT: Learning an Objective for Robust Fine-Tuning
- Authors: Caroline Choi, Yoonho Lee, Annie Chen, Allan Zhou, Aditi Raghunathan,
Chelsea Finn
- Abstract summary: Foundation models encode rich representations that can be adapted to downstream tasks by fine-tuning.
Current approaches to robust fine-tuning use hand-crafted regularization techniques.
We propose AutoFT, a data-driven approach for robust fine-tuning.
- Score: 60.641186718253735
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Foundation models encode rich representations that can be adapted to
downstream tasks by fine-tuning. However, fine-tuning a model on one data
distribution often degrades performance under distribution shifts. Current
approaches to robust fine-tuning use hand-crafted regularization techniques to
constrain the fine-tuning process towards the pretrained model. Yet, it is hard
to specify how to adapt relevant characteristics of the foundation model during
fine-tuning, as this depends on how the pre-training, fine-tuning, and test
data distributions relate to each other. We propose AutoFT, a data-driven
approach for robust fine-tuning. Given a task, AutoFT searches for a
fine-tuning procedure that enhances out-of-distribution (OOD) generalization.
Specifically, AutoFT uses bi-level optimization to search for an objective
function and hyperparameters that maximize post-adaptation performance on a
small OOD validation set. We evaluate AutoFT on nine natural distribution
shifts. Our experiments show that AutoFT significantly improves generalization
to OOD inputs, outperforming existing robust fine-tuning methods. Notably,
AutoFT achieves a new state-of-the-art on the WILDS iWildCam and FMoW
benchmarks, outperforming the previous best methods by $6.0\%$ and $1.5\%$,
respectively.
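The abstract's bi-level idea, an inner loop that fine-tunes under a candidate objective and an outer loop that scores the result on a small OOD validation set, can be illustrated with a toy sketch. Everything below (the linear model, the `fine_tune` helper, the candidate regularization strengths) is hypothetical and merely stands in for AutoFT's actual learned objective and hyperparameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: "pretrained" weights, an ID fine-tuning set, a small OOD validation set.
w_pre = np.array([1.0, -1.0])
X_id = rng.normal(size=(100, 2))
y_id = X_id @ np.array([2.0, -1.0])
X_ood = rng.normal(size=(20, 2)) * 3.0          # shifted inputs
y_ood = X_ood @ np.array([1.5, -1.0])           # shifted labeling function

def fine_tune(lam, steps=200, lr=0.05):
    """Inner loop: minimize ID loss + lam * ||w - w_pre||^2 (a hand-picked
    stand-in for the learned objective AutoFT searches over)."""
    w = w_pre.copy()
    for _ in range(steps):
        grad = 2 * X_id.T @ (X_id @ w - y_id) / len(X_id) + 2 * lam * (w - w_pre)
        w -= lr * grad
    return w

def ood_loss(w):
    return float(np.mean((X_ood @ w - y_ood) ** 2))

# Outer loop: pick the objective hyperparameter that does best on the OOD validation set.
candidates = [0.0, 0.01, 0.1, 1.0, 10.0]
best_lam = min(candidates, key=lambda lam: ood_loss(fine_tune(lam)))
print("best lambda:", best_lam)
```

In this toy setup the OOD validation set rewards staying partway between the pretrained and the ID-optimal weights, so the outer loop selects a nonzero regularization strength; AutoFT generalizes this by searching over a much richer space of loss terms and hyperparameters.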
Related papers
- A Semantic-based Layer Freezing Approach to Efficient Fine-Tuning of Language Models [32.178931149612644]
Finetuning language models (LMs) is crucial for adapting the models to downstream data and tasks.
Existing work, such as parameter-efficient finetuning (PEFT), often focuses on *how* to finetune but neglects the issue of *where* to finetune.
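The "where to finetune" question can be pictured with a minimal sketch in which only selected layers receive gradient updates. The three-layer stack, the gradients, and the trainable mask below are invented for illustration and are not the paper's semantic selection criterion.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy three-layer stack; "where to finetune" = which layers receive updates.
layers = [rng.normal(size=(4, 4)) for _ in range(3)]
trainable = [False, False, True]      # e.g. freeze the lower layers, tune the top one

grads = [rng.normal(size=(4, 4)) for _ in range(3)]
before = [layer.copy() for layer in layers]
for layer, grad, tune in zip(layers, grads, trainable):
    if tune:                          # frozen layers receive no update at all
        layer -= 0.01 * grad

changed = [not np.allclose(layer, b) for layer, b in zip(layers, before)]
print("layers updated:", changed)
```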
arXiv Detail & Related papers (2024-06-17T17:13:08Z)
- Comparative Analysis of Different Efficient Fine Tuning Methods of Large Language Models (LLMs) in Low-Resource Setting [0.0]
We aim to deepen the understanding of different fine-tuning strategies for large language models (LLMs).
We compare state-of-the-art methods like vanilla fine-tuning and Pattern-Based Fine-Tuning (PBFT) on pre-trained models across two datasets, CoLA and MNLI.
Our findings suggest that these alternative strategies can exhibit out-of-domain generalization comparable to that of vanilla FT and PBFT.
arXiv Detail & Related papers (2024-05-21T20:08:52Z)
- Empirical Analysis of Efficient Fine-Tuning Methods for Large Pre-Trained Language Models [4.096453902709292]
BitFit and adapter modules are compared to standard full model fine-tuning.
The BitFit approach matches full fine-tuning performance across varying amounts of training data.
Adapter modules exhibit high variability, with inconsistent gains over default models.
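BitFit's core idea, freezing all weights and training only the bias terms, fits in a few lines. The single linear layer and the synthetic bias-shifted targets below are toy assumptions, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# BitFit: freeze all weights, train only the bias terms (toy single linear layer).
W = rng.normal(size=(4, 3))                      # frozen pretrained weights
b = np.zeros(3)                                  # the only trainable parameters

X = rng.normal(size=(50, 4))
y = X @ W + np.array([0.5, -0.2, 1.0])           # targets differ only by a bias shift

for _ in range(100):
    resid = (X @ W + b) - y                      # W is never updated
    b -= 0.1 * resid.mean(axis=0)                # gradient step on the biases alone

print("learned bias:", np.round(b, 3))
```

Because only `b` is trained, the parameter count scales with the output width rather than the full weight matrix, which is the source of BitFit's efficiency.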
arXiv Detail & Related papers (2024-01-08T17:44:43Z)
- FTFT: Efficient and Robust Fine-Tuning by Transferring Training Dynamics [7.58472343957521]
We show that training dynamics are highly transferable across model sizes and pre-training methods.
We propose a novel fine-tuning approach: Fine-Tuning by transFerring Training dynamics (FTFT).
arXiv Detail & Related papers (2023-10-10T12:53:48Z)
- Trainable Projected Gradient Method for Robust Fine-tuning [36.470333094917436]
We propose the Trainable Projected Gradient Method (TPGM) to automatically learn the constraint imposed on each layer, enabling fine-grained fine-tuning regularization.
This is motivated by formulating fine-tuning as a bi-level constrained optimization problem.
We show that TPGM outperforms existing fine-tuning methods in OOD performance while matching the best in-distribution (ID) performance.
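The per-layer constraint TPGM learns can be pictured as a projection of each layer's fine-tuned weights back into a ball around the pretrained weights. The flat weight vectors and fixed radius below are a simplification; TPGM learns the per-layer radii end to end via its bi-level formulation.

```python
import numpy as np

def project(w_ft, w_pre, eps):
    """Project fine-tuned weights onto the L2 ball of radius eps around the
    pretrained weights (one layer's constraint; TPGM learns eps per layer)."""
    delta = w_ft - w_pre
    norm = np.linalg.norm(delta)
    return w_ft if norm <= eps else w_pre + delta * (eps / norm)

w_pre = np.zeros(4)
w_ft = np.array([3.0, 0.0, 4.0, 0.0])   # drifted far from w_pre (distance 5)
w_proj = project(w_ft, w_pre, eps=1.0)
print(np.linalg.norm(w_proj - w_pre))   # back on the eps-ball, distance ~1.0
```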
arXiv Detail & Related papers (2023-03-19T17:30:44Z)
- Scaling & Shifting Your Features: A New Baseline for Efficient Model Tuning [126.84770886628833]
Existing finetuning methods either tune all parameters of the pretrained model (full finetuning) or only tune the last linear layer (linear probing).
We propose a new parameter-efficient finetuning method termed SSF: researchers only need to Scale and Shift the deep Features extracted by a pre-trained model to match the performance of full finetuning.
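A minimal sketch of the scale-and-shift idea: a frozen feature extractor plus one trainable scale and one trainable shift per feature. The `frozen_backbone` map and the dimensions below are placeholders, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
W_frozen = rng.normal(size=(3, 5))      # stands in for a frozen pretrained backbone

def frozen_backbone(x):
    return np.tanh(x @ W_frozen)        # never updated during tuning

# SSF: the only trainable parameters are a per-feature scale and shift.
gamma = np.ones(5)                      # scale, initialized so tuning starts at identity
beta = np.zeros(5)                      # shift, initialized to zero

def ssf_head(x):
    return gamma * frozen_backbone(x) + beta

x = rng.normal(size=(2, 3))
print("trainable params:", gamma.size + beta.size, "vs frozen:", W_frozen.size)
```

Initializing the scale to one and the shift to zero means tuning starts exactly at the pretrained model's behavior, a common design choice for such modulation parameters.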
arXiv Detail & Related papers (2022-10-17T08:14:49Z)
- Fine-Tuning can Distort Pretrained Features and Underperform Out-of-Distribution [100.01469697743322]
Fine-tuning can achieve worse accuracy than linear probing when the pretrained features are good and the distribution shift is large.
We show theoretically that this tradeoff between ID and OOD accuracy arises even in a simple setting.
Our analysis suggests that the easy two-step strategy of linear probing then full fine-tuning combines the benefits of both fine-tuning and linear probing.
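The two-step LP-FT recipe, linear probing to fit the head first and then full fine-tuning starting from that head, can be sketched with a toy logistic-regression "backbone". The data, the dimensions, and the identity backbone below are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data; identity "backbone" and linearly separable labels for illustration.
X = rng.normal(size=(200, 4))
y = (X @ np.array([1.0, -1.0, 0.5, 0.0]) > 0).astype(float)
W_backbone = np.eye(4)                           # "pretrained" feature extractor

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(W, head, steps=300, lr=0.5, tune_backbone=False):
    """Logistic regression on features X @ W; optionally updates W as well."""
    W, head = W.copy(), head.copy()
    for _ in range(steps):
        feats = X @ W
        g = sigmoid(feats @ head) - y            # dLoss/dLogits per example
        head -= lr * feats.T @ g / len(X)
        if tune_backbone:
            W -= lr * np.outer(X.T @ g / len(X), head)
    return W, head

# Step 1: linear probing (backbone frozen, head trained from scratch).
_, head_lp = train(W_backbone, np.zeros(4))
# Step 2: full fine-tuning, initialized from the probed head.
W_ft, head_ft = train(W_backbone, head_lp, tune_backbone=True)
acc = float(np.mean((sigmoid(X @ W_ft @ head_ft) > 0.5) == y))
print("train accuracy:", acc)
```

The point of the two-step order is that a randomly initialized head would send large, noisy gradients into the backbone early in training; probing first keeps the backbone's pretrained features from being distorted.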
arXiv Detail & Related papers (2022-02-21T09:03:34Z)
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence, predicting accuracy as the fraction of unlabeled examples whose confidence exceeds that threshold.
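ATC's two steps, fitting a confidence threshold on labeled source data so that the fraction above it matches source accuracy, then reporting the fraction of unlabeled target examples above that threshold, can be sketched with synthetic scores. The confidence values below are invented; the paper uses scores such as maximum softmax probability or negative entropy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic confidence scores and correctness flags for *labeled* source data.
source_conf = rng.uniform(0.5, 1.0, size=1000)
source_correct = source_conf > rng.uniform(0.4, 0.9, size=1000)  # toy: confident => correct
source_acc = source_correct.mean()

# Step 1: choose threshold t so the fraction of source examples with
# confidence above t matches the source accuracy.
t = np.quantile(source_conf, 1.0 - source_acc)

# Step 2: predicted target accuracy = fraction of *unlabeled* target examples
# whose confidence exceeds t.
target_conf = rng.uniform(0.5, 1.0, size=1000) * 0.95            # a shifted target domain
predicted_target_acc = (target_conf > t).mean()
print("threshold:", round(float(t), 3), "predicted target acc:", predicted_target_acc)
```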
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
- DSEE: Dually Sparsity-embedded Efficient Tuning of Pre-trained Language Models [152.29364079385635]
As pre-trained models grow bigger, the fine-tuning process can be time-consuming and computationally expensive.
We propose a framework for resource- and parameter-efficient fine-tuning by leveraging the sparsity prior in both weight updates and the final model weights.
Our proposed framework, dubbed Dually Sparsity-Embedded Efficient Tuning (DSEE), aims to achieve two key objectives: (i) parameter efficient fine-tuning and (ii) resource-efficient inference.
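The sparsity prior on weight updates can be pictured as keeping only the largest-magnitude entries of an otherwise dense update. The top-k mask below is a toy stand-in for DSEE's actual sparse-plus-low-rank decomposition.

```python
import numpy as np

rng = np.random.default_rng(0)

W_pre = rng.normal(size=(8, 8))                  # "pretrained" weights
dense_update = rng.normal(size=(8, 8))           # stand-in for an unconstrained update

def sparsify(delta, k):
    """Keep only the k largest-magnitude entries of a weight update
    (a toy version of DSEE's sparsity prior on updates)."""
    mask = np.zeros(delta.size, dtype=bool)
    mask[np.argsort(np.abs(delta).ravel())[-k:]] = True
    return np.where(mask.reshape(delta.shape), delta, 0.0)

sparse_delta = sparsify(dense_update, k=6)
W_tuned = W_pre + sparse_delta                   # only 6 of 64 entries actually move
print("nonzero update entries:", np.count_nonzero(sparse_delta), "of", sparse_delta.size)
```

Storing only the few nonzero update entries is what makes the fine-tuned checkpoint parameter-efficient relative to saving a full dense copy of the weights.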
arXiv Detail & Related papers (2021-10-30T03:29:47Z)
- AutoFlow: Learning a Better Training Set for Optical Flow [62.40293188964933]
AutoFlow is a method to render training data for optical flow.
AutoFlow achieves state-of-the-art accuracy in pre-training both PWC-Net and RAFT.
arXiv Detail & Related papers (2021-04-29T17:55:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.