Surgical Fine-Tuning Improves Adaptation to Distribution Shifts
- URL: http://arxiv.org/abs/2210.11466v3
- Date: Tue, 6 Jun 2023 05:58:11 GMT
- Title: Surgical Fine-Tuning Improves Adaptation to Distribution Shifts
- Authors: Yoonho Lee, Annie S. Chen, Fahim Tajwar, Ananya Kumar, Huaxiu Yao,
Percy Liang, Chelsea Finn
- Abstract summary: A common approach to transfer learning under distribution shift is to fine-tune the last few layers of a pre-trained model.
This paper shows that in such settings, selectively fine-tuning a subset of layers matches or outperforms commonly used fine-tuning approaches.
- Score: 114.17184775397067
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A common approach to transfer learning under distribution shift is to
fine-tune the last few layers of a pre-trained model, preserving learned
features while also adapting to the new task. This paper shows that in such
settings, selectively fine-tuning a subset of layers (which we term surgical
fine-tuning) matches or outperforms commonly used fine-tuning approaches.
Moreover, the type of distribution shift influences which subset is more
effective to tune: for example, for image corruptions, fine-tuning only the
first few layers works best. We validate our findings systematically across
seven real-world data tasks spanning three types of distribution shifts.
Theoretically, we prove that for two-layer neural networks in an idealized
setting, first-layer tuning can outperform fine-tuning all layers. Intuitively,
fine-tuning more parameters on a small target dataset can cause information
learned during pre-training to be forgotten, and the relevant information
depends on the type of shift.
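As a rough illustration of the surgical fine-tuning recipe described in the abstract, the sketch below freezes every parameter of a pre-trained network and then unfreezes a single chosen block, for example an early block when the shift looks like an input-level corruption. PyTorch, the torchvision ResNet-18, the block grouping, and the optimizer settings are illustrative assumptions rather than the paper's exact experimental setup.

```python
# Hedged sketch of surgical fine-tuning: freeze everything, then unfreeze
# one block chosen to match the suspected type of distribution shift.
# ResNet-18 and the block grouping below are illustrative assumptions,
# not the exact architecture or protocol used in the paper.
import torch
from torchvision import models

def surgical_finetune_params(model, block_name):
    """Freeze all parameters, then unfreeze only the named child block."""
    for p in model.parameters():
        p.requires_grad = False
    block = dict(model.named_children())[block_name]
    for p in block.parameters():
        p.requires_grad = True
    return [p for p in model.parameters() if p.requires_grad]

model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

# For input-level shifts (e.g. image corruptions), tuning early layers tends
# to help; for label- or output-level shifts, later layers or the head.
params = surgical_finetune_params(model, "layer1")
optimizer = torch.optim.SGD(params, lr=1e-3, momentum=0.9)
```

During training, only the parameters returned by the helper receive gradient updates; swapping "layer1" for a later block (or the classification head) corresponds to tuning a different "surgical" subset.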
Related papers
- LEVI: Generalizable Fine-tuning via Layer-wise Ensemble of Different Views [28.081794908107604]
Fine-tuning is used to leverage the power of pre-trained foundation models in new downstream tasks.
Recent studies have observed challenges in the generalization of fine-tuned models to unseen distributions.
We propose LEVI, a novel generalizable fine-tuning method in which the pre-trained model is adaptively ensembled layer-wise with a small task-specific model.
arXiv Detail & Related papers (2024-02-07T08:16:40Z) - Informative regularization for a multi-layer perceptron RR Lyrae
classifier under data shift [3.303002683812084]
We propose a scalable and easily adaptable approach based on an informative regularization and an ad-hoc training procedure to mitigate the shift problem.
Our method provides a new path to incorporate knowledge from characteristic features into artificial neural networks to manage the underlying data shift problem.
arXiv Detail & Related papers (2023-03-12T02:49:19Z) - Optimal transfer protocol by incremental layer defrosting [66.76153955485584]
Transfer learning is a powerful tool enabling model training with limited amounts of data.
The simplest transfer learning protocol is based on freezing" the feature-extractor layers of a network pre-trained on a data-rich source task.
We show that this protocol is often sub-optimal and the largest performance gain may be achieved when smaller portions of the pre-trained network are kept frozen.
arXiv Detail & Related papers (2023-03-02T17:32:11Z) - Less is More: Selective Layer Finetuning with SubTuning [26.43027780266698]
Finetuning a pretrained model has become a standard approach for training neural networks on novel tasks, resulting in fast convergence and improved performance.
In this work, we study an alternative finetuning method, where instead of finetuning all the weights of the network, we only train a carefully chosen subset of layers, keeping the rest of the weights frozen at their initial (pretrained) values.
We demonstrate that subset finetuning (or SubTuning) often achieves accuracy comparable to full finetuning of the model, and even surpasses the performance of full finetuning when training data is scarce.
arXiv Detail & Related papers (2023-02-13T13:38:46Z) - Prompt Tuning for Parameter-efficient Medical Image Segmentation [79.09285179181225]
We propose and investigate several techniques for parameter-efficient yet effective adaptation for semantic segmentation on two medical imaging datasets.
We pre-train this architecture with a dedicated dense self-supervision scheme based on assignments to online generated prototypes.
We demonstrate that the resulting neural network model is able to attenuate the gap between fully fine-tuned and parameter-efficiently adapted models.
arXiv Detail & Related papers (2022-11-16T21:55:05Z) - Improving Self-supervised Learning for Out-of-distribution Task via
Auxiliary Classifier [6.61825491400122]
We observe a strong relationship between rotation prediction (self-supervised) accuracy and semantic classification accuracy on OOD tasks.
We introduce an additional auxiliary classification head in our multi-task network along with semantic classification and rotation prediction head.
Our proposed learning method is framed as a bi-level optimisation problem in which the upper level updates the parameters of the semantic classification and rotation prediction heads.
arXiv Detail & Related papers (2022-09-07T02:00:01Z) - Two-Stage Fine-Tuning: A Novel Strategy for Learning Class-Imbalanced
Data [11.66734752179563]
Classification on long-tailed data is a challenging problem.
Learning on tail classes is especially difficult when fine-tuning a pretrained model on a downstream task.
We propose a two-stage fine-tuning: we first fine-tune the final layer of the pretrained model with a class-balanced reweighting loss, and then we perform standard fine-tuning (a hedged sketch of this recipe appears after this list).
arXiv Detail & Related papers (2022-07-22T03:39:51Z) - Beyond Transfer Learning: Co-finetuning for Action Localisation [64.07196901012153]
We propose co-finetuning: simultaneously training a single model on multiple "upstream" and "downstream" tasks.
We demonstrate that co-finetuning outperforms traditional transfer learning when using the same total amount of data.
We also show how we can easily extend our approach to multiple "upstream" datasets to further improve performance.
arXiv Detail & Related papers (2022-07-08T10:25:47Z) - Bi-tuning of Pre-trained Representations [79.58542780707441]
Bi-tuning is a general learning framework to fine-tune both supervised and unsupervised pre-trained representations to downstream tasks.
Bi-tuning generalizes vanilla fine-tuning by integrating two heads on top of the backbone of pre-trained representations.
Bi-tuning achieves state-of-the-art results for fine-tuning tasks of both supervised and unsupervised pre-trained models by large margins.
arXiv Detail & Related papers (2020-11-12T03:32:25Z) - Side-Tuning: A Baseline for Network Adaptation via Additive Side
Networks [95.51368472949308]
Adaptation can be useful in cases when training data is scarce, or when one wishes to encode priors in the network.
In this paper, we propose a straightforward alternative: side-tuning.
arXiv Detail & Related papers (2019-12-31T18:52:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the accuracy of the listed information and is not responsible for any consequences of its use.