Trainable Projected Gradient Method for Robust Fine-tuning
- URL: http://arxiv.org/abs/2303.10720v2
- Date: Tue, 28 Mar 2023 15:04:36 GMT
- Title: Trainable Projected Gradient Method for Robust Fine-tuning
- Authors: Junjiao Tian, Xiaoliang Dai, Chih-Yao Ma, Zecheng He, Yen-Cheng Liu,
Zsolt Kira
- Abstract summary: We propose Trainable Projected Gradient Method (TPGM) to automatically learn the constraint imposed for each layer for a fine-grained fine-tuning regularization.
This is motivated by formulating fine-tuning as a bi-level constrained optimization problem.
We show that TPGM outperforms existing fine-tuning methods in OOD performance while matching the best in-distribution (ID) performance.
- Score: 36.470333094917436
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent studies on transfer learning have shown that selectively fine-tuning a
subset of layers or customizing different learning rates for each layer can
greatly improve robustness to out-of-distribution (OOD) data and retain
generalization capability in the pre-trained models. However, most of these
methods employ manually crafted heuristics or expensive hyper-parameter
searches, which prevent them from scaling up to large datasets and neural
networks. To solve this problem, we propose Trainable Projected Gradient Method
(TPGM) to automatically learn the constraint imposed for each layer for a
fine-grained fine-tuning regularization. This is motivated by formulating
fine-tuning as a bi-level constrained optimization problem. Specifically, TPGM
maintains a set of projection radii, i.e., distance constraints between the
fine-tuned model and the pre-trained model, for each layer, and enforces them
through weight projections. To learn the constraints, we propose a bi-level
optimization to automatically learn the best set of projection radii in an
end-to-end manner. Theoretically, we show that the bi-level optimization
formulation could explain the regularization capability of TPGM. Empirically,
with little hyper-parameter search cost, TPGM outperforms existing fine-tuning
methods in OOD performance while matching the best in-distribution (ID)
performance. For example, when fine-tuned on DomainNet-Real and ImageNet,
compared to vanilla fine-tuning, TPGM shows 22% and 10% relative OOD
improvement, respectively, on their sketch counterparts. Code is available at
https://github.com/PotatoTian/TPGM.
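The abstract describes enforcing per-layer distance constraints through weight projections. The following is a minimal sketch of that projection step only, not the authors' implementation. It assumes an L2 (Frobenius) ball around the pre-trained weights and uses fixed, hand-picked radii, whereas TPGM learns the radii through its bi-level optimization. The names `project_to_ball`, `project_model`, and the toy layer dictionaries are hypothetical.

```python
import numpy as np


def project_to_ball(w, w0, radius):
    """Project w onto the L2 ball of the given radius centred at w0.

    Assumption: an L2 (Frobenius) distance; the norm used in practice may differ.
    """
    delta = w - w0
    norm = np.linalg.norm(delta)
    if norm <= radius:
        # Already inside the constraint set: leave the weights unchanged.
        return w.copy()
    # Outside the ball: rescale the offset so it lies on the boundary.
    return w0 + delta * (radius / norm)


def project_model(weights, pretrained, radii):
    """Apply the projection layer by layer, each with its own radius."""
    return {
        name: project_to_ball(weights[name], pretrained[name], radii[name])
        for name in weights
    }


# Toy example with two "layers" (hypothetical values for illustration).
pretrained = {"layer1": np.zeros(3), "layer2": np.zeros(2)}
finetuned = {"layer1": np.array([3.0, 4.0, 0.0]), "layer2": np.array([0.1, 0.0])}
radii = {"layer1": 1.0, "layer2": 1.0}  # fixed here; learned in TPGM

projected = project_model(finetuned, pretrained, radii)
print(projected["layer1"])  # pulled back onto the radius-1 ball
print(projected["layer2"])  # unchanged, already within the radius
```

In this sketch a small radius keeps a layer close to its pre-trained weights, while a large radius leaves it effectively unconstrained, which is how per-layer radii can act as a fine-grained regularizer.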
Related papers
- PACE: marrying generalization in PArameter-efficient fine-tuning with Consistency rEgularization [35.922096876707975]
PACE marries generalization in PArameter-efficient fine-tuning with Consistency rEgularization.
We show that PACE implicitly regularizes gradients for enhanced generalization, but also implicitly aligns the fine-tuned and pre-trained models to retain knowledge.
PACE outperforms existing PEFT methods in four visual adaptation tasks: VTAB-1k, FGVC, few-shot learning and domain adaptation.
arXiv Detail & Related papers (2024-09-25T17:56:00Z)
- Comparative Analysis of Different Efficient Fine Tuning Methods of Large Language Models (LLMs) in Low-Resource Setting [0.0]
We try to push the understanding of different fine-tuning strategies for large language models (LLMs).
We compare state-of-the-art methods like vanilla fine-tuning and Pattern-Based Fine-Tuning (PBFT) on pre-trained models across two datasets, CoLA and MNLI.
Our findings suggest that these alternative strategies can exhibit out-of-domain generalization comparable to that of vanilla FT and PBFT.
arXiv Detail & Related papers (2024-05-21T20:08:52Z)
- AutoFT: Learning an Objective for Robust Fine-Tuning [60.641186718253735]
Foundation models encode rich representations that can be adapted to downstream tasks by fine-tuning.
Current approaches to robust fine-tuning use hand-crafted regularization techniques.
We propose AutoFT, a data-driven approach for robust fine-tuning.
arXiv Detail & Related papers (2024-01-18T18:58:49Z)
- Functional Graphical Models: Structure Enables Offline Data-Driven Optimization [111.28605744661638]
We show how structure can enable sample-efficient data-driven optimization.
We also present a data-driven optimization algorithm that infers the FGM structure itself.
arXiv Detail & Related papers (2024-01-08T22:33:14Z)
- Fast Trainable Projection for Robust Fine-Tuning [36.51660287722338]
Robust fine-tuning aims to achieve competitive in-distribution (ID) performance while maintaining the out-of-distribution (OOD) robustness of a pre-trained model.
Projection-based fine-tuning has been successfully used in robust fine-tuning.
Fast Trainable Projection is a new projection-based fine-tuning algorithm.
arXiv Detail & Related papers (2023-10-29T22:52:43Z)
- Prompt Tuning for Parameter-efficient Medical Image Segmentation [79.09285179181225]
We propose and investigate several contributions to achieve a parameter-efficient but effective adaptation for semantic segmentation on two medical imaging datasets.
We pre-train this architecture with a dedicated dense self-supervision scheme based on assignments to online generated prototypes.
We demonstrate that the resulting neural network model is able to attenuate the gap between fully fine-tuned and parameter-efficiently adapted models.
arXiv Detail & Related papers (2022-11-16T21:55:05Z)
- BBTv2: Pure Black-Box Optimization Can Be Comparable to Gradient Descent for Few-Shot Learning [83.26610968655815]
Black-Box Tuning is a derivative-free approach to optimize continuous prompt tokens prepended to the input of language models.
We present BBTv2, a pure black-box optimization approach that can drive language models to achieve comparable results to gradient-based optimization.
arXiv Detail & Related papers (2022-05-23T11:10:19Z)
- DSEE: Dually Sparsity-embedded Efficient Tuning of Pre-trained Language Models [152.29364079385635]
As pre-trained models grow bigger, the fine-tuning process can be time-consuming and computationally expensive.
We propose a framework for resource- and parameter-efficient fine-tuning by leveraging the sparsity prior in both weight updates and the final model weights.
Our proposed framework, dubbed Dually Sparsity-Embedded Efficient Tuning (DSEE), aims to achieve two key objectives: (i) parameter efficient fine-tuning and (ii) resource-efficient inference.
arXiv Detail & Related papers (2021-10-30T03:29:47Z)
- Adapting by Pruning: A Case Study on BERT [9.963251767416967]
We propose a novel model adaptation paradigm, adapting by pruning, which prunes neural connections in the pre-trained model to optimise the performance on the target task.
We formulate adapting-by-pruning as an optimisation problem with a differentiable loss and propose an efficient algorithm to prune the model.
Results suggest that our method can prune up to 50% of the weights in BERT while yielding performance similar to that of the fully fine-tuned model.
arXiv Detail & Related papers (2021-05-07T15:51:08Z)
- Learning Reasoning Strategies in End-to-End Differentiable Proving [50.9791149533921]
Conditional Theorem Provers learn an optimal rule selection strategy via gradient-based optimisation.
We show that Conditional Theorem Provers are scalable and yield state-of-the-art results on the CLUTRR dataset.
arXiv Detail & Related papers (2020-07-13T16:22:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.