Related papers: Trainable Projected Gradient Method for Robust Fine-tuning

Trainable Projected Gradient Method for Robust Fine-tuning

URL: http://arxiv.org/abs/2303.10720v2
Date: Tue, 28 Mar 2023 15:04:36 GMT
Title: Trainable Projected Gradient Method for Robust Fine-tuning
Authors: Junjiao Tian, Xiaoliang Dai, Chih-Yao Ma, Zecheng He, Yen-Cheng Liu, Zsolt Kira
Abstract summary: We propose Trainable Projected Gradient Method (TPGM) to automatically learn the constraint imposed for each layer for a fine-grained fine-tuning regularization. This is motivated by formulating fine-tuning as a bi-level constrained optimization problem. We show that TPGM outperforms existing fine-tuning methods in OOD performance while matching the best in-distribution (ID) performance.
Score: 36.470333094917436
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent studies on transfer learning have shown that selectively fine-tuning a subset of layers or customizing different learning rates for each layer can greatly improve robustness to out-of-distribution (OOD) data and retain generalization capability in the pre-trained models. However, most of these methods employ manually crafted heuristics or expensive hyper-parameter searches, which prevent them from scaling up to large datasets and neural networks. To solve this problem, we propose Trainable Projected Gradient Method (TPGM) to automatically learn the constraint imposed for each layer for a fine-grained fine-tuning regularization. This is motivated by formulating fine-tuning as a bi-level constrained optimization problem. Specifically, TPGM maintains a set of projection radii, i.e., distance constraints between the fine-tuned model and the pre-trained model, for each layer, and enforces them through weight projections. To learn the constraints, we propose a bi-level optimization to automatically learn the best set of projection radii in an end-to-end manner. Theoretically, we show that the bi-level optimization formulation could explain the regularization capability of TPGM. Empirically, with little hyper-parameter search cost, TPGM outperforms existing fine-tuning methods in OOD performance while matching the best in-distribution (ID) performance. For example, when fine-tuned on DomainNet-Real and ImageNet, compared to vanilla fine-tuning, TPGM shows $22\%$ and $10\%$ relative OOD improvement respectively on their sketch counterparts. Code is available at \url{https://github.com/PotatoTian/TPGM}.

Related papers

Training Deep Learning Models with Norm-Constrained LMOs [56.00317694850397]
We propose a new family of algorithms that uses the linear minimization oracle (LMO) to adapt to the geometry of the problem.<n>We demonstrate significant speedups on nanoGPT training using our algorithm, Scion, without any reliance on Adam.
arXiv Detail & Related papers (2025-02-11T13:10:34Z)
Learning to Generate Gradients for Test-Time Adaptation via Test-Time Training Layers [18.921532965557475]
Test-time adaptation aims to fine-tune a trained model online using unlabeled testing data. In this optimization process, unsupervised learning objectives like entropy frequently encounter noisy learning signals. We employ a learning-to-optimize approach to automatically learn an entropy generator called Meta Gradient Generator.
arXiv Detail & Related papers (2024-12-22T07:24:09Z)
PACE: marrying generalization in PArameter-efficient fine-tuning with Consistency rEgularization [35.922096876707975]
PACE is a generalization of PArameter-efficient fine-tuning with Consistency rEgularization. We show that PACE implicitly regularizes gradients for enhanced generalization, but also implicitly aligns the fine-tuned and pre-trained models to retain knowledge. PACE outperforms existing PEFT methods in four visual adaptation tasks: VTAB-1k, FGVC, few-shot learning and domain adaptation.
arXiv Detail & Related papers (2024-09-25T17:56:00Z)
Comparative Analysis of Different Efficient Fine Tuning Methods of Large Language Models (LLMs) in Low-Resource Setting [0.0]
We try to push the understanding of different fine-tuning strategies for large language models (LLMs) We compare state-of-the-art methods like vanilla fine-tuning and Pattern-Based Fine-Tuning (PBFT) on pre-trained models across two datasets, COLA and MNLI. Our findings suggest that these alternative strategies can exhibit out-of-domain generalization comparable to that of vanilla FT and PBFT.
arXiv Detail & Related papers (2024-05-21T20:08:52Z)
AutoFT: Learning an Objective for Robust Fine-Tuning [60.641186718253735]
Foundation models encode rich representations that can be adapted to downstream tasks by fine-tuning. Current approaches to robust fine-tuning use hand-crafted regularization techniques. We propose AutoFT, a data-driven approach for robust fine-tuning.
arXiv Detail & Related papers (2024-01-18T18:58:49Z)
Functional Graphical Models: Structure Enables Offline Data-Driven Optimization [111.28605744661638]
We show how structure can enable sample-efficient data-driven optimization. We also present a data-driven optimization algorithm that infers the FGM structure itself.
arXiv Detail & Related papers (2024-01-08T22:33:14Z)
Fast Trainable Projection for Robust Fine-Tuning [36.51660287722338]
Robust fine-tuning aims to achieve competitive in-distribution (ID) performance. Projection-based fine-tuning has been successfully used in robust fine-tuning. Fast Trainable Projection is a new projection-based fine-tuning algorithm.
arXiv Detail & Related papers (2023-10-29T22:52:43Z)
Prompt Tuning for Parameter-efficient Medical Image Segmentation [79.09285179181225]
We propose and investigate several contributions to achieve a parameter-efficient but effective adaptation for semantic segmentation on two medical imaging datasets. We pre-train this architecture with a dedicated dense self-supervision scheme based on assignments to online generated prototypes. We demonstrate that the resulting neural network model is able to attenuate the gap between fully fine-tuned and parameter-efficiently adapted models.
arXiv Detail & Related papers (2022-11-16T21:55:05Z)
BBTv2: Pure Black-Box Optimization Can Be Comparable to Gradient Descent for Few-Shot Learning [83.26610968655815]
Black-Box Tuning is a derivative-free approach to optimize continuous prompt tokens prepended to the input of language models. We present BBTv2, a pure black-box optimization approach that can drive language models to achieve comparable results to gradient-based optimization.
arXiv Detail & Related papers (2022-05-23T11:10:19Z)
DSEE: Dually Sparsity-embedded Efficient Tuning of Pre-trained Language Models [152.29364079385635]
As pre-trained models grow bigger, the fine-tuning process can be time-consuming and computationally expensive. We propose a framework for resource- and parameter-efficient fine-tuning by leveraging the sparsity prior in both weight updates and the final model weights. Our proposed framework, dubbed Dually Sparsity-Embedded Efficient Tuning (DSEE), aims to achieve two key objectives: (i) parameter efficient fine-tuning and (ii) resource-efficient inference.
arXiv Detail & Related papers (2021-10-30T03:29:47Z)
Adapting by Pruning: A Case Study on BERT [9.963251767416967]
We propose a novel model adaptation paradigm, adapting by pruning, which prunes neural connections in the pre-trained model to optimise the performance on the target task. We formulate adapting-by-pruning as an optimisation problem with a differentiable loss and propose an efficient algorithm to prune the model. Results suggest that our method can prune up to 50% weights in BERT while yielding similar performance compared to the fine-tuned full model.
arXiv Detail & Related papers (2021-05-07T15:51:08Z)
Learning Reasoning Strategies in End-to-End Differentiable Proving [50.9791149533921]
Conditional Theorem Provers learn optimal rule selection strategy via gradient-based optimisation. We show that Conditional Theorem Provers are scalable and yield state-of-the-art results on the CLUTRR dataset.
arXiv Detail & Related papers (2020-07-13T16:22:14Z)

This list is automatically generated from the titles and abstracts of the papers in this site.