Fast Trainable Projection for Robust Fine-Tuning
- URL: http://arxiv.org/abs/2310.19182v1
- Date: Sun, 29 Oct 2023 22:52:43 GMT
- Title: Fast Trainable Projection for Robust Fine-Tuning
- Authors: Junjiao Tian, Yen-Cheng Liu, James Seale Smith, Zsolt Kira
- Abstract summary: Robust fine-tuning aims to achieve competitive in-distribution (ID) performance while preserving the out-of-distribution (OOD) robustness of the pre-trained model.
Projection-based fine-tuning has been used successfully for robust fine-tuning.
Fast Trainable Projection (FTP) is a new projection-based fine-tuning algorithm that learns per-layer projection constraints efficiently.
- Score: 36.51660287722338
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Robust fine-tuning aims to achieve competitive in-distribution (ID)
performance while maintaining the out-of-distribution (OOD) robustness of a
pre-trained model when transferring it to a downstream task. Recently,
projected gradient descent has been successfully used in robust fine-tuning by
constraining the deviation from the initialization of the fine-tuned model
explicitly through projection. However, two algorithmic limitations, scalability
and efficiency, prevent this method from being adopted more widely.
In this paper, we propose a new projection-based fine-tuning algorithm, Fast
Trainable Projection (FTP) for computationally efficient learning of per-layer
projection constraints, resulting in an average $35\%$ speedup on our
benchmarks compared to prior works. FTP can be combined with existing
optimizers such as AdamW, and be used in a plug-and-play fashion. Finally, we
show that FTP is a special instance of hyper-optimizers that tune the
hyper-parameters of optimizers in a learnable manner through nested
differentiation. Empirically, we show superior robustness on OOD datasets,
including domain shifts and natural corruptions, across four different vision
tasks with five different pre-trained models. Additionally, we demonstrate that
FTP is broadly applicable and beneficial to other learning scenarios such as
low-label and continual learning settings thanks to its easy adaptability. The
code will be available at https://github.com/GT-RIPL/FTP.git.
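To make the projection idea concrete, the sketch below applies a per-layer projection after each AdamW update, pulling every layer's weights back into a ball of fixed radius around the pre-trained initialization. This is a minimal sketch of generic projection-based fine-tuning under assumed names (`project_to_init`, the fixed per-layer `radii`, and the toy model), not the FTP algorithm itself, which learns the per-layer radii rather than fixing them.

```python
# Minimal sketch of projection-based fine-tuning: after each optimizer step,
# project every layer's weights back into a ball of radius `radii[name]`
# around the pre-trained initialization. The fixed radii and helper names are
# illustrative assumptions; FTP instead learns these per-layer constraints.
import torch

def project_to_init(model, init_params, radii):
    """Clip each layer's deviation from its pre-trained weights to its radius."""
    with torch.no_grad():
        for name, p in model.named_parameters():
            delta = p - init_params[name]        # deviation from initialization
            norm = delta.norm()
            gamma = radii[name]
            if norm > gamma:                     # outside the constraint ball
                p.copy_(init_params[name] + delta * (gamma / norm))

# Toy usage: fine-tune a small "pre-trained" model while keeping it close to its initialization.
model = torch.nn.Sequential(torch.nn.Linear(8, 16), torch.nn.ReLU(), torch.nn.Linear(16, 2))
init_params = {n: p.detach().clone() for n, p in model.named_parameters()}
radii = {n: 0.1 for n in init_params}            # fixed per-layer radii (assumption)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

x, y = torch.randn(32, 8), torch.randint(0, 2, (32,))
for _ in range(10):
    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(x), y)
    loss.backward()
    optimizer.step()
    project_to_init(model, init_params, radii)   # plug-and-play projection after each update
```

Because the projection runs after the optimizer step, it can be wrapped around any existing optimizer such as AdamW in a plug-and-play fashion, which is the usage pattern the abstract describes; turning the fixed radii into learnable quantities is what distinguishes FTP and TPGM from this sketch.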
Related papers
- Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation [67.13876021157887]
Dynamic Tuning (DyT) is a novel approach to improve both parameter and inference efficiency for ViT adaptation.
DyT achieves superior performance compared to existing PEFT methods while evoking only 71% of their FLOPs on the VTAB-1K benchmark.
arXiv Detail & Related papers (2024-03-18T14:05:52Z) - Dynamic Adapter Meets Prompt Tuning: Parameter-Efficient Transfer Learning for Point Cloud Analysis [51.14136878142034]
Point cloud analysis has achieved outstanding performance by transferring point cloud pre-trained models.
Existing methods for model adaptation usually update all model parameters, which is inefficient because it incurs high computational costs.
In this paper, we aim to study parameter-efficient transfer learning for point cloud analysis with an ideal trade-off between task performance and parameter efficiency.
arXiv Detail & Related papers (2024-03-03T08:25:04Z) - Sparse is Enough in Fine-tuning Pre-trained Large Language Models [98.46493578509039]
We propose a gradient-based sparse fine-tuning algorithm, named Sparse Increment Fine-Tuning (SIFT).
We validate its effectiveness on a range of tasks including the GLUE Benchmark and Instruction-tuning.
arXiv Detail & Related papers (2023-12-19T06:06:30Z) - Optimization-Free Test-Time Adaptation for Cross-Person Activity
Recognition [30.350005654271868]
Test-Time Adaptation aims to utilize the test stream to adjust predictions in real-time inference.
High computational cost makes it intractable to run on resource-constrained edge devices.
We propose an Optimization-Free Test-Time Adaptation framework for sensor-based HAR.
arXiv Detail & Related papers (2023-10-28T02:20:33Z) - Federated Learning of Large Language Models with Parameter-Efficient
Prompt Tuning and Adaptive Optimization [71.87335804334616]
Federated learning (FL) is a promising paradigm to enable collaborative model training with decentralized data.
The training process of Large Language Models (LLMs) generally requires updating a significant number of parameters.
This paper proposes an efficient partial prompt tuning approach to improve performance and efficiency simultaneously.
arXiv Detail & Related papers (2023-10-23T16:37:59Z) - Prototype-based HyperAdapter for Sample-Efficient Multi-task Tuning [30.251155072822055]
Prototype-based HyperAdapter (PHA) is a novel framework built on adapter tuning and hypernetworks.
It introduces an instance-dense retriever and prototypical hypernetwork to generate conditional modules in a sample-efficient manner.
We show that PHA strikes a better trade-off between trainable parameters, accuracy on stream tasks, and sample efficiency.
arXiv Detail & Related papers (2023-10-18T02:42:17Z) - Trainable Projected Gradient Method for Robust Fine-tuning [36.470333094917436]
We propose Trainable Projected Gradient Method (TPGM) to automatically learn the constraint imposed for each layer for a fine-grained fine-tuning regularization.
This is motivated by formulating fine-tuning as a bi-level constrained optimization problem.
We show that TPGM outperforms existing fine-tuning methods in OOD performance while matching the best in-distribution (ID) performance.
arXiv Detail & Related papers (2023-03-19T17:30:44Z) - Parameter-Efficient Transfer Learning with Diff Pruning [108.03864629388404]
diff pruning is a simple approach to enable parameter-efficient transfer learning within the pretrain-finetune framework.
We find that models finetuned with diff pruning can match the performance of fully finetuned baselines on the GLUE benchmark.
arXiv Detail & Related papers (2020-12-14T12:34:01Z) - Parameter-Efficient Transfer from Sequential Behaviors for User Modeling
and Recommendation [111.44445634272235]
In this paper, we develop a parameter efficient transfer learning architecture, termed as PeterRec.
PeterRec allows the pre-trained parameters to remain unaltered during fine-tuning by injecting a series of re-learned neural networks.
We perform extensive experimental ablation to show the effectiveness of the learned user representation in five downstream tasks.
arXiv Detail & Related papers (2020-01-13T14:09:54Z)