ETHER: Efficient Finetuning of Large-Scale Models with Hyperplane Reflections
- URL: http://arxiv.org/abs/2405.20271v2
- Date: Fri, 11 Oct 2024 12:41:48 GMT
- Title: ETHER: Efficient Finetuning of Large-Scale Models with Hyperplane Reflections
- Authors: Massimo Bini, Karsten Roth, Zeynep Akata, Anna Khoreva,
- Abstract summary: We propose the ETHER transformation family, which performs Efficient fineTuning via HypErplane Reflections.
In particular, we introduce ETHER and its relaxation ETHER+, which match or outperform existing PEFT methods with significantly fewer parameters.
- Score: 59.839926875976225
- License:
- Abstract: Parameter-efficient finetuning (PEFT) has become ubiquitous to adapt foundation models to downstream task requirements while retaining their generalization ability. However, the amount of additionally introduced parameters and compute for successful adaptation and hyperparameter searches can explode quickly, especially when deployed at scale to serve numerous individual requests. To ensure effective, parameter-efficient, and hyperparameter-robust adaptation, we propose the ETHER transformation family, which performs Efficient fineTuning via HypErplane Reflections. By design, ETHER transformations require a minimal number of parameters, are less likely to deteriorate model performance, and exhibit robustness to hyperparameter and learning rate choices. In particular, we introduce ETHER and its relaxation ETHER+, which match or outperform existing PEFT methods with significantly fewer parameters ($\sim$$10$-$100$ times lower than LoRA or OFT) across multiple image synthesis and natural language tasks without exhaustive hyperparameter tuning. Finally, we investigate the recent emphasis on Hyperspherical Energy retention for adaptation and raise questions on its practical utility. The code is available at https://github.com/mwbini/ether.
Related papers
- LoRTA: Low Rank Tensor Adaptation of Large Language Models [70.32218116940393]
Low Rank Adaptation (LoRA) is a popular Efficient Fine Tuning (PEFT) method that effectively adapts large pre-trained models for downstream tasks.
We propose a novel approach that employs a low rank tensor parametrization for model updates.
Our method is both efficient and effective for fine-tuning large language models, achieving a substantial reduction in the number of parameters while maintaining comparable performance.
arXiv Detail & Related papers (2024-10-05T06:59:50Z) - Step-by-Step Unmasking for Parameter-Efficient Fine-tuning of Large Language Models [18.877891285367216]
A class of parameter-efficient fine-tuning (PEFT) aims to mitigate computational challenges by selectively fine-tuning only a small fraction of the model parameters.
We introduce $textID3$, a novel selective PEFT method that calculates parameter importance continually and dynamically unmasks parameters.
We analytically show that $textID3$ reduces the number of gradient updates by a factor of two, enhancing computational efficiency.
arXiv Detail & Related papers (2024-08-26T17:58:53Z) - Scaling Exponents Across Parameterizations and Optimizers [94.54718325264218]
We propose a new perspective on parameterization by investigating a key assumption in prior work.
Our empirical investigation includes tens of thousands of models trained with all combinations of threes.
We find that the best learning rate scaling prescription would often have been excluded by the assumptions in prior work.
arXiv Detail & Related papers (2024-07-08T12:32:51Z) - Parameter-Efficient Fine-Tuning With Adapters [5.948206235442328]
This research introduces a novel adaptation method utilizing the UniPELT framework as a base.
Our method employs adapters, which enable efficient transfer of pretrained models to new tasks with minimal retraining of the base model parameters.
arXiv Detail & Related papers (2024-05-09T01:40:38Z) - Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation [67.13876021157887]
Dynamic Tuning (DyT) is a novel approach to improve both parameter and inference efficiency for ViT adaptation.
DyT achieves superior performance compared to existing PEFT methods while evoking only 71% of their FLOPs on the VTAB-1K benchmark.
arXiv Detail & Related papers (2024-03-18T14:05:52Z) - Advancing Parameter Efficiency in Fine-tuning via Representation Editing [41.81020951061438]
We propose a novel fine-tuning approach for neural models, named Representation EDiting (RED)
RED modifies the representations generated at some layers through the application of scaling and biasing operations.
Remarkably, RED achieves results comparable or superior to both full parameter fine-tuning and other PEFT methods.
arXiv Detail & Related papers (2024-02-23T08:21:02Z) - Parameter-Efficient Fine-Tuning without Introducing New Latency [7.631596468553607]
We introduce a novel adapter technique that directly applies the adapter to pre-trained parameters instead of the hidden representation.
Our proposed method attains a new state-of-the-art outcome in terms of both performance and storage efficiency, storing only 0.03% parameters of full fine-tuning.
arXiv Detail & Related papers (2023-05-26T08:44:42Z) - Sensitivity-Aware Visual Parameter-Efficient Fine-Tuning [91.5113227694443]
We propose a novel visual.
sensuous-aware fine-Tuning (SPT) scheme.
SPT allocates trainable parameters to task-specific important positions.
Experiments on a wide range of downstream recognition tasks show that our SPT is complementary to the existing PEFT methods.
arXiv Detail & Related papers (2023-03-15T12:34:24Z) - AUTOMATA: Gradient Based Data Subset Selection for Compute-Efficient
Hyper-parameter Tuning [72.54359545547904]
We propose a gradient-based subset selection framework for hyper- parameter tuning.
We show that using gradient-based data subsets for hyper- parameter tuning achieves significantly faster turnaround times and speedups of 3$times$-30$times$.
arXiv Detail & Related papers (2022-03-15T19:25:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.