Empowering parameter-efficient transfer learning by recognizing the
kernel structure in self-attention
- URL: http://arxiv.org/abs/2205.03720v1
- Date: Sat, 7 May 2022 20:52:54 GMT
- Title: Empowering parameter-efficient transfer learning by recognizing the
kernel structure in self-attention
- Authors: Yifan Chen, Devamanyu Hazarika, Mahdi Namazifar, Yang Liu, Di Jin,
Dilek Hakkani-Tur
- Abstract summary: We propose adapters that utilize the kernel structure in self-attention to guide the assignment of tunable parameters.
Our results show that our proposed adapters can attain or improve the strong performance of existing baselines.
- Score: 53.72897232951918
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The massive number of trainable parameters in pre-trained language
models (PLMs) makes them hard to deploy to multiple downstream tasks. To address
this issue, parameter-efficient transfer learning methods have been proposed to
tune only a few parameters during fine-tuning while freezing the rest. This
paper looks at existing methods along this line through the kernel lens.
Motivated by the connection between self-attention in transformer-based PLMs
and kernel learning, we propose kernel-wise adapters, namely Kernel-mix, that
utilize the kernel structure in self-attention to
guide the assignment of the tunable parameters. These adapters use guidelines
found in classical kernel learning and enable separate parameter tuning for
each attention head. Our empirical results, over a diverse set of natural
language generation and understanding tasks, show that our proposed adapters
can attain or improve the strong performance of existing baselines.
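As a rough illustration of the mechanism described above, the sketch below (PyTorch-style, not the authors' released implementation) augments frozen query/value projections with small per-head low-rank terms, so that each attention head -- viewed through the kernel lens as its own kernel -- receives its own tunable parameters. The class name KernelMixSketch, the rank r, and the choice to adapt queries and values are illustrative assumptions, not the exact Kernel-mix parameterization.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class KernelMixSketch(nn.Module):
    """Hypothetical per-head (kernel-wise) adapter on top of frozen attention."""

    def __init__(self, d_model: int, n_heads: int, r: int = 4):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        # Frozen projections standing in for the pre-trained PLM weights.
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        for proj in (self.q_proj, self.k_proj, self.v_proj):
            for p in proj.parameters():
                p.requires_grad = False
        # Tunable low-rank terms, one factorization per head (zero-initialized
        # down-projection so the adapted layer starts identical to the frozen one).
        self.q_down = nn.Parameter(torch.zeros(n_heads, self.d_head, r))
        self.q_up = nn.Parameter(torch.randn(n_heads, r, self.d_head) * 0.02)
        self.v_down = nn.Parameter(torch.zeros(n_heads, self.d_head, r))
        self.v_up = nn.Parameter(torch.randn(n_heads, r, self.d_head) * 0.02)

    def _split(self, x: torch.Tensor) -> torch.Tensor:
        # (B, T, d_model) -> (B, H, T, d_head)
        B, T, _ = x.shape
        return x.view(B, T, self.n_heads, self.d_head).transpose(1, 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q, k, v = (self._split(p(x)) for p in (self.q_proj, self.k_proj, self.v_proj))
        # Head-wise adaptation: head h gets its own low-rank update.
        q = q + torch.einsum("bhtd,hdr,hre->bhte", q, self.q_down, self.q_up)
        v = v + torch.einsum("bhtd,hdr,hre->bhte", v, self.v_down, self.v_up)
        attn = F.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        out = attn @ v                                  # (B, H, T, d_head)
        return out.transpose(1, 2).reshape(x.shape)     # (B, T, d_model)
```

Under this sketch only the *_down/*_up tensors would be trained, which is the head-wise assignment of tunable parameters the abstract describes; the output projection and the exact placement used by Kernel-mix are omitted.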
Related papers
- Sparse Orthogonal Parameters Tuning for Continual Learning [34.462967722928724]
Continual learning methods based on pre-trained models (PTMs), which adapt to successive downstream tasks without catastrophic forgetting, have recently gained attention.
We propose a novel yet effective method called SoTU (Sparse Orthogonal Parameters TUning).
arXiv Detail & Related papers (2024-11-05T05:19:09Z) - Towards Infinite-Long Prefix in Transformer [18.24137806007111]
We study the ability of Prompting and context-based fine-tuning methods to match the performance of full parameter fine-tuning.
We implement an algorithm that only needs to introduce and fine-tune a few extra trainable parameters instead of an infinite-long prefix.
Our method achieves superior or competitive performance compared to existing methods such as full-parameter fine-tuning, P-Tuning V2, and LoRA.
arXiv Detail & Related papers (2024-06-20T06:56:35Z) - Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation [67.13876021157887]
Dynamic Tuning (DyT) is a novel approach to improve both parameter and inference efficiency for ViT adaptation.
DyT achieves superior performance compared to existing PEFT methods while evoking only 71% of their FLOPs on the VTAB-1K benchmark.
arXiv Detail & Related papers (2024-03-18T14:05:52Z) - Parameter-Efficient Fine-Tuning without Introducing New Latency [7.631596468553607]
We introduce a novel adapter technique that directly applies the adapter to pre-trained parameters instead of the hidden representation (see the sketch after this list).
Our proposed method attains a new state-of-the-art outcome in terms of both performance and storage efficiency, storing only 0.03% parameters of full fine-tuning.
arXiv Detail & Related papers (2023-05-26T08:44:42Z) - Sensitivity-Aware Visual Parameter-Efficient Fine-Tuning [91.5113227694443]
We propose a novel Sensitivity-aware visual Parameter-efficient fine-Tuning (SPT) scheme.
SPT allocates trainable parameters to task-specific important positions.
Experiments on a wide range of downstream recognition tasks show that our SPT is complementary to the existing PEFT methods.
arXiv Detail & Related papers (2023-03-15T12:34:24Z) - Evaluating Parameter-Efficient Transfer Learning Approaches on SURE
Benchmark for Speech Understanding [40.27182770995891]
Fine-tuning is widely used as the default algorithm for transfer learning from pre-trained models.
We introduce the Speech UndeRstanding Evaluation (SURE) benchmark for parameter-efficient learning for various speech-processing tasks.
arXiv Detail & Related papers (2023-03-02T08:57:33Z) - Inducer-tuning: Connecting Prefix-tuning and Adapter-tuning [53.72897232951918]
We show that inducer-tuning can close the performance gap between prefix-tuning and fine-tuning.
We suggest a new variant of prefix-tuning -- inducer-tuning, which shares the exact mechanism as prefix-tuning while leveraging the residual form found in adapter-tuning.
arXiv Detail & Related papers (2022-10-26T04:39:42Z) - Prompt-Matched Semantic Segmentation [96.99924127527002]
The objective of this work is to explore how to effectively adapt pre-trained foundation models to various downstream tasks of image semantic segmentation.
We propose a novel Inter-Stage Prompt-Matched Framework, which maintains the original structure of the foundation model while generating visual prompts adaptively for task-oriented tuning.
A lightweight module termed Semantic-aware Prompt Matcher is then introduced to hierarchically interpolate between two stages to learn reasonable prompts for each specific task.
arXiv Detail & Related papers (2022-08-22T09:12:53Z) - On the infinite width limit of neural networks with a standard
parameterization [52.07828272324366]
We propose an improved extrapolation of the standard parameterization that preserves all of these properties as width is taken to infinity.
We show experimentally that the resulting kernels typically achieve similar accuracy to those resulting from an NTK parameterization.
arXiv Detail & Related papers (2020-01-21T01:02:21Z)
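Regarding the "Parameter-Efficient Fine-Tuning without Introducing New Latency" entry above: a minimal sketch of the general idea of adapting the pre-trained weights themselves (rather than the hidden representation) is shown below, so the learned delta can be folded into the frozen matrix before serving. The low-rank parameterization, the class name WeightSpaceAdapter, and the merge_ helper are assumptions for illustration, not that paper's exact method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightSpaceAdapter(nn.Module):
    """Hypothetical adapter acting on a frozen nn.Linear's weight matrix."""

    def __init__(self, linear: nn.Linear, r: int = 8):
        super().__init__()
        self.base = linear
        for p in self.base.parameters():
            p.requires_grad = False                      # freeze pre-trained weights
        out_features, in_features = linear.weight.shape
        self.down = nn.Parameter(torch.randn(r, in_features) * 0.02)   # r x in
        self.up = nn.Parameter(torch.zeros(out_features, r))           # out x r

    def delta(self) -> torch.Tensor:
        # Low-rank update with the same shape as the frozen weight.
        return self.up @ self.down

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The update is applied to the weight, not added to the hidden state.
        return F.linear(x, self.base.weight + self.delta(), self.base.bias)

    @torch.no_grad()
    def merge_(self) -> nn.Linear:
        # Fold the tuned delta into the pre-trained weight, then zero the adapter
        # so it is not applied twice; the returned nn.Linear serves on its own.
        self.base.weight += self.delta()
        self.up.zero_()
        return self.base
```

After training, calling merge_() leaves an ordinary nn.Linear forward pass, so inference introduces no extra modules and no added latency.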
This list is automatically generated from the titles and abstracts of the papers on this site.