AdapterFusion: Non-Destructive Task Composition for Transfer Learning
- URL: http://arxiv.org/abs/2005.00247v3
- Date: Tue, 26 Jan 2021 12:54:33 GMT
- Title: AdapterFusion: Non-Destructive Task Composition for Transfer Learning
- Authors: Jonas Pfeiffer, Aishwarya Kamath, Andreas Rückle, Kyunghyun Cho, Iryna Gurevych
- Abstract summary: Sequential fine-tuning and multi-task learning are methods aiming to incorporate knowledge from multiple tasks.
We propose AdapterFusion, a new two-stage learning algorithm that leverages knowledge from multiple tasks.
We show that our approach outperforms traditional strategies such as full fine-tuning as well as multi-task learning.
- Score: 104.9639614787314
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sequential fine-tuning and multi-task learning are methods aiming to
incorporate knowledge from multiple tasks; however, they suffer from
catastrophic forgetting and difficulties in dataset balancing. To address these
shortcomings, we propose AdapterFusion, a new two-stage learning algorithm that
leverages knowledge from multiple tasks. First, in the knowledge extraction
stage we learn task-specific parameters called adapters that encapsulate the
task-specific information. We then combine the adapters in a separate knowledge
composition step. We show that by separating the two stages, i.e., knowledge
extraction and knowledge composition, the classifier can effectively exploit
the representations learned from multiple tasks in a non-destructive manner. We
empirically evaluate AdapterFusion on 16 diverse NLU tasks, and find that it
effectively combines various types of knowledge at different layers of the
model. We show that our approach outperforms traditional strategies such as
full fine-tuning as well as multi-task learning. Our code and adapters are
available at AdapterHub.ml.
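To make the two-stage recipe concrete, here is a minimal PyTorch sketch of the idea: stage one trains one bottleneck adapter per task, stage two freezes those adapters and learns an attention layer that mixes their outputs for a target task. Class names, dimensions, and the residual wiring are illustrative assumptions, not the released AdapterHub implementation.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, non-linearity, up-project, residual."""
    def __init__(self, hidden_size: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.act = nn.ReLU()

    def forward(self, h):
        return h + self.up(self.act(self.down(h)))   # residual around the bottleneck

class AdapterFusion(nn.Module):
    """Attention over the outputs of several (frozen) task adapters.
    Query = transformer layer output h; Key/Value = each adapter's output for h."""
    def __init__(self, hidden_size: int):
        super().__init__()
        self.query = nn.Linear(hidden_size, hidden_size)
        self.key = nn.Linear(hidden_size, hidden_size)
        self.value = nn.Linear(hidden_size, hidden_size)

    def forward(self, h, adapter_outputs):
        stacked = torch.stack(adapter_outputs, dim=2)          # (batch, seq, N, hidden)
        q = self.query(h).unsqueeze(2)                         # (batch, seq, 1, hidden)
        k, v = self.key(stacked), self.value(stacked)          # (batch, seq, N, hidden)
        scores = (q * k).sum(-1)                               # (batch, seq, N)
        weights = torch.softmax(scores, dim=-1).unsqueeze(-1)  # (batch, seq, N, 1)
        return h + (weights * v).sum(dim=2)                    # weighted mix of adapters

# Stage 1 (knowledge extraction): one adapter per task, trained separately.
adapters = nn.ModuleList([Adapter(hidden_size=768) for _ in range(3)])
# Stage 2 (knowledge composition): freeze the adapters, train only the fusion layer.
for p in adapters.parameters():
    p.requires_grad = False
fusion = AdapterFusion(hidden_size=768)

h = torch.randn(2, 16, 768)                        # stand-in for a transformer layer output
mixed = fusion(h, [a(h) for a in adapters])        # (2, 16, 768)
```

The paper also discusses details such as how the fusion weights are initialized so that training starts close to the pre-trained behaviour; the sketch omits those.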
Related papers
- Pilot: Building the Federated Multimodal Instruction Tuning Framework [79.56362403673354]
Our framework integrates two stages of "adapter on adapter" into the connector of the vision encoder and the LLM.
In stage 1, we extract task-specific features and client-specific features from visual information.
In stage 2, we build the cross-task Mixture-of-Adapters (CT-MoA) module to perform cross-task interaction.
arXiv Detail & Related papers (2025-01-23T07:49:24Z)
- Linked Adapters: Linking Past and Future to Present for Effective Continual Learning [3.7166121807265045]
Continual learning allows the system to learn and adapt to new tasks while retaining the knowledge acquired from previous tasks.
Deep learning models suffer from catastrophic forgetting of knowledge learned from earlier tasks while learning a new task.
We propose a novel approach that allows knowledge transfer to other task-specific adapters through a weighted attention mechanism.
arXiv Detail & Related papers (2024-12-14T05:25:17Z)
- CorDA: Context-Oriented Decomposition Adaptation of Large Language Models for Task-Aware Parameter-Efficient Fine-tuning [101.81127587760831]
Current fine-tuning methods build adapters that are largely agnostic of the context of the downstream task to learn, or of the context of important knowledge to maintain.
We propose CorDA, a Context-oriented Decomposition Adaptation method that builds learnable task-aware adapters.
Our method enables two options: knowledge-preserved adaptation and instruction-previewed adaptation.
arXiv Detail & Related papers (2024-06-07T19:10:35Z)
- AdapterDistillation: Non-Destructive Task Composition with Knowledge Distillation [12.648208238878468]
We propose a two-stage knowledge distillation algorithm called AdapterDistillation.
In the first stage, we extract task-specific knowledge by using local data to train a student adapter.
In the second stage, we distill the knowledge from the existing teacher adapters into the student adapter to help its inference (a rough sketch of this setup follows below).
arXiv Detail & Related papers (2023-12-26T07:01:00Z)
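Read as a sketch, that second stage is a standard distillation objective: a task loss plus a term pulling the trainable student adapter toward frozen teacher adapters. Everything below (averaging the teachers, MSE matching of hidden states, the loss weight alpha, the classification head) is an assumption for illustration and reuses the Adapter class from the sketch above, not the paper's procedure.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def adapter_distillation_step(h, labels, student, teachers, head, alpha=0.5):
    """Stage-2 step: fit the task while pulling the student adapter's output
    toward the (frozen) teacher adapters'. All names are illustrative."""
    s_out = student(h)                                     # trainable student adapter
    with torch.no_grad():                                  # teachers receive no gradients
        t_out = torch.stack([t(h) for t in teachers]).mean(dim=0)
    task_loss = F.cross_entropy(head(s_out.mean(dim=1)), labels)
    distill_loss = F.mse_loss(s_out, t_out)                # hidden-state matching (assumed)
    return task_loss + alpha * distill_loss

# Usage with the Adapter class defined earlier (hidden size 768, 4 labels):
student, teachers = Adapter(768), [Adapter(768) for _ in range(2)]
head = nn.Linear(768, 4)
loss = adapter_distillation_step(torch.randn(2, 16, 768), torch.tensor([0, 3]),
                                 student, teachers, head)
loss.backward()
```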
- GraphAdapter: Tuning Vision-Language Models With Dual Knowledge Graph [63.81641578763094]
Adapter-style efficient transfer learning (ETL) has shown excellent performance in the tuning of vision-language models (VLMs).
We propose an effective adapter-style tuning strategy, dubbed GraphAdapter, which tunes the textual adapter by explicitly modeling dual-modality structure knowledge.
In particular, the dual knowledge graph is established with two sub-graphs, i.e., a textual knowledge sub-graph, and a visual knowledge sub-graph, where the nodes and edges represent the semantics/classes and their correlations in two modalities, respectively.
arXiv Detail & Related papers (2023-09-24T12:56:40Z)
- I2I: Initializing Adapters with Improvised Knowledge [15.452979531094567]
We propose Improvise to Initialize (I2I), a continual learning algorithm that initializes Adapters for incoming tasks by distilling knowledge from the Adapters of previously-learned tasks.
I2I consistently achieves better task accuracy than independently-trained Adapters.
arXiv Detail & Related papers (2023-04-04T23:51:48Z)
- Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks [37.2958914602899]
We show that we can learn adapter parameters for all layers and tasks by generating them using shared hypernetworks (a sketch of this idea follows below).
Experiments on the well-known GLUE benchmark show improved performance in multi-task learning while adding only 0.29% parameters per task.
arXiv Detail & Related papers (2021-06-08T16:16:40Z)
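As a rough illustration of that idea, a single hypernetwork can emit the adapter weights for any (task, layer) pair from small learned embeddings, so only the embeddings grow with the number of tasks. The dimensions, embedding scheme, and conditioning below are simplified assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdapterHypernet(nn.Module):
    """One shared hypernetwork emits bottleneck-adapter weights from a
    (task, layer) embedding, so per-task parameters stay tiny (illustrative)."""
    def __init__(self, hidden=768, bottleneck=24, emb=64, n_tasks=8, n_layers=12):
        super().__init__()
        self.task_emb = nn.Embedding(n_tasks, emb)
        self.layer_emb = nn.Embedding(n_layers, emb)
        out_dim = 2 * hidden * bottleneck + hidden + bottleneck   # W_down, W_up, biases
        self.generator = nn.Sequential(nn.Linear(2 * emb, emb), nn.ReLU(),
                                       nn.Linear(emb, out_dim))
        self.hidden, self.bottleneck = hidden, bottleneck

    def forward(self, h, task_id, layer_id):
        z = torch.cat([self.task_emb(task_id), self.layer_emb(layer_id)], dim=-1)
        p = self.generator(z)                         # flat vector of adapter weights
        d, b = self.hidden, self.bottleneck
        w_down, w_up = p[:d * b].view(b, d), p[d * b:2 * d * b].view(d, b)
        b_down, b_up = p[2 * d * b:2 * d * b + b], p[2 * d * b + b:]
        x = torch.relu(F.linear(h, w_down, b_down))   # generated down-projection
        return h + F.linear(x, w_up, b_up)            # generated up-projection + residual

hyper = AdapterHypernet()
h = torch.randn(2, 16, 768)                           # stand-in transformer hidden state
out = hyper(h, torch.tensor(3), torch.tensor(5))      # adapter for task 3 at layer 5
```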
- Adversarial Continual Learning [99.56738010842301]
We propose a hybrid continual learning framework that learns a disjoint representation for task-invariant and task-specific features.
Our model combines architecture growth to prevent forgetting of task-specific skills and an experience replay approach to preserve shared skills.
arXiv Detail & Related papers (2020-03-21T02:08:17Z)
- K-Adapter: Infusing Knowledge into Pre-Trained Models with Adapters [136.75235546149995]
We study the problem of injecting knowledge into large pre-trained models like BERT and RoBERTa.
Existing methods typically update the original parameters of pre-trained models when injecting knowledge.
We propose K-Adapter, a framework that keeps the original parameters of the pre-trained model fixed and supports the development of versatile knowledge-infused models.
arXiv Detail & Related papers (2020-02-05T14:30:49Z)