AdapterFusion: Non-Destructive Task Composition for Transfer Learning
- URL: http://arxiv.org/abs/2005.00247v3
- Date: Tue, 26 Jan 2021 12:54:33 GMT
- Title: AdapterFusion: Non-Destructive Task Composition for Transfer Learning
- Authors: Jonas Pfeiffer, Aishwarya Kamath, Andreas Rücklé, Kyunghyun Cho, Iryna Gurevych
- Abstract summary: Sequential fine-tuning and multi-task learning are methods aiming to incorporate knowledge from multiple tasks.
We propose AdapterFusion, a new two-stage learning algorithm that leverages knowledge from multiple tasks.
We show that our approach outperforms traditional strategies such as full fine-tuning as well as multi-task learning.
- Score: 104.9639614787314
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sequential fine-tuning and multi-task learning are methods aiming to
incorporate knowledge from multiple tasks; however, they suffer from
catastrophic forgetting and difficulties in dataset balancing. To address these
shortcomings, we propose AdapterFusion, a new two-stage learning algorithm that
leverages knowledge from multiple tasks. First, in the knowledge extraction
stage we learn task-specific parameters called adapters that encapsulate the
task-specific information. We then combine the adapters in a separate knowledge
composition step. We show that by separating the two stages, i.e., knowledge
extraction and knowledge composition, the classifier can effectively exploit
the representations learned from multiple tasks in a non-destructive manner. We
empirically evaluate AdapterFusion on 16 diverse NLU tasks, and find that it
effectively combines various types of knowledge at different layers of the
model. We show that our approach outperforms traditional strategies such as
full fine-tuning as well as multi-task learning. Our code and adapters are
available at AdapterHub.ml.
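As a concrete illustration, here is a minimal PyTorch sketch of the two-stage idea, assuming standard bottleneck adapters and the hidden states of a single transformer layer. The module names (`BottleneckAdapter`, `AdapterFusion`) and dimensions are illustrative only, not the authors' reference implementation (which is available at AdapterHub.ml).

```python
# Illustrative sketch of the two-stage AdapterFusion idea (not the reference code).
import torch
import torch.nn as nn


class BottleneckAdapter(nn.Module):
    """Stage 1 (knowledge extraction): a small task-specific bottleneck
    trained per task while the pre-trained transformer stays frozen."""

    def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, hidden_dim)
        self.act = nn.ReLU()

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return h + self.up(self.act(self.down(h)))  # residual adapter output


class AdapterFusion(nn.Module):
    """Stage 2 (knowledge composition): attention over the outputs of N frozen
    task adapters, using the layer's hidden state as the query."""

    def __init__(self, hidden_dim: int):
        super().__init__()
        self.query = nn.Linear(hidden_dim, hidden_dim)
        self.key = nn.Linear(hidden_dim, hidden_dim)
        self.value = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, h: torch.Tensor, adapter_outputs: list[torch.Tensor]) -> torch.Tensor:
        stacked = torch.stack(adapter_outputs, dim=2)      # (B, S, N, H)
        q = self.query(h).unsqueeze(2)                     # (B, S, 1, H)
        k = self.key(stacked)                              # (B, S, N, H)
        v = self.value(stacked)                            # (B, S, N, H)
        weights = torch.softmax((q * k).sum(-1), dim=-1)   # (B, S, N)
        return h + (weights.unsqueeze(-1) * v).sum(dim=2)  # weighted mixture


# Usage: adapters are trained separately per task and then frozen; only the
# fusion parameters (and a task head) are trained in the second stage.
hidden = torch.randn(2, 10, 768)
adapters = [BottleneckAdapter(768) for _ in range(3)]
for a in adapters:
    a.requires_grad_(False)
fusion = AdapterFusion(768)
out = fusion(hidden, [a(hidden) for a in adapters])
print(out.shape)  # torch.Size([2, 10, 768])
```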
Related papers
- ATLAS: Adapter-Based Multi-Modal Continual Learning with a Two-Stage Learning Strategy [12.150065431702055]
We propose a multi-modal continual learning scheme that consists of experience-based learning and novel knowledge expansion.
Our method is well suited to continual learning: it broadens the upstream representation distribution while also minimizing the negative impact of forgetting previous tasks.
arXiv Detail & Related papers (2024-10-14T13:29:42Z)
- CorDA: Context-Oriented Decomposition Adaptation of Large Language Models for Task-Aware Parameter-Efficient Fine-tuning [101.81127587760831]
Current fine-tuning methods build adapters largely without regard to the context of the downstream task to be learned or the context of the important knowledge to be maintained.
We propose CorDA, a Context-oriented Decomposition Adaptation method that builds learnable task-aware adapters.
Our method enables two options, the knowledge-preserved adaptation and the instruction-previewed adaptation.
arXiv Detail & Related papers (2024-06-07T19:10:35Z)
- AdapterDistillation: Non-Destructive Task Composition with Knowledge Distillation [12.648208238878468]
We propose a two-stage knowledge distillation algorithm called AdapterDistillation.
In the first stage, we extract task-specific knowledge by using local data to train a student adapter.
In the second stage, we distill the knowledge from the existing teacher adapters into the student adapter to help its inference.
arXiv Detail & Related papers (2023-12-26T07:01:00Z)
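A rough sketch of what the two-stage distillation described above could look like at the level of a single adapter layer. Averaging the teacher adapters' outputs and using an MSE objective are assumptions made for illustration, not details taken from the paper.

```python
# Hedged sketch of adapter distillation: a student adapter learns to match
# frozen teacher adapters; the mean aggregation and MSE loss are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Adapter(nn.Module):
    def __init__(self, hidden_dim: int = 768, bottleneck: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(hidden_dim, bottleneck), nn.ReLU(), nn.Linear(bottleneck, hidden_dim)
        )

    def forward(self, h):
        return h + self.net(h)


student = Adapter()
teachers = [Adapter().requires_grad_(False) for _ in range(3)]  # frozen, already trained
opt = torch.optim.Adam(student.parameters(), lr=1e-4)

h = torch.randn(8, 16, 768)  # hidden states from the frozen backbone

# Stage 1 (task training on local data) would apply a task loss to student(h).
# Stage 2: distill the teachers' adapter outputs into the student adapter.
teacher_out = torch.stack([t(h) for t in teachers]).mean(dim=0)
loss = F.mse_loss(student(h), teacher_out)
loss.backward()
opt.step()
```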
- GraphAdapter: Tuning Vision-Language Models With Dual Knowledge Graph [63.81641578763094]
Adapter-style efficient transfer learning (ETL) has shown excellent performance in the tuning of vision-language models (VLMs).
We propose an effective adapter-style tuning strategy, dubbed GraphAdapter, which performs the textual adapter by explicitly modeling the dual-modality structure knowledge.
In particular, the dual knowledge graph is established with two sub-graphs, i.e., a textual knowledge sub-graph, and a visual knowledge sub-graph, where the nodes and edges represent the semantics/classes and their correlations in two modalities, respectively.
arXiv Detail & Related papers (2023-09-24T12:56:40Z)
- Pre-training Multi-task Contrastive Learning Models for Scientific Literature Understanding [52.723297744257536]
Pre-trained language models (LMs) have shown effectiveness in scientific literature understanding tasks.
We propose a multi-task contrastive learning framework, SciMult, to facilitate common knowledge sharing across different literature understanding tasks.
arXiv Detail & Related papers (2023-05-23T16:47:22Z)
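A generic sketch of multi-task contrastive pre-training with a shared encoder, based only on the summary above. The InfoNCE-style loss, the stand-in encoder, and the per-task batching are assumptions, not SciMult's actual design.

```python
# Generic multi-task contrastive pre-training sketch; encoder, temperature,
# and task batching below are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


def info_nce(query_emb, pos_emb, temperature: float = 0.05):
    """In-batch contrastive loss: each query's positive is the same-index passage."""
    q = F.normalize(query_emb, dim=-1)
    p = F.normalize(pos_emb, dim=-1)
    logits = q @ p.T / temperature        # (B, B) similarity matrix
    labels = torch.arange(q.size(0))      # diagonal entries are the positives
    return F.cross_entropy(logits, labels)


encoder = nn.Sequential(nn.Linear(768, 768), nn.Tanh())  # stand-in for a shared LM encoder
opt = torch.optim.Adam(encoder.parameters(), lr=2e-5)

# One batch per literature-understanding task, all sharing the same encoder
# so common knowledge is learned jointly.
for task_batch in [torch.randn(16, 2, 768) for _ in range(3)]:
    queries, positives = task_batch[:, 0], task_batch[:, 1]
    loss = info_nce(encoder(queries), encoder(positives))
    loss.backward()   # gradients accumulate across tasks
opt.step()            # single update over the multi-task step
```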
- I2I: Initializing Adapters with Improvised Knowledge [15.452979531094567]
Improvise to Initialize (I2I), a continual learning algorithm, initializes Adapters for incoming tasks by distilling knowledge from previously-learned tasks' Adapters.
I2I consistently achieves better task accuracy than independently-trained Adapters.
arXiv Detail & Related papers (2023-04-04T23:51:48Z)
- Cross-Task Knowledge Distillation in Multi-Task Recommendation [41.62428191434233]
Multi-task learning has been widely used in real-world recommenders to predict different types of user feedback.
We propose a Cross-Task Knowledge Distillation framework in recommendation, which consists of three procedures.
arXiv Detail & Related papers (2022-02-20T16:15:19Z)
- Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks [37.2958914602899]
We show that we can learn adapter parameters for all layers and tasks by generating them using shared hypernetworks.
Experiments on the well-known GLUE benchmark show improved performance in multi-task learning while adding only 0.29% parameters per task.
arXiv Detail & Related papers (2021-06-08T16:16:40Z)
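A minimal sketch of the shared-hypernetwork idea from the entry above: one generator network emits adapter weights conditioned on a learned task embedding, so per-task storage shrinks to the embedding vector. The dimensions and the flat weight layout are illustrative assumptions.

```python
# Sketch of a shared hypernetwork that generates adapter weights from a task
# embedding; only the small task embeddings differ between tasks.
import torch
import torch.nn as nn


class AdapterHypernetwork(nn.Module):
    def __init__(self, task_emb_dim=64, hidden_dim=768, bottleneck=32):
        super().__init__()
        self.hidden_dim, self.bottleneck = hidden_dim, bottleneck
        out_dim = 2 * hidden_dim * bottleneck  # down- and up-projection weights
        self.generator = nn.Sequential(nn.Linear(task_emb_dim, 256), nn.ReLU(),
                                       nn.Linear(256, out_dim))

    def forward(self, h: torch.Tensor, task_emb: torch.Tensor) -> torch.Tensor:
        flat = self.generator(task_emb)
        split = self.hidden_dim * self.bottleneck
        w_down = flat[:split].view(self.bottleneck, self.hidden_dim)
        w_up = flat[split:].view(self.hidden_dim, self.bottleneck)
        return h + torch.relu(h @ w_down.T) @ w_up.T  # generated bottleneck adapter


hypernet = AdapterHypernetwork()
task_embeddings = nn.Embedding(8, 64)                # 8 tasks, one embedding each
h = torch.randn(4, 128, 768)
out = hypernet(h, task_embeddings(torch.tensor(2)))  # adapter for task id 2
print(out.shape)  # torch.Size([4, 128, 768])
```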
- Adversarial Continual Learning [99.56738010842301]
We propose a hybrid continual learning framework that learns a disjoint representation for task-invariant and task-specific features.
Our model combines architecture growth to prevent forgetting of task-specific skills and an experience replay approach to preserve shared skills.
arXiv Detail & Related papers (2020-03-21T02:08:17Z)
- K-Adapter: Infusing Knowledge into Pre-Trained Models with Adapters [136.75235546149995]
We study the problem of injecting knowledge into large pre-trained models like BERT and RoBERTa.
Existing methods typically update the original parameters of pre-trained models when injecting knowledge.
We propose K-Adapter, a framework that keeps the original parameters of the pre-trained model fixed and supports the development of versatile knowledge-infused models.
arXiv Detail & Related papers (2020-02-05T14:30:49Z)
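A sketch of the general pattern described in the entry above: the pre-trained encoder stays frozen while knowledge-specific adapters run alongside it, and their outputs are combined for a task head. The stand-in encoder and the simple concatenation are simplifications, not K-Adapter's exact architecture.

```python
# Sketch of frozen-backbone knowledge infusion with side adapters; combining
# outputs by concatenation is a simplification for illustration.
import torch
import torch.nn as nn


class KnowledgeAdapter(nn.Module):
    """Small network fed with frozen hidden states; one per knowledge source
    (e.g. factual or linguistic knowledge), trained independently."""

    def __init__(self, hidden_dim=768):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.GELU(),
                                  nn.Linear(hidden_dim, hidden_dim))

    def forward(self, frozen_hidden):
        return self.proj(frozen_hidden) + frozen_hidden


frozen_encoder = nn.Linear(768, 768).requires_grad_(False)  # stand-in for BERT/RoBERTa
adapters = nn.ModuleList([KnowledgeAdapter() for _ in range(2)])

x = torch.randn(4, 32, 768)
with torch.no_grad():
    h = frozen_encoder(x)                                     # frozen representation
features = torch.cat([h] + [a(h) for a in adapters], dim=-1)  # (4, 32, 768 * 3)
# A task head consumes the concatenated features; only adapters + head are trained.
```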