AdapterFusion: Non-Destructive Task Composition for Transfer Learning
- URL: http://arxiv.org/abs/2005.00247v3
- Date: Tue, 26 Jan 2021 12:54:33 GMT
- Title: AdapterFusion: Non-Destructive Task Composition for Transfer Learning
- Authors: Jonas Pfeiffer, Aishwarya Kamath, Andreas Rücklé, Kyunghyun Cho, Iryna Gurevych
- Abstract summary: Sequential fine-tuning and multi-task learning are methods aiming to incorporate knowledge from multiple tasks.
We propose AdapterFusion, a new two-stage learning algorithm that leverages knowledge from multiple tasks.
We show that our approach outperforms traditional strategies such as full fine-tuning as well as multi-task learning.
- Score: 104.9639614787314
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sequential fine-tuning and multi-task learning are methods aiming to
incorporate knowledge from multiple tasks; however, they suffer from
catastrophic forgetting and difficulties in dataset balancing. To address these
shortcomings, we propose AdapterFusion, a new two-stage learning algorithm that
leverages knowledge from multiple tasks. First, in the knowledge extraction
stage we learn task-specific parameters called adapters that encapsulate the
task-specific information. We then combine the adapters in a separate knowledge
composition step. We show that by separating the two stages, i.e., knowledge
extraction and knowledge composition, the classifier can effectively exploit
the representations learned from multiple tasks in a non-destructive manner. We
empirically evaluate AdapterFusion on 16 diverse NLU tasks, and find that it
effectively combines various types of knowledge at different layers of the
model. We show that our approach outperforms traditional strategies such as
full fine-tuning as well as multi-task learning. Our code and adapters are
available at AdapterHub.ml.
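As a concrete illustration, here is a minimal PyTorch sketch of the two-stage idea, assuming standard bottleneck adapters and the hidden states of a single transformer layer. The module names (`BottleneckAdapter`, `AdapterFusion`) and dimensions are illustrative only, not the authors' reference implementation (which is available at AdapterHub.ml).

```python
# Illustrative sketch of the two-stage AdapterFusion idea (not the reference code).
import torch
import torch.nn as nn


class BottleneckAdapter(nn.Module):
    """Stage 1 (knowledge extraction): a small task-specific bottleneck
    trained per task while the pre-trained transformer stays frozen."""

    def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, hidden_dim)
        self.act = nn.ReLU()

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return h + self.up(self.act(self.down(h)))  # residual adapter output


class AdapterFusion(nn.Module):
    """Stage 2 (knowledge composition): attention over the outputs of N frozen
    task adapters, using the layer's hidden state as the query."""

    def __init__(self, hidden_dim: int):
        super().__init__()
        self.query = nn.Linear(hidden_dim, hidden_dim)
        self.key = nn.Linear(hidden_dim, hidden_dim)
        self.value = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, h: torch.Tensor, adapter_outputs: list[torch.Tensor]) -> torch.Tensor:
        stacked = torch.stack(adapter_outputs, dim=2)      # (B, S, N, H)
        q = self.query(h).unsqueeze(2)                     # (B, S, 1, H)
        k = self.key(stacked)                              # (B, S, N, H)
        v = self.value(stacked)                            # (B, S, N, H)
        weights = torch.softmax((q * k).sum(-1), dim=-1)   # (B, S, N)
        return h + (weights.unsqueeze(-1) * v).sum(dim=2)  # weighted mixture


# Usage: adapters are trained separately per task and then frozen; only the
# fusion parameters (and a task head) are trained in the second stage.
hidden = torch.randn(2, 10, 768)
adapters = [BottleneckAdapter(768) for _ in range(3)]
for a in adapters:
    a.requires_grad_(False)
fusion = AdapterFusion(768)
out = fusion(hidden, [a(hidden) for a in adapters])
print(out.shape)  # torch.Size([2, 10, 768])
```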
Related papers
- ATLAS: Adapter-Based Multi-Modal Continual Learning with a Two-Stage Learning Strategy [12.150065431702055]
We propose a multi-modal continual learning scheme that consists of experience-based learning and novel knowledge expansion.
Our method is well suited to continual learning: it broadens the upstream representation distribution while also minimizing the negative impact of forgetting previous tasks.
arXiv Detail & Related papers (2024-10-14T13:29:42Z)
- CorDA: Context-Oriented Decomposition Adaptation of Large Language Models for Task-Aware Parameter-Efficient Fine-tuning [101.81127587760831]
Current fine-tuning methods build adapters largely without regard to the context of the downstream task to be learned or the context of the important knowledge to be maintained.
We propose CorDA, a Context-oriented Decomposition Adaptation method that builds learnable task-aware adapters.
Our method enables two options, the knowledge-preserved adaptation and the instruction-previewed adaptation.
arXiv Detail & Related papers (2024-06-07T19:10:35Z)
- AdapterDistillation: Non-Destructive Task Composition with Knowledge Distillation [12.648208238878468]
We propose a two-stage knowledge distillation algorithm called AdapterDistillation.
In the first stage, we extract task-specific knowledge by using local data to train a student adapter.
In the second stage, we distill the knowledge from the existing teacher adapters into the student adapter to help its inference.
arXiv Detail & Related papers (2023-12-26T07:01:00Z)
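A rough sketch of what the two-stage distillation described above could look like at the level of a single adapter layer. Averaging the teacher adapters' outputs and using an MSE objective are assumptions made for illustration, not details taken from the paper.

```python
# Hedged sketch of adapter distillation: a student adapter learns to match
# frozen teacher adapters; the mean aggregation and MSE loss are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Adapter(nn.Module):
    def __init__(self, hidden_dim: int = 768, bottleneck: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(hidden_dim, bottleneck), nn.ReLU(), nn.Linear(bottleneck, hidden_dim)
        )

    def forward(self, h):
        return h + self.net(h)


student = Adapter()
teachers = [Adapter().requires_grad_(False) for _ in range(3)]  # frozen, already trained
opt = torch.optim.Adam(student.parameters(), lr=1e-4)

h = torch.randn(8, 16, 768)  # hidden states from the frozen backbone

# Stage 1 (task training on local data) would apply a task loss to student(h).
# Stage 2: distill the teachers' adapter outputs into the student adapter.
teacher_out = torch.stack([t(h) for t in teachers]).mean(dim=0)
loss = F.mse_loss(student(h), teacher_out)
loss.backward()
opt.step()
```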
- GraphAdapter: Tuning Vision-Language Models With Dual Knowledge Graph [63.81641578763094]
Adapter-style efficient transfer learning (ETL) has shown excellent performance in the tuning of vision-language models (VLMs).
We propose an effective adapter-style tuning strategy, dubbed GraphAdapter, which performs the textual adapter by explicitly modeling the dual-modality structure knowledge.
In particular, the dual knowledge graph is established with two sub-graphs, i.e., a textual knowledge sub-graph, and a visual knowledge sub-graph, where the nodes and edges represent the semantics/classes and their correlations in two modalities, respectively.
arXiv Detail & Related papers (2023-09-24T12:56:40Z)
- Pre-training Multi-task Contrastive Learning Models for Scientific Literature Understanding [52.723297744257536]
Pre-trained language models (LMs) have shown effectiveness in scientific literature understanding tasks.
We propose a multi-task contrastive learning framework, SciMult, to facilitate common knowledge sharing across different literature understanding tasks.
arXiv Detail & Related papers (2023-05-23T16:47:22Z)
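A generic sketch of multi-task contrastive pre-training with a shared encoder, based only on the summary above. The InfoNCE-style loss, the stand-in encoder, and the per-task batching are assumptions, not SciMult's actual design.

```python
# Generic multi-task contrastive pre-training sketch; encoder, temperature,
# and task batching below are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


def info_nce(query_emb, pos_emb, temperature: float = 0.05):
    """In-batch contrastive loss: each query's positive is the same-index passage."""
    q = F.normalize(query_emb, dim=-1)
    p = F.normalize(pos_emb, dim=-1)
    logits = q @ p.T / temperature        # (B, B) similarity matrix
    labels = torch.arange(q.size(0))      # diagonal entries are the positives
    return F.cross_entropy(logits, labels)


encoder = nn.Sequential(nn.Linear(768, 768), nn.Tanh())  # stand-in for a shared LM encoder
opt = torch.optim.Adam(encoder.parameters(), lr=2e-5)

# One batch per literature-understanding task, all sharing the same encoder
# so common knowledge is learned jointly.
for task_batch in [torch.randn(16, 2, 768) for _ in range(3)]:
    queries, positives = task_batch[:, 0], task_batch[:, 1]
    loss = info_nce(encoder(queries), encoder(positives))
    loss.backward()   # gradients accumulate across tasks
opt.step()            # single update over the multi-task step
```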
- I2I: Initializing Adapters with Improvised Knowledge [15.452979531094567]
Improvise to Initialize (I2I), a continual learning algorithm, initializes Adapters for incoming tasks by distilling knowledge from previously-learned tasks' Adapters.
I2I consistently achieves better task accuracy than independently-trained Adapters.
arXiv Detail & Related papers (2023-04-04T23:51:48Z)
- Cross-Task Knowledge Distillation in Multi-Task Recommendation [41.62428191434233]
Multi-task learning has been widely used in real-world recommenders to predict different types of user feedback.
We propose a Cross-Task Knowledge Distillation framework in recommendation, which consists of three procedures.
arXiv Detail & Related papers (2022-02-20T16:15:19Z)
- Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks [37.2958914602899]
We show that we can learn adapter parameters for all layers and tasks by generating them using shared hypernetworks.
Experiments on the well-known GLUE benchmark show improved performance in multi-task learning while adding only 0.29% parameters per task.
arXiv Detail & Related papers (2021-06-08T16:16:40Z)
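A minimal sketch of the shared-hypernetwork idea from the entry above: one generator network emits adapter weights conditioned on a learned task embedding, so per-task storage shrinks to the embedding vector. The dimensions and the flat weight layout are illustrative assumptions.

```python
# Sketch of a shared hypernetwork that generates adapter weights from a task
# embedding; only the small task embeddings differ between tasks.
import torch
import torch.nn as nn


class AdapterHypernetwork(nn.Module):
    def __init__(self, task_emb_dim=64, hidden_dim=768, bottleneck=32):
        super().__init__()
        self.hidden_dim, self.bottleneck = hidden_dim, bottleneck
        out_dim = 2 * hidden_dim * bottleneck  # down- and up-projection weights
        self.generator = nn.Sequential(nn.Linear(task_emb_dim, 256), nn.ReLU(),
                                       nn.Linear(256, out_dim))

    def forward(self, h: torch.Tensor, task_emb: torch.Tensor) -> torch.Tensor:
        flat = self.generator(task_emb)
        split = self.hidden_dim * self.bottleneck
        w_down = flat[:split].view(self.bottleneck, self.hidden_dim)
        w_up = flat[split:].view(self.hidden_dim, self.bottleneck)
        return h + torch.relu(h @ w_down.T) @ w_up.T  # generated bottleneck adapter


hypernet = AdapterHypernetwork()
task_embeddings = nn.Embedding(8, 64)                # 8 tasks, one embedding each
h = torch.randn(4, 128, 768)
out = hypernet(h, task_embeddings(torch.tensor(2)))  # adapter for task id 2
print(out.shape)  # torch.Size([4, 128, 768])
```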
- Adversarial Continual Learning [99.56738010842301]
We propose a hybrid continual learning framework that learns a disjoint representation for task-invariant and task-specific features.
Our model combines architecture growth to prevent forgetting of task-specific skills and an experience replay approach to preserve shared skills.
arXiv Detail & Related papers (2020-03-21T02:08:17Z)
- K-Adapter: Infusing Knowledge into Pre-Trained Models with Adapters [136.75235546149995]
We study the problem of injecting knowledge into large pre-trained models like BERT and RoBERTa.
Existing methods typically update the original parameters of pre-trained models when injecting knowledge.
We propose K-Adapter, a framework that keeps the original parameters of the pre-trained model fixed and supports the development of versatile knowledge-infused models.
arXiv Detail & Related papers (2020-02-05T14:30:49Z)
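A sketch of the general pattern described in the entry above: the pre-trained encoder stays frozen while knowledge-specific adapters run alongside it, and their outputs are combined for a task head. The stand-in encoder and the simple concatenation are simplifications, not K-Adapter's exact architecture.

```python
# Sketch of frozen-backbone knowledge infusion with side adapters; combining
# outputs by concatenation is a simplification for illustration.
import torch
import torch.nn as nn


class KnowledgeAdapter(nn.Module):
    """Small network fed with frozen hidden states; one per knowledge source
    (e.g. factual or linguistic knowledge), trained independently."""

    def __init__(self, hidden_dim=768):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.GELU(),
                                  nn.Linear(hidden_dim, hidden_dim))

    def forward(self, frozen_hidden):
        return self.proj(frozen_hidden) + frozen_hidden


frozen_encoder = nn.Linear(768, 768).requires_grad_(False)  # stand-in for BERT/RoBERTa
adapters = nn.ModuleList([KnowledgeAdapter() for _ in range(2)])

x = torch.randn(4, 32, 768)
with torch.no_grad():
    h = frozen_encoder(x)                                     # frozen representation
features = torch.cat([h] + [a(h) for a in adapters], dim=-1)  # (4, 32, 768 * 3)
# A task head consumes the concatenated features; only adapters + head are trained.
```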