AdapterDistillation: Non-Destructive Task Composition with Knowledge
Distillation
- URL: http://arxiv.org/abs/2312.16261v1
- Date: Tue, 26 Dec 2023 07:01:00 GMT
- Title: AdapterDistillation: Non-Destructive Task Composition with Knowledge
Distillation
- Authors: Junjie Wang, Yicheng Chen, Wangshu Zhang, Sen Hu, Teng Xu, Jing Zheng
- Abstract summary: We propose a two-stage knowledge distillation algorithm called AdapterDistillation.
In the first stage, we extract task-specific knowledge by using local data to train a student adapter.
In the second stage, we distill the knowledge from the existing teacher adapters into the student adapter to support its inference.
- Score: 12.648208238878468
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Leveraging knowledge from multiple tasks by introducing a small number
of task-specific parameters, known as adapters, into each transformer layer has
received much attention recently. However, adding an extra fusion layer to
implement knowledge composition not only increases the inference time but is
also non-scalable for some applications. To avoid these issues, we propose a
two-stage knowledge distillation algorithm called AdapterDistillation. In the
first stage, we extract task-specific knowledge by using local data to train a
student adapter. In the second stage, we distill the knowledge from the
existing teacher adapters into the student adapter to support its inference.
Extensive experiments on frequently asked question retrieval in task-oriented
dialog systems validate the efficiency of AdapterDistillation. We show that
AdapterDistillation outperforms existing algorithms in terms of accuracy,
resource consumption and inference time.
Related papers
- CorDA: Context-Oriented Decomposition Adaptation of Large Language Models for Task-Aware Parameter-Efficient Fine-tuning [101.81127587760831]
Current fine-tuning methods build adapters largely agnostic of the context of the downstream task to learn, or of the important knowledge to maintain.
We propose CorDA, a Context-oriented Decomposition Adaptation method that builds learnable task-aware adapters.
Our method enables two options: knowledge-preserved adaptation and instruction-previewed adaptation.
arXiv Detail & Related papers (2024-06-07T19:10:35Z) - Auto-selected Knowledge Adapters for Lifelong Person Re-identification [54.42307214981537]
Lifelong Person Re-Identification requires systems to continually learn from non-overlapping datasets across different times and locations.
Existing approaches, either rehearsal-free or rehearsal-based, still suffer from the problem of catastrophic forgetting.
We introduce AdalReID, a novel framework that adopts knowledge adapters and a parameter-free auto-selection mechanism for lifelong learning.
arXiv Detail & Related papers (2024-05-29T11:42:02Z) - TSP-Transformer: Task-Specific Prompts Boosted Transformer for Holistic
Scene Understanding [38.40969494998194]
We propose a Task-Specific Prompts Transformer, dubbed TSP-Transformer, for holistic scene understanding.
It features a vanilla transformer in the early stage and a task-specific prompts transformer encoder in the lateral stage, where the task-specific prompts are augmented.
Experiments on NYUD-v2 and PASCAL-Context show that our method achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-11-06T18:20:02Z) - Vision Transformer Adapters for Generalizable Multitask Learning [61.79647180647685]
We introduce the first multitasking vision transformer adapters that learn generalizable task affinities.
Our adapters can simultaneously solve multiple dense vision tasks in a parameter-efficient manner.
In contrast to concurrent methods, we do not require retraining or fine-tuning whenever a new task or domain is added.
arXiv Detail & Related papers (2023-08-23T18:40:48Z) - I2I: Initializing Adapters with Improvised Knowledge [15.452979531094567]
Improvise to Initialize (I2I) is a continual learning algorithm that initializes Adapters for incoming tasks by distilling knowledge from previously-learned tasks' Adapters.
I2I consistently achieves better task accuracy than independently-trained Adapters.
arXiv Detail & Related papers (2023-04-04T23:51:48Z) - Cross-Task Knowledge Distillation in Multi-Task Recommendation [41.62428191434233]
Multi-task learning has been widely used in real-world recommenders to predict different types of user feedback.
We propose a Cross-Task Knowledge Distillation framework in recommendation, which consists of three procedures.
arXiv Detail & Related papers (2022-02-20T16:15:19Z) - Reciprocal Feature Learning via Explicit and Implicit Tasks in Scene
Text Recognition [60.36540008537054]
In this work, we excavate an implicit task, character counting, within traditional text recognition, without additional annotation cost.
We design a two-branch reciprocal feature learning framework to adequately utilize the features from both tasks.
Experiments on 7 benchmarks show the advantages of the proposed method in both text recognition and the newly built character-counting task.
arXiv Detail & Related papers (2021-05-13T12:27:35Z) - AdapterFusion: Non-Destructive Task Composition for Transfer Learning [104.9639614787314]
Sequential fine-tuning and multi-task learning are methods aiming to incorporate knowledge from multiple tasks.
We propose AdapterFusion, a new two-stage learning algorithm that leverages knowledge from multiple tasks (a minimal sketch of this fusion-layer composition appears after this list).
We show that our approach outperforms traditional strategies such as full fine-tuning as well as multi-task learning.
arXiv Detail & Related papers (2020-05-01T07:03:42Z) - K-Adapter: Infusing Knowledge into Pre-Trained Models with Adapters [136.75235546149995]
We study the problem of injecting knowledge into large pre-trained models like BERT and RoBERTa.
Existing methods typically update the original parameters of pre-trained models when injecting knowledge.
We propose K-Adapter, a framework that keeps the original parameters of the pre-trained model fixed and supports the development of versatile knowledge-infused models.
arXiv Detail & Related papers (2020-02-05T14:30:49Z)
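For contrast with AdapterDistillation, the AdapterFusion entry above composes task knowledge through an extra attention-based fusion layer that runs at inference time. The following is a minimal sketch of that kind of composition; the query/key/value parameterization and tensor shapes are assumptions for illustration, not the published architecture.

```python
# Illustrative sketch of an AdapterFusion-style composition layer (assumed shapes).
import torch
import torch.nn as nn
import torch.nn.functional as F


class FusionLayer(nn.Module):
    """Attention over the outputs of several frozen task adapters."""

    def __init__(self, hidden_dim: int):
        super().__init__()
        self.query = nn.Linear(hidden_dim, hidden_dim)
        self.key = nn.Linear(hidden_dim, hidden_dim)
        self.value = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, h: torch.Tensor, adapter_outs: torch.Tensor) -> torch.Tensor:
        # h: (batch, hidden); adapter_outs: (batch, n_adapters, hidden)
        q = self.query(h).unsqueeze(1)                   # (batch, 1, hidden)
        k = self.key(adapter_outs)                       # (batch, n, hidden)
        v = self.value(adapter_outs)                     # (batch, n, hidden)
        scores = (q * k).sum(-1) / k.shape[-1] ** 0.5    # (batch, n)
        attn = F.softmax(scores, dim=-1)
        return h + (attn.unsqueeze(-1) * v).sum(dim=1)   # residual mixture
```

Because every forward pass evaluates all task adapters plus this fusion layer, inference cost grows with the number of tasks; AdapterDistillation instead folds the teachers' knowledge into one student adapter during training, so inference touches a single adapter.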