K-Adapter: Infusing Knowledge into Pre-Trained Models with Adapters
- URL: http://arxiv.org/abs/2002.01808v5
- Date: Mon, 28 Dec 2020 06:07:06 GMT
- Title: K-Adapter: Infusing Knowledge into Pre-Trained Models with Adapters
- Authors: Ruize Wang, Duyu Tang, Nan Duan, Zhongyu Wei, Xuanjing Huang, Jianshu Ji, Guihong Cao, Daxin Jiang, Ming Zhou
- Abstract summary: We study the problem of injecting knowledge into large pre-trained models like BERT and RoBERTa.
Existing methods typically update the original parameters of pre-trained models when injecting knowledge.
We propose K-Adapter, a framework that keeps the original parameters of the pre-trained model fixed and supports the development of versatile knowledge-infused models.
- Score: 136.75235546149995
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problem of injecting knowledge into large pre-trained models
like BERT and RoBERTa. Existing methods typically update the original
parameters of pre-trained models when injecting knowledge. However, when
multiple kinds of knowledge are injected, the historically injected knowledge
would be flushed away. To address this, we propose K-Adapter, a framework that
keeps the original parameters of the pre-trained model fixed and supports the
development of versatile knowledge-infused models. Taking RoBERTa as the
backbone model, K-Adapter has a neural adapter for each kind of infused
knowledge, like a plug-in connected to RoBERTa. There is no information flow
between different adapters, so multiple adapters can be trained efficiently
in a distributed way. As a case study, we inject two kinds of knowledge in this
work: (1) factual knowledge obtained from automatically aligned
text-triplets on Wikipedia and Wikidata and (2) linguistic knowledge obtained
via dependency parsing. Results on three knowledge-driven tasks, including
relation classification, entity typing, and question answering, demonstrate
that each adapter improves performance and that the combination of both adapters
brings further improvements. Further analysis indicates that K-Adapter captures
more versatile knowledge than RoBERTa.
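The adapter-as-plug-in design above can be pictured with a short sketch. This is a minimal PyTorch-style illustration, not the authors' implementation: the class names, the adapter sizes, the use of a single transformer layer per adapter, and the concatenation-based fusion of backbone and adapter outputs are all assumptions made for brevity.

```python
import torch
import torch.nn as nn


class KnowledgeAdapter(nn.Module):
    """One plug-in adapter (e.g. for factual or linguistic knowledge)."""

    def __init__(self, hidden_size=768, adapter_size=128, num_heads=4):
        super().__init__()
        self.down = nn.Linear(hidden_size, adapter_size)   # project down
        self.block = nn.TransformerEncoderLayer(
            d_model=adapter_size, nhead=num_heads, batch_first=True)
        self.up = nn.Linear(adapter_size, hidden_size)     # project back up

    def forward(self, backbone_hidden):
        return self.up(self.block(self.down(backbone_hidden)))


class KAdapterModel(nn.Module):
    """Frozen backbone plus independent adapters; no information flows
    between adapters, so each one can be trained separately."""

    def __init__(self, backbone, adapters):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False            # original parameters stay fixed
        self.adapters = nn.ModuleList(adapters)

    def forward(self, input_ids, attention_mask=None):
        # Assumes a Hugging Face-style encoder that returns last_hidden_state.
        hidden = self.backbone(
            input_ids, attention_mask=attention_mask).last_hidden_state
        # One possible fusion: concatenate the backbone output with the
        # output of every adapter and feed the result to a task head.
        return torch.cat(
            [hidden] + [adapter(hidden) for adapter in self.adapters], dim=-1)
```

Because the adapters never exchange information, a factual adapter and a linguistic adapter can be trained on separate machines and simply plugged in together afterwards, which is the distributed-training property the abstract highlights.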
Related papers
- UniAdapt: A Universal Adapter for Knowledge Calibration [5.732271982985626]
Large Language Models (LLMs) require frequent updates to correct errors and keep pace with continuously evolving knowledge.
Recent research in model editing has highlighted the challenges in balancing generalization and locality.
We introduce UniAdapt, a universal adapter for knowledge calibration.
arXiv Detail & Related papers (2024-10-01T07:18:34Z) - Auto-selected Knowledge Adapters for Lifelong Person Re-identification [54.42307214981537]
Lifelong Person Re-Identification requires systems to continually learn from non-overlapping datasets across different times and locations.
Existing approaches, either rehearsal-free or rehearsal-based, still suffer from the problem of catastrophic forgetting.
We introduce a novel framework AdalReID, that adopts knowledge adapters and a parameter-free auto-selection mechanism for lifelong learning.
arXiv Detail & Related papers (2024-05-29T11:42:02Z) - AdapterDistillation: Non-Destructive Task Composition with Knowledge
Distillation [12.648208238878468]
We propose a two-stage knowledge distillation algorithm called AdapterDistillation.
In the first stage, we extract task-specific knowledge by training a student adapter on local data.
In the second stage, we distill the knowledge from the existing teacher adapters into the student adapter to help its inference (a schematic sketch of this two-stage procedure appears after this list).
arXiv Detail & Related papers (2023-12-26T07:01:00Z) - Plug-and-Play Knowledge Injection for Pre-trained Language Models [116.37916535076478]
Injecting external knowledge can improve the performance of pre-trained language models (PLMs) on various downstream NLP tasks.
Massive retraining is required to deploy new knowledge injection methods or knowledge bases for downstream tasks.
We study how to improve the flexibility and efficiency of knowledge injection by reusing existing downstream models.
arXiv Detail & Related papers (2023-05-28T10:58:00Z) - Decouple knowledge from parameters for plug-and-play language modeling [77.5601135412186]
We introduce PlugLM, a pre-training model with differentiable plug-in memory (DPM).
The key intuition is to decouple the knowledge storage from model parameters with an editable and scalable key-value memory.
PlugLM obtains an average gain of 3.95 F1 across four domains without any in-domain pre-training.
arXiv Detail & Related papers (2023-05-19T10:01:55Z) - Kformer: Knowledge Injection in Transformer Feed-Forward Layers [107.71576133833148]
We propose a novel knowledge fusion model, namely Kformer, which incorporates external knowledge through the feed-forward layer in Transformer.
We empirically find that simply injecting knowledge into the FFN can improve the pre-trained language model's ability and benefit current knowledge fusion methods.
arXiv Detail & Related papers (2022-01-15T03:00:27Z) - AdapterFusion: Non-Destructive Task Composition for Transfer Learning [104.9639614787314]
Sequential fine-tuning and multi-task learning are methods aiming to incorporate knowledge from multiple tasks.
We propose AdapterFusion, a new two-stage learning algorithm that leverages knowledge from multiple tasks.
We show that our approach outperforms traditional strategies such as full fine-tuning as well as multi-task learning.
arXiv Detail & Related papers (2020-05-01T07:03:42Z)
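The AdapterDistillation entry above describes a two-stage recipe: first train a student adapter on local task data, then distill the existing teacher adapters into it. The sketch below is only an illustration of that recipe; the helper names (student_adapter, teacher_adapters, task_head), the cross-entropy task loss, and the mean-squared-error matching of averaged teacher outputs are assumptions for illustration, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F


def train_stage1(backbone, student_adapter, task_head, loader, optimizer):
    """Stage 1: learn task-specific knowledge from local data."""
    for input_ids, labels in loader:
        with torch.no_grad():                  # backbone assumed frozen
            hidden = backbone(input_ids).last_hidden_state
        logits = task_head(student_adapter(hidden).mean(dim=1))
        loss = F.cross_entropy(logits, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()


def train_stage2(backbone, student_adapter, teacher_adapters, loader, optimizer):
    """Stage 2: distill knowledge from the existing teacher adapters into the
    student adapter (here: match the averaged teacher outputs with MSE)."""
    for input_ids, _ in loader:
        with torch.no_grad():
            hidden = backbone(input_ids).last_hidden_state
            teacher_out = torch.stack(
                [t(hidden) for t in teacher_adapters]).mean(dim=0)
        loss = F.mse_loss(student_adapter(hidden), teacher_out)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

Here the optimizer is assumed to update only the student adapter (and the task head in stage 1), so the frozen backbone and the teacher adapters are left untouched.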