K-Adapter: Infusing Knowledge into Pre-Trained Models with Adapters
- URL: http://arxiv.org/abs/2002.01808v5
- Date: Mon, 28 Dec 2020 06:07:06 GMT
- Title: K-Adapter: Infusing Knowledge into Pre-Trained Models with Adapters
- Authors: Ruize Wang, Duyu Tang, Nan Duan, Zhongyu Wei, Xuanjing Huang, Jianshu Ji, Guihong Cao, Daxin Jiang, Ming Zhou
- Abstract summary: We study the problem of injecting knowledge into large pre-trained models like BERT and RoBERTa.
Existing methods typically update the original parameters of pre-trained models when injecting knowledge.
We propose K-Adapter, a framework that keeps the original parameters of the pre-trained model fixed and supports the development of versatile knowledge-infused models.
- Score: 136.75235546149995
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problem of injecting knowledge into large pre-trained models
like BERT and RoBERTa. Existing methods typically update the original
parameters of pre-trained models when injecting knowledge. However, when
multiple kinds of knowledge are injected, the historically injected knowledge
would be flushed away. To address this, we propose K-Adapter, a framework that
keeps the original parameters of the pre-trained model fixed and supports the
development of versatile knowledge-infused models. Taking RoBERTa as the
backbone model, K-Adapter has a neural adapter for each kind of infused
knowledge, like a plug-in connected to RoBERTa. There is no information flow
between different adapters, so multiple adapters can be trained efficiently
in a distributed way. As a case study, we inject two kinds of knowledge in this
work: (1) factual knowledge obtained from automatically aligned
text-triplets on Wikipedia and Wikidata and (2) linguistic knowledge obtained
via dependency parsing. Results on three knowledge-driven tasks, including
relation classification, entity typing, and question answering, demonstrate
that each adapter improves performance and that the combination of both adapters
brings further improvements. Further analysis indicates that K-Adapter captures
more versatile knowledge than RoBERTa.
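The adapter-as-plug-in design above can be pictured with a short sketch. This is a minimal PyTorch-style illustration, not the authors' implementation: the class names, the adapter sizes, the use of a single transformer layer per adapter, and the concatenation-based fusion of backbone and adapter outputs are all assumptions made for brevity.

```python
import torch
import torch.nn as nn


class KnowledgeAdapter(nn.Module):
    """One plug-in adapter (e.g. for factual or linguistic knowledge)."""

    def __init__(self, hidden_size=768, adapter_size=128, num_heads=4):
        super().__init__()
        self.down = nn.Linear(hidden_size, adapter_size)   # project down
        self.block = nn.TransformerEncoderLayer(
            d_model=adapter_size, nhead=num_heads, batch_first=True)
        self.up = nn.Linear(adapter_size, hidden_size)     # project back up

    def forward(self, backbone_hidden):
        return self.up(self.block(self.down(backbone_hidden)))


class KAdapterModel(nn.Module):
    """Frozen backbone plus independent adapters; no information flows
    between adapters, so each one can be trained separately."""

    def __init__(self, backbone, adapters):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False            # original parameters stay fixed
        self.adapters = nn.ModuleList(adapters)

    def forward(self, input_ids, attention_mask=None):
        # Assumes a Hugging Face-style encoder that returns last_hidden_state.
        hidden = self.backbone(
            input_ids, attention_mask=attention_mask).last_hidden_state
        # One possible fusion: concatenate the backbone output with the
        # output of every adapter and feed the result to a task head.
        return torch.cat(
            [hidden] + [adapter(hidden) for adapter in self.adapters], dim=-1)
```

Because the adapters never exchange information, a factual adapter and a linguistic adapter can be trained on separate machines and simply plugged in together afterwards, which is the distributed-training property the abstract highlights.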
Related papers
- UniAdapt: A Universal Adapter for Knowledge Calibration [5.732271982985626]
Large Language Models (LLMs) require frequent updates to correct errors and keep pace with continuously evolving knowledge.
Recent research in model editing has highlighted the challenges in balancing generalization and locality.
We introduce UniAdapt, a universal adapter for knowledge calibration.
arXiv Detail & Related papers (2024-10-01T07:18:34Z) - Auto-selected Knowledge Adapters for Lifelong Person Re-identification [54.42307214981537]
Lifelong Person Re-Identification requires systems to continually learn from non-overlapping datasets across different times and locations.
Existing approaches, either rehearsal-free or rehearsal-based, still suffer from the problem of catastrophic forgetting.
We introduce a novel framework AdalReID, that adopts knowledge adapters and a parameter-free auto-selection mechanism for lifelong learning.
arXiv Detail & Related papers (2024-05-29T11:42:02Z) - AdapterDistillation: Non-Destructive Task Composition with Knowledge
Distillation [12.648208238878468]
We propose a two-stage knowledge distillation algorithm called AdapterDistillation.
In the first stage, we extract task-specific knowledge by training a student adapter on local data.
In the second stage, we distill the knowledge from the existing teacher adapters into the student adapter to help its inference (a schematic sketch of this two-stage procedure appears after this list).
arXiv Detail & Related papers (2023-12-26T07:01:00Z) - Plug-and-Play Knowledge Injection for Pre-trained Language Models [116.37916535076478]
Injecting external knowledge can improve the performance of pre-trained language models (PLMs) on various downstream NLP tasks.
Massive retraining is required to deploy new knowledge injection methods or knowledge bases for downstream tasks.
We study how to improve the flexibility and efficiency of knowledge injection by reusing existing downstream models.
arXiv Detail & Related papers (2023-05-28T10:58:00Z) - Decouple knowledge from parameters for plug-and-play language modeling [77.5601135412186]
We introduce PlugLM, a pre-training model with differentiable plug-in memory (DPM).
The key intuition is to decouple the knowledge storage from model parameters with an editable and scalable key-value memory.
PlugLM obtains an average gain of 3.95 F1 across four domains without any in-domain pre-training.
arXiv Detail & Related papers (2023-05-19T10:01:55Z) - Kformer: Knowledge Injection in Transformer Feed-Forward Layers [107.71576133833148]
We propose a novel knowledge fusion model, namely Kformer, which incorporates external knowledge through the feed-forward layer in Transformer.
We empirically find that simply injecting knowledge into the FFN can improve the pre-trained language model's ability and benefit current knowledge fusion methods.
arXiv Detail & Related papers (2022-01-15T03:00:27Z) - AdapterFusion: Non-Destructive Task Composition for Transfer Learning [104.9639614787314]
Sequential fine-tuning and multi-task learning are methods aiming to incorporate knowledge from multiple tasks.
We propose AdapterFusion, a new two-stage learning algorithm that leverages knowledge from multiple tasks.
We show that our approach outperforms traditional strategies such as full fine-tuning as well as multi-task learning.
arXiv Detail & Related papers (2020-05-01T07:03:42Z)
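The AdapterDistillation entry above describes a two-stage recipe: first train a student adapter on local task data, then distill the existing teacher adapters into it. The sketch below is only an illustration of that recipe; the helper names (student_adapter, teacher_adapters, task_head), the cross-entropy task loss, and the mean-squared-error matching of averaged teacher outputs are assumptions for illustration, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F


def train_stage1(backbone, student_adapter, task_head, loader, optimizer):
    """Stage 1: learn task-specific knowledge from local data."""
    for input_ids, labels in loader:
        with torch.no_grad():                  # backbone assumed frozen
            hidden = backbone(input_ids).last_hidden_state
        logits = task_head(student_adapter(hidden).mean(dim=1))
        loss = F.cross_entropy(logits, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()


def train_stage2(backbone, student_adapter, teacher_adapters, loader, optimizer):
    """Stage 2: distill knowledge from the existing teacher adapters into the
    student adapter (here: match the averaged teacher outputs with MSE)."""
    for input_ids, _ in loader:
        with torch.no_grad():
            hidden = backbone(input_ids).last_hidden_state
            teacher_out = torch.stack(
                [t(hidden) for t in teacher_adapters]).mean(dim=0)
        loss = F.mse_loss(student_adapter(hidden), teacher_out)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

Here the optimizer is assumed to update only the student adapter (and the task head in stage 1), so the frozen backbone and the teacher adapters are left untouched.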