Plug-and-Play Knowledge Injection for Pre-trained Language Models
- URL: http://arxiv.org/abs/2305.17691v2
- Date: Mon, 4 Dec 2023 08:33:13 GMT
- Title: Plug-and-Play Knowledge Injection for Pre-trained Language Models
- Authors: Zhengyan Zhang, Zhiyuan Zeng, Yankai Lin, Huadong Wang, Deming Ye,
Chaojun Xiao, Xu Han, Zhiyuan Liu, Peng Li, Maosong Sun, Jie Zhou
- Abstract summary: Injecting external knowledge can improve the performance of pre-trained language models (PLMs) on various downstream NLP tasks.
Massive retraining is required to deploy new knowledge injection methods or knowledge bases for downstream tasks.
We study how to improve the flexibility and efficiency of knowledge injection by reusing existing downstream models.
- Score: 116.37916535076478
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Injecting external knowledge can improve the performance of pre-trained
language models (PLMs) on various downstream NLP tasks. However, massive
retraining is required to deploy new knowledge injection methods or knowledge
bases for downstream tasks. In this work, we are the first to study how to
improve the flexibility and efficiency of knowledge injection by reusing
existing downstream models. To this end, we explore a new paradigm
plug-and-play knowledge injection, where knowledge bases are injected into
frozen existing downstream models by a knowledge plugin. Correspondingly, we
propose a plug-and-play injection method map-tuning, which trains a mapping of
knowledge embeddings to enrich model inputs with mapped embeddings while
keeping model parameters frozen. Experimental results on three knowledge-driven
NLP tasks show that existing injection methods are not suitable for the new
paradigm, while map-tuning effectively improves the performance of downstream
models. Moreover, we show that a frozen downstream model can be well adapted to
different domains with different mapping networks of domain knowledge. Our code
and models are available at https://github.com/THUNLP/Knowledge-Plugin.
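As a rough illustration of the map-tuning idea described above, the sketch below trains only a small mapping network that projects pre-computed knowledge embeddings into the PLM's input-embedding space and appends the mapped vectors to the token embeddings, while the downstream model itself stays frozen. The module name, dimensions, and concatenation strategy are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class MapTuningPlugin(nn.Module):
    """Trainable mapping from knowledge-embedding space to the PLM's
    input-embedding space; the PLM stays frozen (illustrative sketch)."""

    def __init__(self, kg_dim: int, plm_dim: int):
        super().__init__()
        # A simple affine map; the paper's mapping network may differ.
        self.mapping = nn.Linear(kg_dim, plm_dim)

    def forward(self, token_embeds: torch.Tensor, kg_embeds: torch.Tensor) -> torch.Tensor:
        # token_embeds: (batch, seq_len, plm_dim) -- frozen PLM input embeddings
        # kg_embeds:    (batch, n_entities, kg_dim) -- pre-trained knowledge embeddings
        mapped = self.mapping(kg_embeds)                 # (batch, n_entities, plm_dim)
        return torch.cat([token_embeds, mapped], dim=1)  # enriched input sequence


# Usage sketch: only the mapping network receives gradients.
plm_dim, kg_dim = 768, 100                  # hypothetical dimensions
plugin = MapTuningPlugin(kg_dim, plm_dim)
token_embeds = torch.randn(2, 16, plm_dim)  # stand-in for frozen PLM embeddings
kg_embeds = torch.randn(2, 3, kg_dim)       # stand-in for entity embeddings from a KB
enriched = plugin(token_embeds, kg_embeds)  # feed this into the frozen downstream model
print(enriched.shape)                       # torch.Size([2, 19, 768])
```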
Related papers
- NovaCOMET: Open Commonsense Foundation Models with Symbolic Knowledge Distillation [82.85412355714898]
We present NovaCOMET, an open commonsense knowledge model that combines the best aspects of knowledge and general task models.
Compared to previous knowledge models, NovaCOMET allows open-format relations, enabling direct application to reasoning tasks.
It explicitly centers knowledge, enabling superior performance for commonsense reasoning.
arXiv Detail & Related papers (2023-12-10T19:45:24Z)
- Decouple knowledge from parameters for plug-and-play language modeling [77.5601135412186]
We introduce PlugLM, a pre-training model with differentiable plug-in memory (DPM).
The key intuition is to decouple the knowledge storage from model parameters with an editable and scalable key-value memory.
PlugLM obtains 3.95 F1 improvements across four domains on average without any in-domain pre-training.
arXiv Detail & Related papers (2023-05-19T10:01:55Z)
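The PlugLM entry above describes decoupling knowledge storage from model parameters with an editable key-value memory. A minimal sketch of one plausible form of such a plug-in memory follows; the attention-style lookup, slot count, and naming are assumptions for illustration, not the paper's actual DPM design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class KeyValueMemory(nn.Module):
    """Editable key-value memory queried with attention (illustrative sketch)."""

    def __init__(self, num_slots: int, dim: int):
        super().__init__()
        # Memory lives outside the backbone's weights and can be swapped or grown.
        self.keys = nn.Parameter(torch.randn(num_slots, dim) * 0.02)
        self.values = nn.Parameter(torch.randn(num_slots, dim) * 0.02)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq_len, dim) hidden states from the backbone
        scores = hidden @ self.keys.t() / hidden.size(-1) ** 0.5  # (batch, seq_len, num_slots)
        weights = F.softmax(scores, dim=-1)
        retrieved = weights @ self.values                         # (batch, seq_len, dim)
        return hidden + retrieved                                 # knowledge-enriched states


memory = KeyValueMemory(num_slots=1024, dim=768)  # hypothetical sizes
out = memory(torch.randn(2, 16, 768))
print(out.shape)                                  # torch.Size([2, 16, 768])
```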
- Can LMs Learn New Entities from Descriptions? Challenges in Propagating Injected Knowledge [72.63368052592004]
We study LMs' abilities to make inferences based on injected facts (or propagate those facts).
We find that existing methods for updating knowledge show little propagation of injected knowledge.
Yet, prepending entity definitions in an LM's context improves performance across all settings.
arXiv Detail & Related papers (2023-05-02T17:59:46Z)
- UNTER: A Unified Knowledge Interface for Enhancing Pre-trained Language Models [100.4659557650775]
We propose a UNified knowledge inTERface, UNTER, to exploit both structured knowledge and unstructured knowledge from a unified perspective.
With both forms of knowledge injected, UNTER gains continuous improvements on a series of knowledge-driven NLP tasks.
arXiv Detail & Related papers (2023-05-02T17:33:28Z)
- Adversarial Learning Networks: Source-free Unsupervised Domain Incremental Learning [0.0]
In a non-stationary environment, updating a DNN model requires parameter re-training or model fine-tuning.
We propose an unsupervised source-free method to update DNN classification models.
Unlike existing methods, our approach can update a DNN model incrementally for non-stationary source and target tasks without storing past training data.
arXiv Detail & Related papers (2023-01-28T02:16:13Z)
- Kformer: Knowledge Injection in Transformer Feed-Forward Layers [107.71576133833148]
We propose a novel knowledge fusion model, Kformer, which incorporates external knowledge through the feed-forward layer in the Transformer.
We empirically find that simply injecting knowledge into the FFN can enhance the pre-trained language model's abilities and facilitate current knowledge fusion methods.
arXiv Detail & Related papers (2022-01-15T03:00:27Z)
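Kformer's summary above says knowledge is incorporated through the Transformer feed-forward layer. One common reading is that retrieved knowledge embeddings act as extra hidden units appended to the FFN's two projections; the sketch below follows that reading with assumed shapes and names, so it is an illustration rather than Kformer's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class KnowledgeFFN(nn.Module):
    """Feed-forward layer whose hidden units are extended with retrieved
    knowledge embeddings acting as extra keys/values (illustrative sketch)."""

    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.w_in = nn.Linear(dim, hidden_dim)
        self.w_out = nn.Linear(hidden_dim, dim)

    def forward(self, x: torch.Tensor, knowledge: torch.Tensor) -> torch.Tensor:
        # x:         (batch, seq_len, dim)
        # knowledge: (batch, n_facts, dim) retrieved knowledge embeddings
        h = torch.cat(
            [self.w_in(x), x @ knowledge.transpose(1, 2)], dim=-1
        )                                    # knowledge "keys" become extra hidden units
        h = F.gelu(h)
        n_hidden = self.w_out.in_features
        base, extra = h[..., :n_hidden], h[..., n_hidden:]
        return self.w_out(base) + extra @ knowledge  # knowledge "values" on the way back


ffn = KnowledgeFFN(dim=768, hidden_dim=3072)         # hypothetical sizes
y = ffn(torch.randn(2, 16, 768), torch.randn(2, 4, 768))
print(y.shape)                                       # torch.Size([2, 16, 768])
```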
- DKPLM: Decomposable Knowledge-enhanced Pre-trained Language Model for Natural Language Understanding [19.478288026844893]
Knowledge-Enhanced Pre-trained Language Models (KEPLMs) are pre-trained models with relation triples injected from knowledge graphs to improve language understanding abilities.
Previous studies integrate models with knowledge encoders for representing knowledge retrieved from knowledge graphs.
We propose a novel KEPLM named DKPLM that decomposes the knowledge injection process of pre-trained language models across the pre-training, fine-tuning, and inference stages.
arXiv Detail & Related papers (2021-12-02T08:19:42Z)
- Self-Feature Regularization: Self-Feature Distillation Without Teacher Models [0.0]
Self-Feature Regularization (SFR) is proposed, which uses features in the deep layers to supervise feature learning in the shallow layers.
We first use a generalization-l2 loss to match local features and a many-to-one approach to distill more intensively in the channel dimension.
arXiv Detail & Related papers (2021-03-12T15:29:00Z)
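For the Self-Feature Regularization entry above, the sketch below illustrates the core idea of deep-layer features supervising shallow-layer features through an l2-style matching loss; the adapter, pooling, and shapes are assumptions for illustration rather than the paper's exact losses.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def self_feature_loss(shallow_feat: torch.Tensor,
                      deep_feat: torch.Tensor,
                      adapter: nn.Module) -> torch.Tensor:
    """L2-style matching loss: a shallow feature map, passed through a small
    adapter, is regressed onto a detached deep feature map (illustrative sketch)."""
    target = deep_feat.detach()                        # deep layer acts as the "teacher"
    pred = adapter(shallow_feat)                       # align channel dimensions
    pred = F.adaptive_avg_pool2d(pred, target.shape[-2:])  # align spatial resolution
    return F.mse_loss(pred, target)


# Usage sketch with hypothetical shapes.
shallow = torch.randn(2, 64, 32, 32)         # early-layer features
deep = torch.randn(2, 256, 8, 8)             # late-layer features
adapter = nn.Conv2d(64, 256, kernel_size=1)  # 1x1 conv to match channels
loss = self_feature_loss(shallow, deep, adapter)
print(loss.item())
```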
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.