Neural Knowledge Bank for Pretrained Transformers
- URL: http://arxiv.org/abs/2208.00399v1
- Date: Sun, 31 Jul 2022 09:14:34 GMT
- Title: Neural Knowledge Bank for Pretrained Transformers
- Authors: Damai Dai, Wenbin Jiang, Qingxiu Dong, Yajuan Lyu, Qiaoqiao She,
Zhifang Sui
- Abstract summary: We propose a Neural Knowledge Bank (NKB) to store extra factual knowledge for pretrained Transformers.
During knowledge injection, we fix the original model and inject factual knowledge into the extended memory slots.
We use three closed-book question answering datasets to show the NKB's strong ability to store extra factual knowledge.
- Score: 20.416700112895974
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The ability of pretrained Transformers to remember factual knowledge is
essential for knowledge-intensive downstream tasks such as closed-book question
answering. Existing work has shown that pretrained Transformers can recall or
leverage factual knowledge that appears in the pretraining corpus to some
degree. However, because model capacity is limited, the ability of pretrained
models to remember factual knowledge is also limited. Dai et al.
(2022) find that the Feed-Forward Networks (FFNs) in pretrained Transformers
store factual knowledge in a memory-like manner. Inspired by this finding, we
propose a Neural Knowledge Bank (NKB) to store extra factual knowledge for
pretrained Transformers. To be specific, we also regard FFNs as key-value
memories, and extend them with additional memory slots. During knowledge
injection, we fix the original model and inject factual knowledge into the
extended memory slots, so there will be no catastrophic forgetting for the
pretrained model. In addition, the view of FFNs as key-value memories makes the
NKB highly interpretable. On three closed-book question answering datasets, we
show the NKB's strong ability to store extra factual knowledge. We also verify,
on two representative generation tasks, summarization and machine translation,
that the NKB does not degrade the general language generation ability of
pretrained models. Further, we thoroughly analyze the NKB to reveal its
working mechanism and present the meaning of its keys and values in a
human-readable way. Building on this analysis, we make a preliminary attempt to directly
update the factual knowledge in the NKB without any additional training.
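The abstract's core mechanism, FFN weights viewed as key-value memories and extended with additional memory slots that are trained while the original model stays frozen, can be illustrated with a short sketch. The following is a minimal PyTorch illustration under assumed names and sizes (the class FFNWithKnowledgeBank, the ReLU activation, and the slot count are our assumptions), not the authors' implementation.

```python
# Minimal sketch (not the paper's released code): a Transformer FFN viewed as
# key-value memory, extended with extra trainable slots while the original
# weights stay frozen. Layer names and sizes are illustrative assumptions.
import torch
import torch.nn as nn


class FFNWithKnowledgeBank(nn.Module):
    def __init__(self, d_model: int, d_ff: int, n_extra_slots: int):
        super().__init__()
        # Original FFN: rows of W_in act as "keys", columns of W_out as "values".
        self.W_in = nn.Linear(d_model, d_ff)
        self.W_out = nn.Linear(d_ff, d_model)
        # Extra memory slots appended to the FFN (the knowledge-bank part).
        self.k_extra = nn.Linear(d_model, n_extra_slots, bias=False)
        self.v_extra = nn.Linear(n_extra_slots, d_model, bias=False)
        self.act = nn.ReLU()

    def freeze_original(self):
        # During knowledge injection only the extra slots receive gradients,
        # so the pretrained FFN weights cannot be overwritten.
        for p in list(self.W_in.parameters()) + list(self.W_out.parameters()):
            p.requires_grad = False

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h_orig = self.act(self.W_in(x))      # activations over original keys
        h_extra = self.act(self.k_extra(x))  # activations over extra keys
        return self.W_out(h_orig) + self.v_extra(h_extra)


ffn = FFNWithKnowledgeBank(d_model=768, d_ff=3072, n_extra_slots=1024)
ffn.freeze_original()
out = ffn(torch.randn(2, 16, 768))  # (batch, seq_len, d_model)
```

Because only the extra key and value matrices are updated during injection, the pretrained FFN weights stay untouched, which is the property the abstract describes as avoiding catastrophic forgetting.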
Related papers
- Large Scale Knowledge Washing [24.533316191149677]
Large language models show impressive abilities in memorizing world knowledge.
We introduce the problem of Large Scale Knowledge Washing, focusing on unlearning an extensive amount of factual knowledge.
arXiv Detail & Related papers (2024-05-26T23:29:49Z)
- Decouple knowledge from parameters for plug-and-play language modeling [77.5601135412186]
We introduce PlugLM, a pre-training model with differentiable plug-in memory(DPM)
The key intuition is to decouple the knowledge storage from model parameters with an editable and scalable key-value memory.
PlugLM obtains an average improvement of 3.95 F1 across four domains without any in-domain pre-training.
arXiv Detail & Related papers (2023-05-19T10:01:55Z)
- A Memory Transformer Network for Incremental Learning [64.0410375349852]
We study class-incremental learning, a training setup in which new classes of data are observed over time for the model to learn from.
Despite the straightforward problem formulation, the naive application of classification models to class-incremental learning results in the "catastrophic forgetting" of previously seen classes.
One of the most successful existing approaches uses a memory of exemplars: it mitigates catastrophic forgetting by saving a subset of past data into a memory bank and using that data when training on future tasks.
arXiv Detail & Related papers (2022-10-10T08:27:28Z)
- Kformer: Knowledge Injection in Transformer Feed-Forward Layers [107.71576133833148]
We propose a novel knowledge fusion model, namely Kformer, which incorporates external knowledge through the feed-forward layer in Transformer.
We empirically find that simply injecting knowledge into the FFN can enhance the pre-trained language model's ability and benefit current knowledge fusion methods.
arXiv Detail & Related papers (2022-01-15T03:00:27Z)
- Knowledge Neurons in Pretrained Transformers [45.24499368763417]
In this paper, we explore how implicit knowledge is stored in pretrained Transformers.
We propose a knowledge attribution method to identify the neurons that express a given fact.
We show that the activation of such knowledge neurons is highly correlated to the expression of their corresponding facts.
arXiv Detail & Related papers (2021-04-18T03:38:26Z)
- Editing Factual Knowledge in Language Models [51.947280241185]
We present KnowledgeEditor, a method that can be used to edit this knowledge.
Besides being computationally efficient, KnowledgeEditor does not require any modifications in LM pre-training.
We show KnowledgeEditor's efficacy with two popular architectures and knowledge-intensive tasks.
arXiv Detail & Related papers (2021-04-16T15:24:42Z)
- Modifying Memories in Transformer Models [71.48657481835767]
We propose a new task of explicitly modifying specific factual knowledge in Transformer models.
This task is useful in many scenarios, such as updating stale knowledge, protecting privacy, and eliminating unintended biases stored in the models.
arXiv Detail & Related papers (2020-12-01T09:39:13Z)
- Knowledge-Aware Language Model Pretraining [29.56904859722379]
We incorporate knowledge-awareness in language model pretraining without changing the transformer architecture.
We observe improved language modeling accuracy, factual correctness in LAMA knowledge probing tasks, and semantics in the hidden representations through edge probing.
Our knowledge-aware language model (KALM) can serve as a drop-in replacement for GPT-2 models.
arXiv Detail & Related papers (2020-06-29T06:09:59Z)
- Common Sense or World Knowledge? Investigating Adapter-Based Knowledge Injection into Pretrained Transformers [54.417299589288184]
We investigate models for complementing the distributional knowledge of BERT with conceptual knowledge from ConceptNet and its corresponding Open Mind Common Sense (OMCS) corpus.
Our adapter-based models substantially outperform BERT on inference tasks that require the type of conceptual knowledge explicitly present in ConceptNet and OMCS.
arXiv Detail & Related papers (2020-05-24T15:49:57Z)