Modifying Memories in Transformer Models
- URL: http://arxiv.org/abs/2012.00363v1
- Date: Tue, 1 Dec 2020 09:39:13 GMT
- Title: Modifying Memories in Transformer Models
- Authors: Chen Zhu, Ankit Singh Rawat, Manzil Zaheer, Srinadh Bhojanapalli,
Daliang Li, Felix Yu, Sanjiv Kumar
- Abstract summary: We propose a new task of explicitly modifying specific factual knowledge in Transformer models.
This task is useful in many scenarios, such as updating stale knowledge, protecting privacy, and eliminating unintended biases stored in the models.
- Score: 71.48657481835767
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Transformer models have achieved impressive performance in many natural
language tasks. In particular, Transformer based language models have been
shown to have great capabilities in encoding factual knowledge in their vast
amount of parameters. While the tasks of improving the memorization and
generalization of Transformers have been widely studied, it is not well known
how to make Transformers forget specific old facts and memorize new ones. In
this paper, we propose a new task of explicitly modifying specific
factual knowledge in Transformer models while ensuring that model performance
does not degrade on the unmodified facts. This task is useful in many
scenarios, such as updating stale knowledge, protecting privacy, and
eliminating unintended biases stored in the models. We benchmarked several
approaches that provide natural baseline performances on this task. This led
to the discovery of key components of a Transformer model that are especially
effective for knowledge modifications. The work also provides insights into the
role that different training phases (such as pretraining and fine-tuning) play
in memorization and knowledge modification.
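To make the task concrete, one natural baseline of the kind the abstract alludes to is constrained fine-tuning: train only on the facts to be modified while keeping every parameter inside a small ball around its pretrained value, so that unmodified facts are less likely to be forgotten. The sketch below is an illustrative reconstruction under that assumption, not the authors' released code; the function name, the radius `delta`, and the HuggingFace-style `.loss` interface are assumptions.

```python
import torch


def constrained_finetune(model, modified_facts_loader, delta=1e-3, lr=1e-5, epochs=3):
    """Fine-tune `model` on the modified facts only, projecting every parameter
    back into an L-infinity ball of radius `delta` around its pretrained value
    after each optimizer step (illustrative sketch, not the paper's code)."""
    # Snapshot of the pretrained weights that the constraint is measured against.
    pretrained = {name: p.detach().clone() for name, p in model.named_parameters()}
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)

    model.train()
    for _ in range(epochs):
        for batch in modified_facts_loader:
            # Assumption: a HuggingFace-style masked-LM batch (dict of tensors)
            # whose forward pass returns an object with a `.loss` attribute.
            loss = model(**batch).loss
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            # Project back into the constraint set ||theta - theta_0||_inf <= delta,
            # one way to keep unmodified facts from being overwritten.
            with torch.no_grad():
                for name, param in model.named_parameters():
                    ref = pretrained[name]
                    param.clamp_(ref - delta, ref + delta)
    return model
```

A hard projection of this kind is one way to turn "do not degrade on the unmodified facts" into an explicit constraint on parameter drift; evaluation would still compare accuracy on both the modified and the unmodified fact sets.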
Related papers
- Introduction to Transformers: an NLP Perspective [59.0241868728732]
We introduce basic concepts of Transformers and present key techniques that underpin the recent advances of these models.
This includes a description of the standard Transformer architecture, a series of model refinements, and common applications.
arXiv Detail & Related papers (2023-11-29T13:51:04Z)
- Learning to Grow Pretrained Models for Efficient Transformer Training [72.20676008625641]
We learn to grow pretrained transformers by learning a linear map from the parameters of a smaller model that initializes the larger model.
Experiments across both language and vision transformers demonstrate that our learned Linear Growth Operator (LiGO) can save up to 50% of the computational cost of training from scratch.
arXiv Detail & Related papers (2023-03-02T05:21:18Z)
- Revision Transformers: Instructing Language Models to Change their Values [21.645935518842744]
Current transformer language models (LMs) are large-scale models with billions of parameters.
We propose the Revision Transformer (RiT) to facilitate easy model updating.
Combining a large-scale pre-trained LM, which encodes world knowledge inherently but diffusely, with a clearly structured revision engine makes it possible to update the model's knowledge with little effort and with the help of user interaction.
arXiv Detail & Related papers (2022-10-19T07:05:06Z)
- Stateful Memory-Augmented Transformers for Efficient Dialogue Modeling [69.31802246621963]
We propose a novel memory-augmented transformer that is compatible with existing pre-trained encoder-decoder models.
By incorporating a separate memory module alongside the pre-trained transformer, the model can effectively interchange information between the memory states and the current input context.
arXiv Detail & Related papers (2022-09-15T22:37:22Z)
- Neural Knowledge Bank for Pretrained Transformers [20.416700112895974]
We propose a Neural Knowledge Bank (NKB) to store extra factual knowledge for pretrained Transformers.
During knowledge injection, we fix the original model and inject factual knowledge into the extended memory slots.
We use three closed-book question answering datasets to show the NKB's strong ability to store extra factual knowledge (see the sketch after this list).
arXiv Detail & Related papers (2022-07-31T09:14:34Z)
- Visformer: The Vision-friendly Transformer [105.52122194322592]
We propose a new architecture named Visformer, which is abbreviated from "Vision-friendly Transformer".
With the same computational complexity, Visformer outperforms both the Transformer-based and convolution-based models in terms of ImageNet classification accuracy.
arXiv Detail & Related papers (2021-04-26T13:13:03Z)
- Efficient Transformers: A Survey [98.23264445730645]
Transformer model architectures have garnered immense interest lately due to their effectiveness across a range of domains like language, vision and reinforcement learning.
This paper characterizes a large and thoughtful selection of recent efficiency-flavored "X-former" models.
arXiv Detail & Related papers (2020-09-14T20:38:14Z)
- Knowledge-Aware Language Model Pretraining [29.56904859722379]
We incorporate knowledge-awareness in language model pretraining without changing the transformer architecture.
We observe improved language modeling accuracy, factual correctness in LAMA knowledge probing tasks, and semantics in the hidden representations through edge probing.
Our knowledge-aware language model (KALM) can serve as a drop-in replacement for GPT-2 models.
arXiv Detail & Related papers (2020-06-29T06:09:59Z)
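The Neural Knowledge Bank entry above describes fixing the original model and injecting facts into extended memory slots. Viewing a Transformer feed-forward layer as key-value memory, that idea can be sketched roughly as follows; the class name, initialization scale, and GELU activation are illustrative assumptions, not the NKB implementation.

```python
import torch
import torch.nn as nn


class FFNWithExtraSlots(nn.Module):
    """Wrap a pretrained Transformer FFN (two Linear layers) and append extra,
    trainable key-value memory slots for injected facts, keeping the original
    weights frozen (illustrative sketch under assumptions stated above)."""

    def __init__(self, ffn_in: nn.Linear, ffn_out: nn.Linear, num_extra_slots: int):
        super().__init__()
        d_model = ffn_in.in_features
        self.ffn_in, self.ffn_out = ffn_in, ffn_out
        for p in list(ffn_in.parameters()) + list(ffn_out.parameters()):
            p.requires_grad = False  # freeze the pretrained key-value memories
        # New slots: keys score against hidden states, values write back to the stream.
        self.extra_keys = nn.Parameter(0.02 * torch.randn(num_extra_slots, d_model))
        self.extra_values = nn.Parameter(0.02 * torch.randn(num_extra_slots, d_model))
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, seq, d_model)
        original = self.ffn_out(self.act(self.ffn_in(x)))
        scores = self.act(x @ self.extra_keys.t())   # (batch, seq, num_extra_slots)
        injected = scores @ self.extra_values        # (batch, seq, d_model)
        return original + injected
```

Only `extra_keys` and `extra_values` receive gradients during knowledge injection, so the pretrained parameters stay untouched, in the spirit of both the NKB summary and the knowledge-modification task of the main paper.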
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.