Mention Memory: incorporating textual knowledge into Transformers
through entity mention attention
- URL: http://arxiv.org/abs/2110.06176v1
- Date: Tue, 12 Oct 2021 17:19:05 GMT
- Title: Mention Memory: incorporating textual knowledge into Transformers
through entity mention attention
- Authors: Michiel de Jong, Yury Zemlyanskiy, Nicholas FitzGerald, Fei Sha,
William Cohen
- Abstract summary: We propose to integrate a semi-parametric representation of a large text corpus into a Transformer model as a source of factual knowledge.
The proposed model - TOME - is a Transformer that accesses the information through internal memory layers in which each entity mention in the input passage attends to the mention memory.
In experiments using a memory of 150 million Wikipedia mentions, TOME achieves strong performance on several open-domain knowledge-intensive tasks.
- Score: 21.361822569279003
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Natural language understanding tasks such as open-domain question answering
often require retrieving and assimilating factual information from multiple
sources. We propose to address this problem by integrating a semi-parametric
representation of a large text corpus into a Transformer model as a source of
factual knowledge. Specifically, our method represents knowledge with "mention
memory", a table of dense vector representations of every entity mention in a
corpus. The proposed model - TOME - is a Transformer that accesses the
information through internal memory layers in which each entity mention in the
input passage attends to the mention memory. This approach enables synthesis of
and reasoning over many disparate sources of information within a single
Transformer model. In experiments using a memory of 150 million Wikipedia
mentions, TOME achieves strong performance on several open-domain
knowledge-intensive tasks, including the claim verification benchmarks HoVer
and FEVER and several entity-based QA benchmarks. We also show that the model
learns to attend to informative mentions without any direct supervision.
Finally, we demonstrate that the model can generalize to new, unseen entities by
updating the memory without retraining.
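The memory layer can be pictured as ordinary attention in which the queries come from entity-mention positions in the input passage and the keys and values come from the precomputed mention-memory table. The sketch below is a minimal illustration under assumed names and shapes, not the authors' implementation: it runs dense softmax attention over a toy memory, whereas the actual model must restrict attention to a retrieved subset of the 150 million memory entries.

```python
# Minimal sketch (assumed names/shapes, not the TOME code) of a mention-memory
# attention layer: each entity-mention query attends over a table of dense
# mention vectors and returns a weighted combination of their values.
import numpy as np

def mention_memory_attention(mention_queries, memory_keys, memory_values):
    """Attend each mention query over the mention-memory table.

    mention_queries: (num_mentions, d)     query vectors for input-passage mentions
    memory_keys:     (memory_size, d)      one key vector per corpus mention
    memory_values:   (memory_size, d_val)  value vectors returned to the Transformer
    """
    scores = mention_queries @ memory_keys.T          # (num_mentions, memory_size)
    scores -= scores.max(axis=-1, keepdims=True)      # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over memory entries
    return weights @ memory_values                    # (num_mentions, d_val)

# Toy usage: 2 mentions in the input passage, a memory of 5 corpus mentions.
rng = np.random.default_rng(0)
queries = rng.normal(size=(2, 8))
keys = rng.normal(size=(5, 8))
values = rng.normal(size=(5, 8))
print(mention_memory_attention(queries, keys, values).shape)  # (2, 8)
```

In the full-scale setting, the softmax over the whole table is replaced by attention over only the top-scoring memory entries for each mention, which is what makes a 150-million-entry memory tractable.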
Related papers
- In-Context Learning with Representations: Contextual Generalization of Trained Transformers [66.78052387054593]
In-context learning (ICL) refers to the ability of pretrained large language models to learn a new task from a few examples given at inference time.
This paper investigates the training dynamics of transformers trained by gradient descent through the lens of non-linear regression tasks.
arXiv Detail & Related papers (2024-08-19T16:47:46Z)
- MATTER: Memory-Augmented Transformer Using Heterogeneous Knowledge Sources [12.783393023641505]
We introduce an efficient memory-augmented transformer called MATTER.
MATTER retrieves and reads from both unstructured sources (paragraphs) and semi-structured sources (QA pairs) in the form of fixed-length neural memories.
We demonstrate that our model outperforms existing efficient retrieval-augmented models on popular QA benchmarks in terms of both accuracy and speed.
arXiv Detail & Related papers (2024-06-07T06:35:37Z)
- MEMORYLLM: Towards Self-Updatable Large Language Models [101.3777486749529]
Existing Large Language Models (LLMs) usually remain static after deployment.
We introduce MEMORYLLM, a model that comprises a transformer and a fixed-size memory pool.
MEMORYLLM can self-update with text knowledge and memorize the knowledge injected earlier.
arXiv Detail & Related papers (2024-02-07T07:14:11Z)
- REVEAL: Retrieval-Augmented Visual-Language Pre-Training with Multi-Source Multimodal Knowledge Memory [119.98011559193574]
We propose an end-to-end Retrieval-Augmented Visual Language Model (REVEAL).
It learns to encode world knowledge into a large-scale memory, and to retrieve from it to answer knowledge-intensive queries.
A key novelty in our approach is that the memory, encoder, retriever and generator are all pre-trained end-to-end on a massive amount of data.
arXiv Detail & Related papers (2022-12-10T06:17:56Z)
- An Efficient Memory-Augmented Transformer for Knowledge-Intensive NLP Tasks [40.81306982129298]
Parametric and retrieval-augmented models have complementary strengths in terms of computational efficiency and predictive accuracy.
We propose the Efficient Memory-Augmented Transformer (EMAT).
It encodes external knowledge into a key-value memory and exploits fast maximum inner product search for memory querying (a minimal lookup sketch follows this list).
arXiv Detail & Related papers (2022-10-30T08:34:49Z)
- Learning to Learn Variational Semantic Memory [132.39737669936125]
We introduce variational semantic memory into meta-learning to acquire long-term knowledge for few-shot learning.
The semantic memory is grown from scratch and gradually consolidated by absorbing information from the tasks it experiences.
We formulate memory recall as the variational inference of a latent memory variable from addressed contents.
arXiv Detail & Related papers (2020-10-20T15:05:26Z)
- KILT: a Benchmark for Knowledge Intensive Language Tasks [102.33046195554886]
We present a benchmark for knowledge-intensive language tasks (KILT).
All tasks in KILT are grounded in the same snapshot of Wikipedia.
We find that a shared dense vector index coupled with a seq2seq model is a strong baseline.
arXiv Detail & Related papers (2020-09-04T15:32:19Z)
- Memory Transformer [0.31406146587437894]
Transformer-based models have achieved state-of-the-art results in many natural language processing tasks.
Memory-augmented neural networks (MANNs) extend traditional neural architectures with general-purpose memory for representations.
We evaluate these memory-augmented Transformers and demonstrate that the presence of memory positively correlates with model performance.
arXiv Detail & Related papers (2020-06-20T09:06:27Z)
- Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks [133.93803565077337]
Retrieval-augmented generation (RAG) models combine pre-trained parametric and non-parametric memory for language generation.
We show that RAG models generate more specific, diverse and factual language than a state-of-the-art parametric-only seq2seq baseline.
arXiv Detail & Related papers (2020-05-22T21:34:34Z)
- REALM: Retrieval-Augmented Language Model Pre-Training [37.3178586179607]
We augment language model pre-training with a latent knowledge retriever, which allows the model to retrieve and attend over documents from a large corpus such as Wikipedia.
For the first time, we show how to pre-train such a knowledge retriever in an unsupervised manner.
We demonstrate the effectiveness of Retrieval-Augmented Language Model pre-training (REALM) by fine-tuning on the challenging task of Open-domain Question Answering (Open-QA).
arXiv Detail & Related papers (2020-02-10T18:40:59Z)
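For the key-value memory approaches above (EMAT in particular), memory querying reduces to a maximum inner product search over the stored keys. The sketch below is a brute-force illustration with made-up names and toy data, not the papers' code; a practical system would use an approximate nearest-neighbor index rather than scanning every key.

```python
# Minimal sketch of key-value memory lookup via exact maximum inner product
# search: return the values whose keys score highest against the query.
import numpy as np

def mips_lookup(query, memory_keys, memory_values, k=4):
    """Top-k memory lookup by inner product between the query and each key."""
    scores = memory_keys @ query                  # (memory_size,)
    top_k = np.argpartition(-scores, k)[:k]       # indices of the k best keys
    top_k = top_k[np.argsort(-scores[top_k])]     # order them by descending score
    return memory_values[top_k], scores[top_k]

# Toy usage: a memory of 1000 key-value pairs with 16-dimensional keys.
rng = np.random.default_rng(0)
keys = rng.normal(size=(1000, 16))
values = rng.normal(size=(1000, 32))
query = rng.normal(size=16)
vals, scores = mips_lookup(query, keys, values, k=4)
print(vals.shape, scores)  # (4, 32) and the 4 highest inner products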
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.