REALM: Retrieval-Augmented Language Model Pre-Training
- URL: http://arxiv.org/abs/2002.08909v1
- Date: Mon, 10 Feb 2020 18:40:59 GMT
- Title: REALM: Retrieval-Augmented Language Model Pre-Training
- Authors: Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, Ming-Wei Chang
- Abstract summary: We augment language model pre-training with a latent knowledge retriever, which allows the model to retrieve and attend over documents from a large corpus such as Wikipedia.
For the first time, we show how to pre-train such a knowledge retriever in an unsupervised manner.
We demonstrate the effectiveness of Retrieval-Augmented Language Model pre-training (REALM) by fine-tuning on the challenging task of Open-domain Question Answering (Open-QA).
- Score: 37.3178586179607
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Language model pre-training has been shown to capture a surprising amount of world knowledge, crucial for NLP tasks such as question answering. However, this knowledge is stored implicitly in the parameters of a neural network, requiring ever-larger networks to cover more facts. To capture knowledge in a more modular and interpretable way, we augment language model pre-training with a latent knowledge retriever, which allows the model to retrieve and attend over documents from a large corpus such as Wikipedia, used during pre-training, fine-tuning and inference. For the first time, we show how to pre-train such a knowledge retriever in an unsupervised manner, using masked language modeling as the learning signal and backpropagating through a retrieval step that considers millions of documents. We demonstrate the effectiveness of Retrieval-Augmented Language Model pre-training (REALM) by fine-tuning on the challenging task of Open-domain Question Answering (Open-QA). We compare against state-of-the-art models for both explicit and implicit knowledge storage on three popular Open-QA benchmarks, and find that we outperform all previous methods by a significant margin (4-16% absolute accuracy), while also providing qualitative benefits such as interpretability and modularity.
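To make the mechanism concrete, below is a minimal NumPy sketch of the retrieve-then-predict factorization described in the abstract, p(y|x) = Σ_z p(y|x, z) p(z|x): a dense retriever scores documents by inner product with a query embedding, and the masked-token distribution is marginalized over the top-k retrieved documents. The random stand-in encoders, vocabulary size, and corpus size are illustrative assumptions only; REALM itself uses BERT-style Transformer encoders and maximum inner product search over millions of Wikipedia blocks.

```python
# Minimal sketch of REALM-style retrieval-augmented masked language modeling.
# The encoders below are random stand-ins (assumptions for illustration), not
# the paper's architecture.
import numpy as np

rng = np.random.default_rng(0)

VOCAB, DIM, NUM_DOCS, TOP_K = 1000, 128, 10_000, 5

# Dense document index: Embed_doc(z) for every candidate document z.
doc_embeddings = rng.normal(size=(NUM_DOCS, DIM))

def embed_query(x_tokens):
    """Stand-in for Embed_input(x); a real system uses a BERT-style encoder."""
    return rng.normal(size=DIM)

def masked_lm_logits(x_tokens, doc_id):
    """Stand-in for the knowledge-augmented encoder that models p(y | x, z)."""
    return rng.normal(size=VOCAB)

def softmax(v):
    v = v - v.max()
    e = np.exp(v)
    return e / e.sum()

def predict_masked_token(x_tokens):
    # 1) Retrieve: p(z | x) is proportional to exp(Embed_input(x) . Embed_doc(z)),
    #    restricted here to the top-k documents by inner product (MIPS in practice).
    q = embed_query(x_tokens)
    scores = doc_embeddings @ q
    top_k = np.argsort(-scores)[:TOP_K]
    p_z_given_x = softmax(scores[top_k])

    # 2) Predict: marginalize the MLM distribution over the retrieved documents,
    #    p(y | x) = sum_z p(y | x, z) * p(z | x).
    p_y_given_x = np.zeros(VOCAB)
    for weight, doc_id in zip(p_z_given_x, top_k):
        p_y_given_x += weight * softmax(masked_lm_logits(x_tokens, doc_id))
    return p_y_given_x

probs = predict_masked_token(["The", "[MASK]", "is", "the", "currency", "of", "the", "UK"])
print(probs.argmax(), probs.max())
```

Because the retrieval distribution p(z|x) sits inside this marginal, the masked language modeling loss can backpropagate into the retriever's embedding functions, which is what allows the retriever to be pre-trained without any labeled retrieval data.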
Related papers
- Generalization v.s. Memorization: Tracing Language Models' Capabilities Back to Pretraining Data [76.90128359866462]
Large language models (LLMs) have sparked debate over whether they genuinely generalize to unseen tasks or rely on memorizing vast amounts of pretraining data.
We introduce an extended concept of memorization, distributional memorization, which measures the correlation between the LLM output probabilities and the pretraining data frequency.
This study demonstrates that memorization plays a larger role in simpler, knowledge-intensive tasks, while generalization is key for harder, reasoning-based tasks.
arXiv Detail & Related papers (2024-07-20T21:24:40Z)
- BRENT: Bidirectional Retrieval Enhanced Norwegian Transformer [1.911678487931003]
Retrieval-based language models are increasingly employed in question-answering tasks.
We develop the first Norwegian retrieval-based model by adapting the REALM framework.
We show that this type of training improves the reader's performance on extractive question-answering.
arXiv Detail & Related papers (2023-04-19T13:40:47Z)
- Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP).
What further sets these models apart is the massive amounts of world knowledge they internalize during pretraining.
How the model's world knowledge interacts with the factual information presented in the context remains underexplored.
arXiv Detail & Related papers (2022-11-09T18:58:29Z)
- Recitation-Augmented Language Models [85.30591349383849]
We show that recitation-augmented generation (RECITE) is a powerful paradigm for knowledge-intensive NLP tasks.
Specifically, we show that by utilizing recitation as the intermediate step, a recite-and-answer scheme can achieve new state-of-the-art performance.
arXiv Detail & Related papers (2022-10-04T00:49:20Z)
- Augmenting Pre-trained Language Models with QA-Memory for Open-Domain Question Answering [38.071375112873675]
We propose a question-answer augmented encoder-decoder model and accompanying pretraining strategy.
This yields an end-to-end system that outperforms prior QA retrieval methods on single-hop QA tasks.
arXiv Detail & Related papers (2022-04-10T02:33:00Z)
- Does Pre-training Induce Systematic Inference? How Masked Language Models Acquire Commonsense Knowledge [91.15301779076187]
We introduce verbalized knowledge into the minibatches of a BERT model during pre-training and evaluate how well the model generalizes to supported inferences.
We find generalization does not improve over the course of pre-training, suggesting that commonsense knowledge is acquired from surface-level, co-occurrence patterns rather than induced, systematic reasoning.
arXiv Detail & Related papers (2021-12-16T03:13:04Z)
- KgPLM: Knowledge-guided Language Model Pre-training via Generative and Discriminative Learning [45.067001062192844]
We present a language model pre-training framework guided by factual knowledge completion and verification.
Experimental results on LAMA, a set of zero-shot cloze-style question answering tasks, show that our model contains richer factual knowledge than the conventional pre-trained language models.
arXiv Detail & Related papers (2020-12-07T09:39:25Z)
- Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks [133.93803565077337]
Retrieval-augmented generation (RAG) models combine pre-trained parametric and non-parametric memory for language generation.
We show that RAG models generate more specific, diverse and factual language than a state-of-the-art parametric-only seq2seq baseline.
arXiv Detail & Related papers (2020-05-22T21:34:34Z)
- How Context Affects Language Models' Factual Predictions [134.29166998377187]
We integrate information from a retrieval system with a pre-trained language model in a purely unsupervised way.
We report that augmenting pre-trained language models in this way dramatically improves performance and that the resulting system, despite being unsupervised, is competitive with a supervised machine reading baseline.
arXiv Detail & Related papers (2020-05-10T09:28:12Z)