REALM: Retrieval-Augmented Language Model Pre-Training
- URL: http://arxiv.org/abs/2002.08909v1
- Date: Mon, 10 Feb 2020 18:40:59 GMT
- Title: REALM: Retrieval-Augmented Language Model Pre-Training
- Authors: Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, Ming-Wei Chang
- Abstract summary: We augment language model pre-training with a latent knowledge retriever, which allows the model to retrieve and attend over documents from a large corpus such as Wikipedia.
For the first time, we show how to pre-train such a knowledge retriever in an unsupervised manner.
We demonstrate the effectiveness of Retrieval-Augmented Language Model pre-training (REALM) by fine-tuning on the challenging task of Open-domain Question Answering (Open-QA).
- Score: 37.3178586179607
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Language model pre-training has been shown to capture a surprising amount of world knowledge, crucial for NLP tasks such as question answering. However, this knowledge is stored implicitly in the parameters of a neural network, requiring ever-larger networks to cover more facts. To capture knowledge in a more modular and interpretable way, we augment language model pre-training with a latent knowledge retriever, which allows the model to retrieve and attend over documents from a large corpus such as Wikipedia, used during pre-training, fine-tuning and inference. For the first time, we show how to pre-train such a knowledge retriever in an unsupervised manner, using masked language modeling as the learning signal and backpropagating through a retrieval step that considers millions of documents. We demonstrate the effectiveness of Retrieval-Augmented Language Model pre-training (REALM) by fine-tuning on the challenging task of Open-domain Question Answering (Open-QA). We compare against state-of-the-art models for both explicit and implicit knowledge storage on three popular Open-QA benchmarks, and find that we outperform all previous methods by a significant margin (4-16% absolute accuracy), while also providing qualitative benefits such as interpretability and modularity.
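To make the mechanism concrete, below is a minimal NumPy sketch of the retrieve-then-predict factorization described in the abstract, p(y|x) = Σ_z p(y|x, z) p(z|x): a dense retriever scores documents by inner product with a query embedding, and the masked-token distribution is marginalized over the top-k retrieved documents. The random stand-in encoders, vocabulary size, and corpus size are illustrative assumptions only; REALM itself uses BERT-style Transformer encoders and maximum inner product search over millions of Wikipedia blocks.

```python
# Minimal sketch of REALM-style retrieval-augmented masked language modeling.
# The encoders below are random stand-ins (assumptions for illustration), not
# the paper's architecture.
import numpy as np

rng = np.random.default_rng(0)

VOCAB, DIM, NUM_DOCS, TOP_K = 1000, 128, 10_000, 5

# Dense document index: Embed_doc(z) for every candidate document z.
doc_embeddings = rng.normal(size=(NUM_DOCS, DIM))

def embed_query(x_tokens):
    """Stand-in for Embed_input(x); a real system uses a BERT-style encoder."""
    return rng.normal(size=DIM)

def masked_lm_logits(x_tokens, doc_id):
    """Stand-in for the knowledge-augmented encoder that models p(y | x, z)."""
    return rng.normal(size=VOCAB)

def softmax(v):
    v = v - v.max()
    e = np.exp(v)
    return e / e.sum()

def predict_masked_token(x_tokens):
    # 1) Retrieve: p(z | x) is proportional to exp(Embed_input(x) . Embed_doc(z)),
    #    restricted here to the top-k documents by inner product (MIPS in practice).
    q = embed_query(x_tokens)
    scores = doc_embeddings @ q
    top_k = np.argsort(-scores)[:TOP_K]
    p_z_given_x = softmax(scores[top_k])

    # 2) Predict: marginalize the MLM distribution over the retrieved documents,
    #    p(y | x) = sum_z p(y | x, z) * p(z | x).
    p_y_given_x = np.zeros(VOCAB)
    for weight, doc_id in zip(p_z_given_x, top_k):
        p_y_given_x += weight * softmax(masked_lm_logits(x_tokens, doc_id))
    return p_y_given_x

probs = predict_masked_token(["The", "[MASK]", "is", "the", "currency", "of", "the", "UK"])
print(probs.argmax(), probs.max())
```

Because the retrieval distribution p(z|x) sits inside this marginal, the masked language modeling loss can backpropagate into the retriever's embedding functions, which is what allows the retriever to be pre-trained without any labeled retrieval data.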
Related papers
- Generalization v.s. Memorization: Tracing Language Models' Capabilities Back to Pretraining Data [76.90128359866462]
Large language models (LLMs) have sparked debate over whether they genuinely generalize to unseen tasks or rely on memorizing vast amounts of pretraining data.
We introduce an extended concept of memorization, distributional memorization, which measures the correlation between the LLM output probabilities and the pretraining data frequency.
This study demonstrates that memorization plays a larger role in simpler, knowledge-intensive tasks, while generalization is key for harder, reasoning-based tasks.
arXiv Detail & Related papers (2024-07-20T21:24:40Z)
- BRENT: Bidirectional Retrieval Enhanced Norwegian Transformer [1.911678487931003]
Retrieval-based language models are increasingly employed in question-answering tasks.
We develop the first Norwegian retrieval-based model by adapting the REALM framework.
We show that this type of training improves the reader's performance on extractive question-answering.
arXiv Detail & Related papers (2023-04-19T13:40:47Z)
- Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP).
What further sets these models apart is the massive amounts of world knowledge they internalize during pretraining.
How the model's world knowledge interacts with the factual information presented in the context remains underexplored.
arXiv Detail & Related papers (2022-11-09T18:58:29Z)
- Recitation-Augmented Language Models [85.30591349383849]
We show that recitation-augmented generation (RECITE) is a powerful paradigm for knowledge-intensive NLP tasks.
Specifically, we show that by utilizing recitation as the intermediate step, a recite-and-answer scheme can achieve new state-of-the-art performance.
arXiv Detail & Related papers (2022-10-04T00:49:20Z)
- Augmenting Pre-trained Language Models with QA-Memory for Open-Domain Question Answering [38.071375112873675]
We propose a question-answer augmented encoder-decoder model and accompanying pretraining strategy.
This yields an end-to-end system that outperforms prior QA retrieval methods on single-hop QA tasks.
arXiv Detail & Related papers (2022-04-10T02:33:00Z)
- Does Pre-training Induce Systematic Inference? How Masked Language Models Acquire Commonsense Knowledge [91.15301779076187]
We introduce verbalized knowledge into the minibatches of a BERT model during pre-training and evaluate how well the model generalizes to supported inferences.
We find generalization does not improve over the course of pre-training, suggesting that commonsense knowledge is acquired from surface-level, co-occurrence patterns rather than induced, systematic reasoning.
arXiv Detail & Related papers (2021-12-16T03:13:04Z)
- KgPLM: Knowledge-guided Language Model Pre-training via Generative and Discriminative Learning [45.067001062192844]
We present a language model pre-training framework guided by factual knowledge completion and verification.
Experimental results on LAMA, a set of zero-shot cloze-style question answering tasks, show that our model contains richer factual knowledge than the conventional pre-trained language models.
arXiv Detail & Related papers (2020-12-07T09:39:25Z)
- Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks [133.93803565077337]
Retrieval-augmented generation (RAG) models combine pre-trained parametric and non-parametric memory for language generation.
We show that RAG models generate more specific, diverse and factual language than a state-of-the-art parametric-only seq2seq baseline.
arXiv Detail & Related papers (2020-05-22T21:34:34Z)
- How Context Affects Language Models' Factual Predictions [134.29166998377187]
We integrate information from a retrieval system with a pre-trained language model in a purely unsupervised way.
We report that augmenting pre-trained language models in this way dramatically improves performance and that the resulting system, despite being unsupervised, is competitive with a supervised machine reading baseline.
arXiv Detail & Related papers (2020-05-10T09:28:12Z)