On the Generalization Ability of Retrieval-Enhanced Transformers
- URL: http://arxiv.org/abs/2302.12128v1
- Date: Thu, 23 Feb 2023 16:11:04 GMT
- Title: On the Generalization Ability of Retrieval-Enhanced Transformers
- Authors: Tobias Norlund, Ehsan Doostmohammadi, Richard Johansson, Marco
Kuhlmann
- Abstract summary: Off-loading memory from trainable weights to a retrieval database can significantly improve language modeling.
It has been suggested that at least some of this performance gain is due to non-trivial generalization based on both model weights and retrieval.
We find that the performance gains from retrieval largely originate from overlapping tokens between the database and the test data.
- Score: 1.0552465253379135
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent work on the Retrieval-Enhanced Transformer (RETRO) model has shown
that off-loading memory from trainable weights to a retrieval database can
significantly improve language modeling and match the performance of
non-retrieval models that are an order of magnitude larger in size. It has been
suggested that at least some of this performance gain is due to non-trivial
generalization based on both model weights and retrieval. In this paper, we try
to better understand the relative contributions of these two components. We
find that the performance gains from retrieval largely originate from
overlapping tokens between the database and the test data, suggesting less
non-trivial generalization than previously assumed. More generally, our results
point to the challenges of evaluating the generalization of retrieval-augmented
language models such as RETRO, as even limited token overlap may significantly
decrease test-time loss. We release our code and model at
https://github.com/TobiasNorlund/retro
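Since the central claim is that retrieval gains track token overlap between the retrieval database and the test data, a rough way to picture the measurement is sketched below. This is not the paper's evaluation code; it simply estimates overlap as the fraction of test-set n-grams that also occur verbatim in the database, with an arbitrary n-gram length and whitespace tokenization as illustrative assumptions.

```python
def ngrams(tokens, n):
    """Yield successive n-grams (as tuples) from a token list."""
    return zip(*(tokens[i:] for i in range(n)))

def ngram_overlap(test_tokens, database_tokens, n=8):
    """Fraction of test n-grams that also appear verbatim in the database.

    A crude proxy for the token overlap discussed above; the actual analysis
    operates on RETRO's retrieval chunks rather than raw token streams.
    """
    db_ngrams = set(ngrams(database_tokens, n))
    test_ngrams = list(ngrams(test_tokens, n))
    if not test_ngrams:
        return 0.0
    hits = sum(1 for g in test_ngrams if g in db_ngrams)
    return hits / len(test_ngrams)

# Toy usage with whitespace tokenization (a real setup would use the LM's tokenizer).
database = "the cat sat on the mat and looked at the dog".split()
test = "the cat sat on the mat quietly".split()
print(ngram_overlap(test, database, n=3))  # 0.8
```

As the abstract notes, even limited overlap of this kind can significantly decrease test-time loss, which is what makes generalization hard to evaluate.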
Related papers
- LLM-Augmented Retrieval: Enhancing Retrieval Models Through Language Models and Doc-Level Embedding [2.0257616108612373]
This paper introduces a model-agnostic doc-level embedding framework through large language model augmentation.
We have been able to significantly improve the effectiveness of widely-used retriever models.
arXiv Detail & Related papers (2024-04-08T19:29:07Z)
- Repeat After Me: Transformers are Better than State Space Models at Copying [53.47717661441142]
We show that while generalized state space models are promising in terms of inference-time efficiency, they are limited compared to transformer models on tasks that require copying from the input context.
arXiv Detail & Related papers (2024-02-01T21:44:11Z)
- The Truth is in There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction [22.659005954676598]
We show that it is possible to significantly improve the performance of Large Language Models by selectively removing higher-order components of their weight matrices.
This simple intervention, which we call LAyer-SElective Rank reduction (LASER), can be done on a model after training has completed.
We show extensive experiments demonstrating the generality of this finding across language models and datasets.
arXiv Detail & Related papers (2023-12-21T03:51:08Z)
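The LASER summary above amounts to truncating the singular value decomposition of selected weight matrices after training. Below is a minimal sketch of that operation on a single matrix, using PyTorch; the rank fraction and the choice of which matrix to touch are illustrative assumptions, not the paper's settings.

```python
import torch

def rank_reduce(weight: torch.Tensor, keep_fraction: float = 0.1) -> torch.Tensor:
    """Low-rank approximation of `weight` that keeps only the top singular
    components, i.e. discards the higher-order components."""
    u, s, vh = torch.linalg.svd(weight, full_matrices=False)
    k = max(1, int(keep_fraction * s.numel()))
    return u[:, :k] @ torch.diag(s[:k]) @ vh[:k, :]

# Toy usage on one projection matrix; in LASER this replacement is applied
# selectively to particular layers of an already-trained model.
w = torch.randn(512, 2048)
w_low_rank = rank_reduce(w, keep_fraction=0.05)
print(torch.linalg.matrix_rank(w_low_rank))  # roughly 0.05 * 512
```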
- Surface-Based Retrieval Reduces Perplexity of Retrieval-Augmented Language Models [1.0552465253379135]
We study the state-of-the-art Retro model and observe that its performance gain is better explained by surface-level similarities.
Inspired by this, we replace the semantic retrieval in Retro with a surface-level method based on BM25, obtaining a significant reduction in perplexity.
arXiv Detail & Related papers (2023-05-25T16:56:26Z)
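The surface-level retrieval described above can be pictured as plain BM25 ranking over tokenized chunks. The sketch below uses the rank_bm25 package as a stand-in; the corpus, chunking, and tokenization are simplified assumptions rather than the paper's setup.

```python
from rank_bm25 import BM25Okapi  # pip install rank-bm25

# A toy retrieval database of text chunks; a RETRO-style database would hold
# fixed-length token spans drawn from a much larger corpus.
chunks = [
    "retrieval augmented language models condition on retrieved text",
    "bm25 ranks documents by surface-level term overlap with the query",
    "state space models are efficient at inference time",
]
bm25 = BM25Okapi([c.split() for c in chunks])

# Retrieve the closest neighbour chunk by lexical overlap alone.
query = "surface-level overlap between the query and documents".split()
print(bm25.get_top_n(query, chunks, n=1))
```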
- Enhancing Retrieval-Augmented Large Language Models with Iterative Retrieval-Generation Synergy [164.83371924650294]
We show that strong performance can be achieved by a method we call Iter-RetGen, which synergizes retrieval and generation in an iterative manner.
A model output from one iteration shows what might still be needed to finish the task, and thus provides an informative context for retrieving more relevant knowledge in the next iteration.
Iter-RetGen processes all retrieved knowledge as a whole and largely preserves the flexibility in generation without structural constraints.
arXiv Detail & Related papers (2023-05-24T16:17:36Z)
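The iterative synergy described above can be summarized as a loop in which each draft answer enriches the next retrieval query. The sketch below is schematic: `retrieve` and `generate` are hypothetical placeholders for a retriever and an LLM call, not the paper's API.

```python
from typing import Callable, List

def iter_retgen(
    question: str,
    retrieve: Callable[[str], List[str]],  # hypothetical: query -> supporting passages
    generate: Callable[[str], str],        # hypothetical: prompt -> model answer
    iterations: int = 2,
) -> str:
    """Iterative retrieval-generation: each draft answer is appended to the
    retrieval query so the next round can fetch more relevant evidence."""
    answer = ""
    for _ in range(iterations):
        query = question if not answer else f"{question} {answer}"
        passages = retrieve(query)
        prompt = "\n".join(passages) + f"\n\nQuestion: {question}\nAnswer:"
        answer = generate(prompt)
    return answer

# Toy usage with trivial stand-ins for the retriever and the language model.
print(iter_retgen(
    "what does retrieval add to a language model?",
    retrieve=lambda q: ["Retrieved passage relevant to: " + q],
    generate=lambda p: "a draft answer grounded in the retrieved passages",
))
```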
- Shall We Pretrain Autoregressive Language Models with Retrieval? A Comprehensive Study [115.96080028033904]
We study a scalable pre-trained retrieval-augmented LM (i.e., RETRO) compared with standard GPT and retrieval-augmented GPT.
Our findings highlight the promising direction of pretraining autoregressive LMs with retrieval as future foundation models.
arXiv Detail & Related papers (2023-04-13T18:04:19Z)
- Characterizing Attribution and Fluency Tradeoffs for Retrieval-Augmented Large Language Models [6.425088990363101]
We examine the relationship between fluency and attribution in Large Language Models prompted with retrieved evidence.
We show that larger models tend to do much better in both fluency and attribution.
We propose a recipe that could allow smaller models to both close the gap with larger models and preserve the benefits of top-k retrieval.
arXiv Detail & Related papers (2023-02-11T02:43:34Z)
- DORE: Document Ordered Relation Extraction based on Generative Framework [56.537386636819626]
This paper investigates the root cause of the underwhelming performance of the existing generative DocRE models.
We propose to generate a symbolic and ordered sequence from the relation matrix, which is deterministic and easier for the model to learn.
Experimental results on four datasets show that our proposed method can improve the performance of the generative DocRE models.
arXiv Detail & Related papers (2022-10-28T11:18:10Z)
- UnifieR: A Unified Retriever for Large-Scale Retrieval [84.61239936314597]
Large-scale retrieval aims to recall relevant documents from a huge collection given a query.
Recent retrieval methods based on pre-trained language models (PLM) can be coarsely categorized into either dense-vector or lexicon-based paradigms.
We propose a new learning framework, UnifieR which unifies dense-vector and lexicon-based retrieval in one model with a dual-representing capability.
arXiv Detail & Related papers (2022-05-23T11:01:59Z)
- Improving language models by retrieving from trillions of tokens [50.42630445476544]
We enhance auto-regressive language models by conditioning on document chunks retrieved from a large corpus.
With a 2 trillion token database, our Retrieval-Enhanced Transformer (RETRO) obtains comparable performance to GPT-3 and Jurassic-1 on the Pile.
arXiv Detail & Related papers (2021-12-08T17:32:34Z)
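RETRO, as summarized above, retrieves nearest-neighbour chunks from the database and lets the decoder attend to them. The sketch below covers only the neighbour lookup, using sentence-transformers embeddings as a stand-in for RETRO's frozen BERT chunk encoder; the model name, chunk texts, and neighbour count are illustrative assumptions, not the paper's configuration.

```python
from typing import List
import numpy as np
from sentence_transformers import SentenceTransformer  # pip install sentence-transformers

# Stand-in for RETRO's frozen chunk encoder.
encoder = SentenceTransformer("all-MiniLM-L6-v2")

database_chunks = [
    "retrieval databases store billions of pre-encoded text chunks",
    "the decoder attends to retrieved neighbours through cross-attention",
    "perplexity on the Pile improves as the database grows",
]
db_emb = encoder.encode(database_chunks, normalize_embeddings=True)

def nearest_neighbours(input_chunk: str, k: int = 2) -> List[str]:
    """Return the k database chunks closest to the input chunk in embedding
    space; RETRO feeds such neighbours into its cross-attention layers."""
    q = encoder.encode([input_chunk], normalize_embeddings=True)[0]
    scores = db_emb @ q  # cosine similarity, since embeddings are normalized
    top = np.argsort(-scores)[:k]
    return [database_chunks[i] for i in top]

print(nearest_neighbours("how does the model use retrieved neighbours?"))
```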
This list is automatically generated from the titles and abstracts of the papers on this site.