On the Generalization Ability of Retrieval-Enhanced Transformers
- URL: http://arxiv.org/abs/2302.12128v1
- Date: Thu, 23 Feb 2023 16:11:04 GMT
- Title: On the Generalization Ability of Retrieval-Enhanced Transformers
- Authors: Tobias Norlund, Ehsan Doostmohammadi, Richard Johansson, Marco
Kuhlmann
- Abstract summary: Off-loading memory from trainable weights to a retrieval database can significantly improve language modeling.
It has been suggested that at least some of this performance gain is due to non-trivial generalization based on both model weights and retrieval.
We find that the performance gains from retrieval largely originate from overlapping tokens between the database and the test data.
- Score: 1.0552465253379135
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent work on the Retrieval-Enhanced Transformer (RETRO) model has shown
that off-loading memory from trainable weights to a retrieval database can
significantly improve language modeling and match the performance of
non-retrieval models that are an order of magnitude larger in size. It has been
suggested that at least some of this performance gain is due to non-trivial
generalization based on both model weights and retrieval. In this paper, we try
to better understand the relative contributions of these two components. We
find that the performance gains from retrieval largely originate from
overlapping tokens between the database and the test data, suggesting less
non-trivial generalization than previously assumed. More generally, our results
point to the challenges of evaluating the generalization of retrieval-augmented
language models such as RETRO, as even limited token overlap may significantly
decrease test-time loss. We release our code and model at
https://github.com/TobiasNorlund/retro
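Since the central claim is that retrieval gains track token overlap between the retrieval database and the test data, a rough way to picture the measurement is sketched below. This is not the paper's evaluation code; it simply estimates overlap as the fraction of test-set n-grams that also occur verbatim in the database, with an arbitrary n-gram length and whitespace tokenization as illustrative assumptions.

```python
def ngrams(tokens, n):
    """Yield successive n-grams (as tuples) from a token list."""
    return zip(*(tokens[i:] for i in range(n)))

def ngram_overlap(test_tokens, database_tokens, n=8):
    """Fraction of test n-grams that also appear verbatim in the database.

    A crude proxy for the token overlap discussed above; the actual analysis
    operates on RETRO's retrieval chunks rather than raw token streams.
    """
    db_ngrams = set(ngrams(database_tokens, n))
    test_ngrams = list(ngrams(test_tokens, n))
    if not test_ngrams:
        return 0.0
    hits = sum(1 for g in test_ngrams if g in db_ngrams)
    return hits / len(test_ngrams)

# Toy usage with whitespace tokenization (a real setup would use the LM's tokenizer).
database = "the cat sat on the mat and looked at the dog".split()
test = "the cat sat on the mat quietly".split()
print(ngram_overlap(test, database, n=3))  # 0.8
```

As the abstract notes, even limited overlap of this kind can significantly decrease test-time loss, which is what makes generalization hard to evaluate.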
Related papers
- LLM-Augmented Retrieval: Enhancing Retrieval Models Through Language Models and Doc-Level Embedding [2.0257616108612373]
This paper introduces a model-agnostic doc-level embedding framework through large language model augmentation.
We have been able to significantly improve the effectiveness of widely-used retriever models.
arXiv Detail & Related papers (2024-04-08T19:29:07Z)
- Repeat After Me: Transformers are Better than State Space Models at Copying [53.47717661441142]
We show that while generalized state space models are promising in terms of inference-time efficiency, they are limited compared to transformer models on tasks that require copying from the input context.
arXiv Detail & Related papers (2024-02-01T21:44:11Z)
- The Truth is in There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction [22.659005954676598]
We show that it is possible to significantly improve the performance of Large Language Models by selectively removing higher-order components of their weight matrices.
This simple intervention, which we call LAyer-SElective Rank reduction (LASER), can be done on a model after training has completed.
We show extensive experiments demonstrating the generality of this finding across language models and datasets.
arXiv Detail & Related papers (2023-12-21T03:51:08Z)
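The LASER summary above amounts to truncating the singular value decomposition of selected weight matrices after training. Below is a minimal sketch of that operation on a single matrix, using PyTorch; the rank fraction and the choice of which matrix to touch are illustrative assumptions, not the paper's settings.

```python
import torch

def rank_reduce(weight: torch.Tensor, keep_fraction: float = 0.1) -> torch.Tensor:
    """Low-rank approximation of `weight` that keeps only the top singular
    components, i.e. discards the higher-order components."""
    u, s, vh = torch.linalg.svd(weight, full_matrices=False)
    k = max(1, int(keep_fraction * s.numel()))
    return u[:, :k] @ torch.diag(s[:k]) @ vh[:k, :]

# Toy usage on one projection matrix; in LASER this replacement is applied
# selectively to particular layers of an already-trained model.
w = torch.randn(512, 2048)
w_low_rank = rank_reduce(w, keep_fraction=0.05)
print(torch.linalg.matrix_rank(w_low_rank))  # roughly 0.05 * 512
```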
- Surface-Based Retrieval Reduces Perplexity of Retrieval-Augmented Language Models [1.0552465253379135]
We study the state-of-the-art Retro model and observe that its performance gain is better explained by surface-level similarities.
Inspired by this, we replace the semantic retrieval in Retro with a surface-level method based on BM25, obtaining a significant reduction in perplexity.
arXiv Detail & Related papers (2023-05-25T16:56:26Z)
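The surface-level retrieval described above can be pictured as plain BM25 ranking over tokenized chunks. The sketch below uses the rank_bm25 package as a stand-in; the corpus, chunking, and tokenization are simplified assumptions rather than the paper's setup.

```python
from rank_bm25 import BM25Okapi  # pip install rank-bm25

# A toy retrieval database of text chunks; a RETRO-style database would hold
# fixed-length token spans drawn from a much larger corpus.
chunks = [
    "retrieval augmented language models condition on retrieved text",
    "bm25 ranks documents by surface-level term overlap with the query",
    "state space models are efficient at inference time",
]
bm25 = BM25Okapi([c.split() for c in chunks])

# Retrieve the closest neighbour chunk by lexical overlap alone.
query = "surface-level overlap between the query and documents".split()
print(bm25.get_top_n(query, chunks, n=1))
```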
- Enhancing Retrieval-Augmented Large Language Models with Iterative Retrieval-Generation Synergy [164.83371924650294]
We show that strong performance can be achieved by a method we call Iter-RetGen, which synergizes retrieval and generation in an iterative manner.
A model output from one iteration shows what might still be needed to finish the task, and thus provides an informative context for retrieving more relevant knowledge in the next iteration.
Iter-RetGen processes all retrieved knowledge as a whole and largely preserves the flexibility in generation without structural constraints.
arXiv Detail & Related papers (2023-05-24T16:17:36Z)
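The iterative synergy described above can be summarized as a loop in which each draft answer enriches the next retrieval query. The sketch below is schematic: `retrieve` and `generate` are hypothetical placeholders for a retriever and an LLM call, not the paper's API.

```python
from typing import Callable, List

def iter_retgen(
    question: str,
    retrieve: Callable[[str], List[str]],  # hypothetical: query -> supporting passages
    generate: Callable[[str], str],        # hypothetical: prompt -> model answer
    iterations: int = 2,
) -> str:
    """Iterative retrieval-generation: each draft answer is appended to the
    retrieval query so the next round can fetch more relevant evidence."""
    answer = ""
    for _ in range(iterations):
        query = question if not answer else f"{question} {answer}"
        passages = retrieve(query)
        prompt = "\n".join(passages) + f"\n\nQuestion: {question}\nAnswer:"
        answer = generate(prompt)
    return answer

# Toy usage with trivial stand-ins for the retriever and the language model.
print(iter_retgen(
    "what does retrieval add to a language model?",
    retrieve=lambda q: ["Retrieved passage relevant to: " + q],
    generate=lambda p: "a draft answer grounded in the retrieved passages",
))
```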
- Shall We Pretrain Autoregressive Language Models with Retrieval? A Comprehensive Study [115.96080028033904]
We study a scalable pre-trained retrieval-augmented LM (i.e., RETRO) compared with standard GPT and retrieval-augmented GPT.
Our findings highlight the promising direction of pretraining autoregressive LMs with retrieval as future foundation models.
arXiv Detail & Related papers (2023-04-13T18:04:19Z)
- Characterizing Attribution and Fluency Tradeoffs for Retrieval-Augmented Large Language Models [6.425088990363101]
We examine the relationship between fluency and attribution in Large Language Models prompted with retrieved evidence.
We show that larger models tend to do much better in both fluency and attribution.
We propose a recipe that could allow smaller models to both close the gap with larger models and preserve the benefits of top-k retrieval.
arXiv Detail & Related papers (2023-02-11T02:43:34Z)
- DORE: Document Ordered Relation Extraction based on Generative Framework [56.537386636819626]
This paper investigates the root cause of the underwhelming performance of the existing generative DocRE models.
We propose to generate a symbolic and ordered sequence from the relation matrix, which is deterministic and easier for the model to learn.
Experimental results on four datasets show that our proposed method can improve the performance of the generative DocRE models.
arXiv Detail & Related papers (2022-10-28T11:18:10Z)
- UnifieR: A Unified Retriever for Large-Scale Retrieval [84.61239936314597]
Large-scale retrieval aims to recall relevant documents from a huge collection given a query.
Recent retrieval methods based on pre-trained language models (PLM) can be coarsely categorized into either dense-vector or lexicon-based paradigms.
We propose a new learning framework, UnifieR which unifies dense-vector and lexicon-based retrieval in one model with a dual-representing capability.
arXiv Detail & Related papers (2022-05-23T11:01:59Z)
- Improving language models by retrieving from trillions of tokens [50.42630445476544]
We enhance auto-regressive language models by conditioning on document chunks retrieved from a large corpus.
With a 2 trillion token database, our Retrieval-Enhanced Transformer (RETRO) obtains comparable performance to GPT-3 and Jurassic-1 on the Pile.
arXiv Detail & Related papers (2021-12-08T17:32:34Z)
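RETRO, as summarized above, retrieves nearest-neighbour chunks from the database and lets the decoder attend to them. The sketch below covers only the neighbour lookup, using sentence-transformers embeddings as a stand-in for RETRO's frozen BERT chunk encoder; the model name, chunk texts, and neighbour count are illustrative assumptions, not the paper's configuration.

```python
from typing import List
import numpy as np
from sentence_transformers import SentenceTransformer  # pip install sentence-transformers

# Stand-in for RETRO's frozen chunk encoder.
encoder = SentenceTransformer("all-MiniLM-L6-v2")

database_chunks = [
    "retrieval databases store billions of pre-encoded text chunks",
    "the decoder attends to retrieved neighbours through cross-attention",
    "perplexity on the Pile improves as the database grows",
]
db_emb = encoder.encode(database_chunks, normalize_embeddings=True)

def nearest_neighbours(input_chunk: str, k: int = 2) -> List[str]:
    """Return the k database chunks closest to the input chunk in embedding
    space; RETRO feeds such neighbours into its cross-attention layers."""
    q = encoder.encode([input_chunk], normalize_embeddings=True)[0]
    scores = db_emb @ q  # cosine similarity, since embeddings are normalized
    top = np.argsort(-scores)[:k]
    return [database_chunks[i] for i in top]

print(nearest_neighbours("how does the model use retrieved neighbours?"))
```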
This list is automatically generated from the titles and abstracts of the papers on this site.