Retro-li: Small-Scale Retrieval Augmented Generation Supporting Noisy Similarity Searches and Domain Shift Generalization
- URL: http://arxiv.org/abs/2410.00004v1
- Date: Thu, 12 Sep 2024 23:29:33 GMT
- Title: Retro-li: Small-Scale Retrieval Augmented Generation Supporting Noisy Similarity Searches and Domain Shift Generalization
- Authors: Gentiana Rashiti, Geethan Karunaratne, Mrinmaya Sachan, Abu Sebastian, Abbas Rahimi
- Abstract summary: Retro has been shown to improve language modeling capabilities and reduce toxicity and hallucinations by retrieving from a database of non-parametric memory containing trillions of entries.
We introduce Retro-li, which shows that retrieval can also help with a small-scale database, but it demands more accurate and better neighbors when searching in a smaller, hence sparser, non-parametric memory.
We show that Retro-li's non-parametric memory can potentially be implemented on analog in-memory computing hardware, exhibiting O(1) search time while causing noise in retrieving neighbors, with minimal (<1%) performance loss.
- Score: 36.251000184801576
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Retrieval augmented generation (RAG) systems such as Retro have been shown to improve language modeling capabilities and reduce toxicity and hallucinations by retrieving from a database of non-parametric memory containing trillions of entries. We introduce Retro-li, which shows that retrieval can also help with a small-scale database, but it demands more accurate and better neighbors when searching in a smaller, hence sparser, non-parametric memory. This can be met by using a proper semantic similarity search. We further propose, for the first time, adding a regularization to the non-parametric memory: it significantly reduces perplexity when the neighbor search operations are noisy during inference, and it improves generalization when a domain shift occurs. We also show that Retro-li's non-parametric memory can potentially be implemented on analog in-memory computing hardware, exhibiting O(1) search time while causing noise in retrieving neighbors, with minimal (<1%) performance loss. Our code is available at: https://github.com/IBM/Retrieval-Enhanced-Transformer-Little.
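The abstract combines two mechanisms that are easy to sketch together: an exact semantic nearest-neighbor search over a small-scale database, and noise regularization of the retrieved neighbors. The following is a minimal illustration in PyTorch; the cosine-similarity search, the function names, and the noise level sigma are assumptions, not the released implementation (see the repository linked above for the real code).

```python
import torch
import torch.nn.functional as F

def retrieve_neighbors(query: torch.Tensor, db_keys: torch.Tensor, k: int) -> torch.Tensor:
    """Exact semantic similarity search over a small database (an assumption;
    the paper's retriever may differ): query is (d,), db_keys is (n, d);
    returns indices of the k nearest neighbors by cosine similarity."""
    sims = F.normalize(db_keys, dim=-1) @ F.normalize(query, dim=-1)
    return sims.topk(k).indices

def regularize_neighbors(neighbors: torch.Tensor, sigma: float, training: bool) -> torch.Tensor:
    """Noise regularization of the non-parametric memory: during training,
    perturb the retrieved neighbor embeddings so the model tolerates
    imprecise (e.g., analog in-memory) searches at inference time."""
    if training and sigma > 0.0:
        return neighbors + sigma * torch.randn_like(neighbors)
    return neighbors

# Toy usage with random tensors standing in for a real embedding database.
db_keys = torch.randn(10_000, 64)     # search keys of a small-scale database
db_values = torch.randn(10_000, 64)   # neighbor embeddings fed to the model
idx = retrieve_neighbors(torch.randn(64), db_keys, k=4)
neighbors = regularize_neighbors(db_values[idx], sigma=0.1, training=True)
```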
Related papers
- Surface-Based Retrieval Reduces Perplexity of Retrieval-Augmented Language Models [1.0552465253379135]
We study the state-of-the-art Retro model and observe that its performance gain is better explained by surface-level similarities.
Inspired by this, we replace the semantic retrieval in Retro with a surface-level method based on BM25, obtaining a significant reduction in perplexity.
arXiv Detail & Related papers (2023-05-25T16:56:26Z)
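A minimal sketch of swapping semantic retrieval for surface-level BM25, using the off-the-shelf rank_bm25 package; the whitespace tokenization and toy corpus are assumptions, not the paper's setup.

```python
# pip install rank-bm25
from rank_bm25 import BM25Okapi

corpus = [
    "retrieval augmented generation with a small database",
    "surface level lexical overlap often drives retrieval gains",
    "analog in-memory computing enables O(1) similarity search",
]
tokenized = [doc.split() for doc in corpus]   # naive whitespace tokenization
bm25 = BM25Okapi(tokenized)

query = "lexical overlap in retrieval".split()
scores = bm25.get_scores(query)               # one BM25 score per document
best = bm25.get_top_n(query, corpus, n=2)     # top documents by surface match
print(scores, best)
```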
- ReFIT: Relevance Feedback from a Reranker during Inference [109.33278799999582]
Retrieve-and-rerank is a prevalent framework in neural information retrieval.
We propose to leverage the reranker to improve recall by making it provide relevance feedback to the retriever at inference time.
arXiv Detail & Related papers (2023-05-19T15:30:33Z)
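The inference-time feedback can be sketched as distilling the reranker's distribution over the retrieved candidates into the query embedding, then searching again. The single gradient step, the KL objective, and the learning rate below are illustrative assumptions, not ReFIT's exact procedure.

```python
import torch
import torch.nn.functional as F

def refine_query(q, doc_embs, reranker_scores, lr=0.1):
    """One inference-time feedback step: nudge the query embedding so the
    retriever's score distribution over the top-k documents moves toward
    the reranker's, then retrieve again with the refined query."""
    q = q.clone().requires_grad_(True)
    target = F.softmax(reranker_scores, dim=-1)        # reranker distribution
    retriever_logits = doc_embs @ q                    # (k,) dot-product scores
    loss = F.kl_div(F.log_softmax(retriever_logits, dim=-1), target,
                    reduction="sum")
    loss.backward()
    return (q - lr * q.grad).detach()

# Toy stand-ins: 5 retrieved documents with 32-dim embeddings.
q0 = torch.randn(32)
docs = torch.randn(5, 32)
reranker_scores = torch.randn(5)                       # from a cross-encoder
q1 = refine_query(q0, docs, reranker_scores)           # re-retrieve with q1
```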
- Lift Yourself Up: Retrieval-augmented Text Generation with Self Memory [72.36736686941671]
We propose a novel framework, selfmem, for improving retrieval-augmented generation models.
Selfmem iteratively employs a retrieval-augmented generator to create an unbounded memory pool and uses a memory selector to choose one output as memory for the subsequent generation round.
We evaluate the effectiveness of selfmem on three distinct text generation tasks.
arXiv Detail & Related papers (2023-05-03T21:40:54Z)
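The selfmem loop can be outlined as follows; `generate` and `select` are hypothetical stubs standing in for the trained generator and memory selector, not the paper's models.

```python
import random
from typing import List, Optional

def generate(source: str, memory: Optional[str]) -> List[str]:
    """Hypothetical stand-in for the retrieval-augmented generator: returns
    candidate outputs conditioned on the source and the current memory."""
    hint = f" [mem: {memory}]" if memory else ""
    return [f"candidate-{i} for '{source}'{hint}" for i in range(3)]

def select(candidates: List[str]) -> str:
    """Hypothetical stand-in for the trained memory selector (random here)."""
    return random.choice(candidates)

source = "input text"
memory: Optional[str] = None
pool: List[str] = []                  # the unbounded memory pool
for _ in range(3):                    # a few generation rounds
    candidates = generate(source, memory)
    pool.extend(candidates)           # the pool grows round after round
    memory = select(candidates)       # chosen output seeds the next round
print(memory, len(pool))
```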
- Re2G: Retrieve, Rerank, Generate [14.848179433828252]
We propose Re2G, which combines neural initial retrieval and reranking into a BART-based sequence-to-sequence generation.
To train our system end-to-end, we introduce a novel variation of knowledge distillation to train the initial retrieval, reranker, and generation using only ground truth on the target sequence output.
We find considerable gains in four diverse tasks: zero-shot slot filling, question answering, fact-checking, and dialog, with relative gains of 9% to 34% over the previous state-of-the-art on the KILT leaderboard.
arXiv Detail & Related papers (2022-07-13T15:51:40Z)
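A schematic of the retrieve-rerank-generate pipeline; all three stages are hypothetical stand-ins (token overlap, a no-op reranker, a string template) rather than the dense retriever, cross-encoder, and BART model the paper trains end-to-end.

```python
from typing import List

def retrieve(query: str, corpus: List[str], k: int) -> List[str]:
    """Stand-in initial retrieval: crude token-overlap scoring in place of
    a dense neural retriever."""
    def overlap(doc: str) -> int:
        return len(set(query.split()) & set(doc.split()))
    return sorted(corpus, key=overlap, reverse=True)[:k]

def rerank(query: str, docs: List[str]) -> List[str]:
    """Stand-in reranker: the real system scores (query, passage) pairs
    with a cross-encoder; here we simply keep the retrieval order."""
    return docs

def generate(query: str, docs: List[str]) -> str:
    """Stand-in generator: BART would condition on the query plus the
    reranked passages."""
    return f"answer to {query!r} grounded in {docs[0]!r}"

corpus = ["passage about slot filling", "passage about question answering",
          "passage about fact-checking", "dialog history passage"]
top = rerank("question answering", retrieve("question answering", corpus, k=2))
print(generate("question answering", top))
```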
- Transformer with Memory Replay [13.478839407623978]
Transformers achieve state-of-the-art performance for natural language processing tasks by pre-training on large-scale text corpora.
Memory replay is a mechanism that remembers and reuses past examples by saving to and replaying from a memory buffer.
We propose Transformer with Memory Replay (TMR), which integrates memory replay with the transformer, making it more sample-efficient.
arXiv Detail & Related papers (2022-05-19T21:27:36Z)
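A minimal sketch of the memory-replay mechanism: a bounded buffer saves past examples and replays a random sample of them alongside fresh data. The buffer size, sampling policy, and class name are assumptions, not the paper's exact scheme.

```python
import random
from collections import deque
from typing import Deque, List

class MemoryReplayBuffer:
    """Bounded memory buffer: save past training examples and replay a
    random sample of them next to fresh data."""
    def __init__(self, capacity: int):
        self.buffer: Deque[str] = deque(maxlen=capacity)  # FIFO eviction

    def save(self, batch: List[str]) -> None:
        self.buffer.extend(batch)

    def replay(self, n: int) -> List[str]:
        return random.sample(list(self.buffer), min(n, len(self.buffer)))

buf = MemoryReplayBuffer(capacity=1000)
for step in range(5):
    fresh = [f"example-{step}-{i}" for i in range(8)]
    train_batch = fresh + buf.replay(4)    # mix fresh and replayed examples
    buf.save(fresh)                        # remember this step's examples
```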
- Virtual Replay Cache [20.531576904743282]
We propose a new data structure, the Virtual Replay Cache (VRC), to address the memory inefficiency of DQN(λ)'s replay cache.
VRC nearly eliminates DQN(λ)'s cache memory footprint and slightly reduces the total training time on our hardware.
arXiv Detail & Related papers (2021-12-06T23:40:27Z)
- Recall@k Surrogate Loss with Large Batches and Similarity Mixup [62.67458021725227]
Direct optimization, by gradient descent, of an evaluation metric is not possible when it is non-differentiable.
In this work, a differentiable surrogate loss for the recall is proposed.
The proposed method achieves state-of-the-art results in several image retrieval benchmarks.
arXiv Detail & Related papers (2021-08-25T11:09:11Z)
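One standard way to build such a surrogate is to relax the hard ranking indicator with sigmoids, sketched below; the temperature and the exact form of the relaxation are assumptions, and the paper's large-batch and similarity-mixup components are not reproduced here.

```python
import torch

def smooth_recall_at_k(pos_sim, neg_sim, k, tau=0.1):
    """Differentiable surrogate for recall@k: the hard ranking indicator is
    relaxed with temperature-tau sigmoids so gradients reach the embeddings.
    pos_sim: (P,) similarities to positives; neg_sim: (N,) to negatives."""
    diffs = neg_sim.unsqueeze(0) - pos_sim.unsqueeze(1)        # (P, N)
    smooth_rank = 1.0 + torch.sigmoid(diffs / tau).sum(dim=1)  # soft ranks
    in_top_k = torch.sigmoid((k - smooth_rank) / tau)          # soft 1[rank<=k]
    return in_top_k.mean()

pos = torch.tensor([0.9, 0.4], requires_grad=True)   # toy similarity scores
neg = torch.tensor([0.5, 0.3, 0.2])
loss = 1.0 - smooth_recall_at_k(pos, neg, k=2)       # maximize surrogate recall
loss.backward()
```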
- CNN with large memory layers [2.368995563245609]
This work is centred around the recently proposed product-key memory structure, implemented for a number of computer vision applications.
The memory structure can be regarded as a simple computation primitive suitable for augmenting nearly all neural network architectures.
arXiv Detail & Related papers (2021-01-27T20:58:20Z)
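The product-key mechanism (Lample et al.) is concrete enough to sketch: splitting the query lets two O(n) searches cover N = n^2 memory slots. The dimensions and the plain dot-product scoring below are illustrative assumptions.

```python
import torch

def product_key_lookup(q, subkeys1, subkeys2, values, k=4):
    """Sketch of a product-key memory read: split the query in half, score
    each half against n sub-keys, and let the top-k of the k*k candidate
    pairs index a table of N = n*n values."""
    d = q.shape[0] // 2
    v1, i1 = (subkeys1 @ q[:d]).topk(k)      # best half-scores, first half
    v2, i2 = (subkeys2 @ q[d:]).topk(k)      # best half-scores, second half
    cand = (v1.unsqueeze(1) + v2.unsqueeze(0)).flatten()   # k*k pair scores
    top, flat = cand.topk(k)
    n = subkeys1.shape[0]
    idx = i1[flat // k] * n + i2[flat % k]   # flat indices into the value table
    return torch.softmax(top, dim=0) @ values[idx]         # sparse weighted read

n, half, dv = 64, 16, 32                     # N = 64 * 64 = 4096 memory slots
out = product_key_lookup(torch.randn(2 * half),
                         torch.randn(n, half), torch.randn(n, half),
                         torch.randn(n * n, dv))
```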
- ROME: Robustifying Memory-Efficient NAS via Topology Disentanglement and Gradient Accumulation [106.04777600352743]
Differentiable architecture search (DARTS) is largely hindered by its substantial memory cost since the entire supernet resides in the memory.
Single-path DARTS addresses this by choosing only a single-path submodel at each step; besides being memory-friendly, it also has low computational cost.
We propose a new algorithm called RObustifying Memory-Efficient NAS (ROME) as a remedy.
arXiv Detail & Related papers (2020-11-23T06:34:07Z)
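The single-path idea mentioned above can be sketched on one edge of the supernet: sample one candidate operation per step so only its activations are kept. The straight-through-style scaling used to keep the architecture logits trainable is an illustrative choice, not ROME's algorithm.

```python
import torch
import torch.nn as nn

class SinglePathEdge(nn.Module):
    """Sketch of single-path differentiable NAS: rather than summing every
    candidate operation (which keeps the whole supernet in memory), sample
    one operation per step so only its activations are stored."""
    def __init__(self, ops):
        super().__init__()
        self.ops = nn.ModuleList(ops)
        self.alpha = nn.Parameter(torch.zeros(len(ops)))  # architecture logits

    def forward(self, x):
        probs = torch.softmax(self.alpha, dim=0)
        i = int(torch.multinomial(probs, 1))              # sample a single path
        # Scale by probs[i] / probs[i].detach() so alpha still gets gradient.
        return self.ops[i](x) * (probs[i] / probs[i].detach())

edge = SinglePathEdge([nn.Identity(), nn.Linear(8, 8), nn.Tanh()])
y = edge(torch.randn(2, 8))
```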
- Memformer: A Memory-Augmented Transformer for Sequence Modeling [55.780849185884996]
We present Memformer, an efficient neural network for sequence modeling.
Our model achieves linear time complexity and constant memory space complexity when processing long sequences.
arXiv Detail & Related papers (2020-10-14T09:03:36Z)
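A sketch of the memory-recurrence idea behind such models: a fixed set of slots is carried across segments, so memory cost stays constant in sequence length while each step only pays attention over (segment + slots). The slot count, dimensions, and attention-based read/write below are assumptions, not Memformer's actual architecture.

```python
import torch
import torch.nn as nn

class MemorySlots(nn.Module):
    """Fixed-size external memory carried across segments of a long input."""
    def __init__(self, n_slots=8, d=32, n_heads=4):
        super().__init__()
        self.read = nn.MultiheadAttention(d, n_heads, batch_first=True)
        self.write = nn.MultiheadAttention(d, n_heads, batch_first=True)
        self.init_mem = nn.Parameter(torch.randn(1, n_slots, d))

    def forward(self, segments):
        mem = self.init_mem
        outs = []
        for seg in segments:                   # process long input piecewise
            ctx, _ = self.read(seg, mem, mem)  # tokens read from memory
            outs.append(seg + ctx)
            mem, _ = self.write(mem, seg, seg) # slots summarize the segment
        return torch.cat(outs, dim=1), mem

model = MemorySlots()
segments = [torch.randn(1, 16, 32) for _ in range(4)]  # 4 segments, 16 tokens each
out, final_mem = model(segments)
```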
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.