xRAG: Extreme Context Compression for Retrieval-augmented Generation with One Token
- URL: http://arxiv.org/abs/2405.13792v1
- Date: Wed, 22 May 2024 16:15:17 GMT
- Title: xRAG: Extreme Context Compression for Retrieval-augmented Generation with One Token
- Authors: Xin Cheng, Xun Wang, Xingxing Zhang, Tao Ge, Si-Qing Chen, Furu Wei, Huishuai Zhang, Dongyan Zhao,
- Abstract summary: xRAG is an innovative context compression method tailored for retrieval-augmented generation.
xRAG seamlessly integrates document embeddings into the language model representation space.
Experimental results demonstrate that xRAG achieves an average improvement of over 10% across six knowledge-intensive tasks.
- Score: 108.7069350303884
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper introduces xRAG, an innovative context compression method tailored for retrieval-augmented generation. xRAG reinterprets document embeddings in dense retrieval--traditionally used solely for retrieval--as features from the retrieval modality. By employing a modality fusion methodology, xRAG seamlessly integrates these embeddings into the language model representation space, effectively eliminating the need for their textual counterparts and achieving an extreme compression rate. In xRAG, the only trainable component is the modality bridge, while both the retriever and the language model remain frozen. This design choice allows for the reuse of offline-constructed document embeddings and preserves the plug-and-play nature of retrieval augmentation. Experimental results demonstrate that xRAG achieves an average improvement of over 10% across six knowledge-intensive tasks, adaptable to various language model backbones, ranging from a dense 7B model to an 8x7B Mixture of Experts configuration. xRAG not only significantly outperforms previous context compression methods but also matches the performance of uncompressed models on several datasets, while reducing overall FLOPs by a factor of 3.53. Our work pioneers new directions in retrieval-augmented generation from the perspective of multimodality fusion, and we hope it lays the foundation for future efficient and scalable retrieval-augmented systems
Related papers
- FlexiGPT: Pruning and Extending Large Language Models with Low-Rank Weight Sharing [59.12511498024836]
We present a method to prune large language models (LLMs) that selectively prunes model blocks based on an importance score.
We propose a principled metric to replace each pruned block using a weight-sharing mechanism.
Empirical evaluations demonstrate substantial performance gains over existing methods.
arXiv Detail & Related papers (2025-01-24T18:46:37Z) - RetroLLM: Empowering Large Language Models to Retrieve Fine-grained Evidence within Generation [21.764973680014368]
RetroLLM is a unified framework that integrates retrieval and generation into a single, cohesive process.
To mitigate false pruning in the process of constrained evidence generation, we introduce hierarchical FM-Index constraints.
Experiments on five open-domain QA datasets demonstrate RetroLLM's superior performance across both in-domain and out-of-domain tasks.
arXiv Detail & Related papers (2024-12-16T16:03:25Z) - CodeXEmbed: A Generalist Embedding Model Family for Multiligual and Multi-task Code Retrieval [103.116634967815]
We introduce CodeXEmbed, a family of large-scale code embedding models ranging from 400M to 7B parameters.
Our novel training pipeline unifies multiple programming languages and transforms various code-related tasks into a common retrieval framework.
Our 7B model sets a new state-of-the-art (SOTA) in code retrieval, outperforming the previous leading model, Voyage-Code, by over 20% on CoIR benchmark.
arXiv Detail & Related papers (2024-11-19T16:54:45Z) - RQ-RAG: Learning to Refine Queries for Retrieval Augmented Generation [42.82192656794179]
Large Language Models (LLMs) exhibit remarkable capabilities but are prone to generating inaccurate or hallucinatory responses.
This limitation stems from their reliance on vast pretraining datasets, making them susceptible to errors in unseen scenarios.
Retrieval-Augmented Generation (RAG) addresses this by incorporating external, relevant documents into the response generation process.
arXiv Detail & Related papers (2024-03-31T08:58:54Z) - Corrective Retrieval Augmented Generation [36.04062963574603]
Retrieval-augmented generation (RAG) relies heavily on relevance of retrieved documents, raising concerns about how the model behaves if retrieval goes wrong.
We propose the Corrective Retrieval Augmented Generation (CRAG) to improve the robustness of generation.
CRAG is plug-and-play and can be seamlessly coupled with various RAG-based approaches.
arXiv Detail & Related papers (2024-01-29T04:36:39Z) - MuRAG: Multimodal Retrieval-Augmented Generator for Open Question
Answering over Images and Text [58.655375327681774]
We propose the first Multimodal Retrieval-Augmented Transformer (MuRAG)
MuRAG accesses an external non-parametric multimodal memory to augment language generation.
Our results show that MuRAG achieves state-of-the-art accuracy, outperforming existing models by 10-20% absolute on both datasets.
arXiv Detail & Related papers (2022-10-06T13:58:03Z) - CorpusBrain: Pre-train a Generative Retrieval Model for
Knowledge-Intensive Language Tasks [62.22920673080208]
Single-step generative model can dramatically simplify the search process and be optimized in end-to-end manner.
We name the pre-trained generative retrieval model as CorpusBrain as all information about the corpus is encoded in its parameters without the need of constructing additional index.
arXiv Detail & Related papers (2022-08-16T10:22:49Z) - Autoregressive Search Engines: Generating Substrings as Document
Identifiers [53.0729058170278]
Autoregressive language models are emerging as the de-facto standard for generating answers.
Previous work has explored ways to partition the search space into hierarchical structures.
In this work we propose an alternative that doesn't force any structure in the search space: using all ngrams in a passage as its possible identifiers.
arXiv Detail & Related papers (2022-04-22T10:45:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.