ECoRAG: Evidentiality-guided Compression for Long Context RAG
- URL: http://arxiv.org/abs/2506.05167v2
- Date: Fri, 06 Jun 2025 07:57:28 GMT
- Title: ECoRAG: Evidentiality-guided Compression for Long Context RAG
- Authors: Yeonseok Jeong, Jinsu Kim, Dohyeon Lee, Seung-won Hwang
- Abstract summary: We propose Evidentiality-guided RAG, or the ECoRAG framework. ECoRAG improves performance by compressing retrieved documents based on evidentiality. ECoRAG is highly cost-efficient, as it not only reduces latency but also minimizes token usage.
- Score: 22.842546956145064
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) have shown remarkable performance in Open-Domain Question Answering (ODQA) by leveraging external documents through Retrieval-Augmented Generation (RAG). To reduce the RAG overhead that comes from longer contexts, context compression is necessary. However, prior compression methods do not focus on filtering out non-evidential information, which limits performance in LLM-based RAG. We thus propose Evidentiality-guided RAG, or the ECoRAG framework. ECoRAG improves LLM performance by compressing retrieved documents based on evidentiality, ensuring that answer generation is supported by the correct evidence. As an additional step, ECoRAG reflects on whether the compressed content provides sufficient evidence and, if not, retrieves more until it is sufficient. Experiments show that ECoRAG improves LLM performance on ODQA tasks, outperforming existing compression methods. Furthermore, ECoRAG is highly cost-efficient, as it not only reduces latency but also minimizes token usage by retaining only the information necessary to generate the correct answer. Code is available at https://github.com/ldilab/ECoRAG.
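The compress-then-reflect loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). The word-overlap scorer, the function names, the thresholds, and the batch size are all hypothetical stand-ins chosen only to make the control flow concrete: keep sentences that look evidential, check whether the kept evidence seems sufficient, and read more retrieved documents only if it does not.

```python
from typing import Callable, List, Tuple


def overlap_score(question: str, sentence: str) -> float:
    """Toy stand-in for an evidentiality scorer (an assumption, not the
    paper's model): fraction of question words that appear in the sentence."""
    q_words = set(question.lower().split())
    s_words = set(sentence.lower().split())
    return len(q_words & s_words) / len(q_words) if q_words else 0.0


def compress_with_reflection(
    question: str,
    ranked_docs: List[str],
    score: Callable[[str, str], float] = overlap_score,
    keep_threshold: float = 0.5,    # hypothetical cutoff for keeping a sentence
    enough_threshold: float = 0.6,  # hypothetical cutoff for "sufficient evidence"
    batch_size: int = 2,            # documents read per round
) -> Tuple[List[str], int]:
    """Return (kept sentences, number of documents consumed)."""
    kept: List[str] = []
    used = 0
    while used < len(ranked_docs):
        batch = ranked_docs[used:used + batch_size]
        used += len(batch)
        # Compression step: keep only sentences scored as evidential.
        for doc in batch:
            for sentence in doc.split(". "):
                sentence = sentence.strip()
                if sentence and score(question, sentence) >= keep_threshold:
                    kept.append(sentence)
        # Reflection step: stop once the kept evidence looks sufficient.
        if sum(score(question, s) for s in kept) >= enough_threshold:
            break
    return kept, used


docs = [
    "The weather is nice today. Cats sleep a lot",
    "Shakespeare wrote Hamlet around 1600. It is a tragedy",
    "Hamlet is set in Denmark",
    "Unrelated text",
]
kept, used = compress_with_reflection("who wrote hamlet", docs)
print(kept, used)
```

In this toy run the loop stops after the first batch of two documents, so the remaining documents are never added to the prompt. That early stop is the token-saving behavior the abstract attributes to retaining only necessary information.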
Related papers
- Detecting Overflow in Compressed Token Representations for Retrieval-Augmented Generation [49.48204107529758]
We define token overflow as a regime in which compressed representations no longer contain sufficient information to answer a given query. In this paper, we find that query-agnostic saturation statistics reliably separate compressed from uncompressed token representations. Lightweight probing classifiers over both query and context xRAG representations detect overflow with 0.72 AUC-ROC on average. These results advance from query-independent diagnostics to query-aware detectors, enabling low-cost pre-LLM gating to mitigate compression-induced errors.
arXiv Detail & Related papers (2026-02-12T18:15:08Z) - LURE-RAG: Lightweight Utility-driven Reranking for Efficient RAG [15.963908827464296]
We propose Lightweight Utility-driven Reranking for Efficient RAG. It augments any black-box retriever with an efficient Lambda-based reranker. It achieves competitive performance, reaching 97-98% of the state-of-the-art dense neural baseline.
arXiv Detail & Related papers (2026-01-27T12:26:31Z) - REFRAG: Rethinking RAG based Decoding [67.4862300145604]
REFRAG is an efficient decoding framework that compresses, senses, and expands to improve latency in RAG applications. We provide rigorous validation of REFRAG across diverse long-context tasks, including RAG, multi-turn conversations, and long document summarization.
arXiv Detail & Related papers (2025-09-01T03:31:44Z) - CORE-RAG: Lossless Compression for Retrieval-Augmented LLMs via Reinforcement Learning [22.93037884068796]
Retrieval-Augmented Generation (RAG) has emerged as a promising approach to enhance the timeliness of knowledge updates and the factual accuracy of responses in large language models. Existing approaches to document compression tailored for RAG often degrade task performance. We propose CORE, a novel method for lossless context compression in RAG.
arXiv Detail & Related papers (2025-08-24T12:21:50Z) - Enhancing RAG Efficiency with Adaptive Context Compression [20.745017748105955]
Retrieval-augmented generation incurs significant inference costs due to lengthy retrieved contexts. Existing methods apply fixed compression rates, over-compressing simple queries or under-compressing complex ones. We propose Adaptive Context Compression for RAG (ACC-RAG), a framework that dynamically adjusts compression rates based on input complexity.
arXiv Detail & Related papers (2025-07-24T13:46:51Z) - Long Context In-Context Compression by Getting to the Gist of Gisting [50.24627831994713]
GistPool is an in-context compression method with no architectural modification to the decoder transformer. We demonstrate that gisting struggles with longer contexts, with significant performance drops even at minimal compression rates. GistPool preserves the simplicity of gisting, while significantly boosting its performance on long context compression tasks.
arXiv Detail & Related papers (2025-04-11T19:23:31Z) - AttentionRAG: Attention-Guided Context Pruning in Retrieval-Augmented Generation [3.735352744532723]
We propose AttentionRAG, an attention-guided context pruning method for RAG systems. The core idea of AttentionRAG lies in its attention focus mechanism, which reformulates RAG queries into a next-token prediction paradigm. Experiments on LongBench and Babilong benchmarks show that AttentionRAG achieves up to 6.3$\times$ context compression while outperforming LLMLingua methods by around 10% in key metrics.
arXiv Detail & Related papers (2025-03-13T08:22:28Z) - Don't Do RAG: When Cache-Augmented Generation is All You Need for Knowledge Tasks [11.053340674721005]
Retrieval-augmented generation (RAG) has gained traction as a powerful approach for enhancing language models by integrating external knowledge sources. This paper proposes an alternative paradigm, cache-augmented generation (CAG), that bypasses real-time retrieval.
arXiv Detail & Related papers (2024-12-20T06:58:32Z) - EXIT: Context-Aware Extractive Compression for Enhancing Retrieval-Augmented Generation [8.757777529568383]
Current RAG systems often struggle when retrieval models fail to rank the most relevant documents. We introduce EXIT, an extractive context compression framework. Our evaluations show that EXIT consistently surpasses existing compression methods.
arXiv Detail & Related papers (2024-12-17T05:38:27Z) - EoRA: Fine-tuning-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation [84.70637613266835]
EoRA is a fine-tuning-free method that augments compressed Large Language Models with low-rank matrices. EoRA consistently outperforms prior training-free low-rank methods in recovering the accuracy of compressed LLMs.
arXiv Detail & Related papers (2024-10-28T17:59:03Z) - ChunkRAG: Novel LLM-Chunk Filtering Method for RAG Systems [2.8692611791027893]
Retrieval-Augmented Generation (RAG) systems generate inaccurate responses due to the retrieval of irrelevant or loosely related information. We propose ChunkRAG, a framework that enhances RAG systems by evaluating and filtering retrieved information at the chunk level.
arXiv Detail & Related papers (2024-10-25T14:07:53Z) - BRIEF: Bridging Retrieval and Inference for Multi-hop Reasoning via Compression [91.23933111083389]
Retrieval-augmented generation (RAG) can supplement large language models (LLMs) by integrating external knowledge. This paper presents BRIEF, a lightweight approach that performs query-aware multi-hop reasoning. Based on our synthetic data built entirely by open-source models, BRIEF generates more concise summaries.
arXiv Detail & Related papers (2024-10-20T04:24:16Z) - SFR-RAG: Towards Contextually Faithful LLMs [57.666165819196486]
Retrieval Augmented Generation (RAG) is a paradigm that integrates external contextual information with large language models (LLMs) to enhance factual accuracy and relevance.
We introduce SFR-RAG, a small LLM that is instruction-tuned with an emphasis on context-grounded generation and hallucination minimization.
We also present ConBench, a new evaluation framework compiling multiple popular and diverse RAG benchmarks.
arXiv Detail & Related papers (2024-09-16T01:08:18Z) - MemoRAG: Boosting Long Context Processing with Global Memory-Enhanced Retrieval Augmentation [60.04380907045708]
Retrieval-Augmented Generation (RAG) is considered a promising strategy to address this problem. We propose MemoRAG, a novel RAG framework empowered by global memory-augmented retrieval. MemoRAG achieves superior performance across a variety of long-context evaluation tasks.
arXiv Detail & Related papers (2024-09-09T13:20:31Z) - Generation-Augmented Retrieval for Open-domain Question Answering [134.27768711201202]
We propose Generation-Augmented Retrieval (GAR) for answering open-domain questions.
We show that generating diverse contexts for a query is beneficial as fusing their results consistently yields better retrieval accuracy.
GAR achieves state-of-the-art performance on Natural Questions and TriviaQA datasets under the extractive QA setup when equipped with an extractive reader.
arXiv Detail & Related papers (2020-09-17T23:08:01Z)