SARA: Selective and Adaptive Retrieval-augmented Generation with Context Compression
- URL: http://arxiv.org/abs/2507.05633v1
- Date: Tue, 08 Jul 2025 03:29:09 GMT
- Title: SARA: Selective and Adaptive Retrieval-augmented Generation with Context Compression
- Authors: Yiqiao Jin, Kartik Sharma, Vineeth Rakesh, Yingtong Dou, Menghai Pan, Mahashweta Das, Srijan Kumar
- Abstract summary: We propose SARA, a unified RAG framework that balances local precision and global knowledge coverage under tight context budgets. SARA combines natural-language text snippets with semantic compression vectors to jointly enhance context efficiency and answer correctness.
- Score: 28.043964124611026
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Retrieval-augmented Generation (RAG) extends large language models (LLMs) with external knowledge but faces key challenges: restricted effective context length and redundancy in retrieved documents. Pure compression-based approaches reduce input size but often discard fine-grained details essential for factual accuracy. We propose SARA, a unified RAG framework that balances local precision and global knowledge coverage under tight context budgets. SARA combines natural-language text snippets with semantic compression vectors to jointly enhance context efficiency and answer correctness. It represents contexts at two complementary levels: 1) fine-grained natural-language spans that preserve critical entities and numerical values, and 2) compact, interpretable vectors that summarize high-level semantics. An iterative evidence-selection module employs the compression vectors for dynamic reranking of contexts. Across 9 datasets and 5 open-source LLMs spanning 3 model families (Mistral, Llama, and Gemma), SARA consistently improves answer relevance (+17.71), answer correctness (+13.72), and semantic similarity (+15.53), demonstrating the importance of integrating textual and compressed representations for robust, context-efficient RAG.
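The abstract's two-level design lends itself to a compact illustration. Below is a minimal, hypothetical Python sketch of the iterative evidence-selection idea: each passage carries raw text plus a compact "compression vector", and a greedy loop reranks candidates by query relevance minus redundancy with already-selected evidence. The embed() stand-in and the scoring rule are assumptions for illustration, not the authors' implementation.

```python
# Toy sketch of SARA-style selection: passages contribute raw text plus a
# compact "compression vector"; a greedy loop reranks candidates by query
# relevance minus redundancy with already-selected evidence.
# embed() is a deterministic stand-in for a real encoder, not SARA's model.
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    rng = np.random.default_rng(abs(hash(text)) % 2**32)
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

def select_evidence(query, passages, budget=2, redundancy_penalty=0.5):
    q = embed(query)
    vecs = {p: embed(p) for p in passages}   # "compression vectors"
    pool, selected = list(passages), []
    while pool and len(selected) < budget:
        def score(p):
            rel = float(q @ vecs[p])
            red = max((float(vecs[p] @ vecs[s]) for s in selected), default=0.0)
            return rel - redundancy_penalty * red
        best = max(pool, key=score)
        selected.append(best)
        pool.remove(best)
    return selected

print(select_evidence("capital of France",
                      ["Paris is the capital of France.",
                       "France's capital is Paris.",
                       "Berlin is the capital of Germany."]))
```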
Related papers
- Domain-Specific Data Generation Framework for RAG Adaptation [58.20906914537952]
Retrieval-Augmented Generation (RAG) combines the language understanding and reasoning power of large language models with external retrieval to enable domain-grounded responses. We propose RAGen, a framework for generating domain-grounded question-answer-context (QAC) triples tailored to diverse RAG adaptation approaches.
arXiv Detail & Related papers (2025-10-13T09:59:49Z) - ZeroGR: A Generalizable and Scalable Framework for Zero-Shot Generative Retrieval [125.19156877994612]
Generative retrieval (GR) reformulates information retrieval (IR) by framing it as the generation of document identifiers (docids). We propose ZeroGR, a zero-shot generative retrieval framework that leverages natural language instructions to extend GR across a wide range of IR tasks. Specifically, ZeroGR is composed of three key components: (i) an LM-based docid generator that unifies heterogeneous documents into semantically meaningful docids; (ii) an instruction-tuned query generator that generates diverse types of queries from natural language task descriptions to enhance…
arXiv Detail & Related papers (2025-10-12T03:04:24Z) - CAAD: Context-Aware Adaptive Decoding for Truthful Text Generation [31.469511576774252]
We propose a context-aware adaptive decoding method for large language models. Our approach achieves a 2.8 percent average improvement on TruthfulQA. Our model-agnostic, scalable, and efficient method requires only a single generation pass.
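A hedged sketch of what a context-aware adaptive decoding step could look like: blend next-token distributions computed with and without the retrieved context, leaning on the context-conditioned one when it is more confident. The entropy-based mixing rule here is an illustrative assumption, not the paper's exact method.

```python
# Illustrative single decoding step: mix context-conditioned and plain
# next-token distributions; the entropy-based weight is an assumption.
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    z = np.exp(logits - logits.max())
    return z / z.sum()

def adaptive_step(logits_with_ctx, logits_without_ctx):
    p_ctx, p_plain = softmax(logits_with_ctx), softmax(logits_without_ctx)
    # Trust the context-conditioned distribution more when it is sharper
    # (lower entropy) than the unconditioned one.
    h = lambda p: -float(np.sum(p * np.log(p + 1e-12)))
    alpha = h(p_plain) / (h(p_ctx) + h(p_plain) + 1e-12)
    return int(np.argmax(alpha * p_ctx + (1 - alpha) * p_plain))

rng = np.random.default_rng(0)
print(adaptive_step(rng.standard_normal(5), rng.standard_normal(5)))
```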
arXiv Detail & Related papers (2025-08-04T08:28:25Z) - Enhancing RAG Efficiency with Adaptive Context Compression [20.745017748105955]
Retrieval-augmented generation incurs significant inference costs due to lengthy retrieved contexts. Existing methods apply fixed compression rates, over-compressing simple queries or under-compressing complex ones. We propose Adaptive Context Compression for RAG (ACC-RAG), a framework that dynamically adjusts compression rates based on input complexity.
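The core idea, choosing a compression rate per query rather than fixing one, can be sketched with a simple heuristic; the complexity proxy below (multi-hop cue words and query length) is a placeholder assumption, not ACC-RAG's learned estimator.

```python
# Toy adaptive compression: harder-looking queries keep more context.
# The complexity heuristic is a stand-in for a learned estimator.

def query_complexity(query: str) -> float:
    multi_hop_cues = ("and", "both", "compare", "before", "after", "why", "how")
    cues = sum(w in query.lower().split() for w in multi_hop_cues)
    return min(1.0, 0.2 * cues + 0.02 * len(query.split()))

def compress(context_sentences: list[str], query: str) -> list[str]:
    keep_ratio = 0.25 + 0.75 * query_complexity(query)   # keep 25%..100%
    k = max(1, round(keep_ratio * len(context_sentences)))
    return context_sentences[:k]   # a real system would rank, not truncate

docs = ["s1.", "s2.", "s3.", "s4."]
print(compress(docs, "Who wrote Hamlet?"))             # aggressive compression
print(compress(docs, "Compare how both authors wrote."))  # keeps more context
```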
arXiv Detail & Related papers (2025-07-24T13:46:51Z) - Attributing Response to Context: A Jensen-Shannon Divergence Driven Mechanistic Study of Context Attribution in Retrieval-Augmented Generation [52.3707788779464]
We introduce a novel Jensen-Shannon Divergence driven method to Attribute Response to Context (ARC-JSD). ARC-JSD enables efficient and accurate identification of essential context sentences without additional fine-tuning, gradient calculation, or surrogate modelling. Evaluations on a wide range of RAG benchmarks, such as TyDi QA, HotpotQA, and MuSiQue, using instruction-tuned LLMs of different scales demonstrate superior accuracy and significant computational efficiency improvements.
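The measurement itself is easy to state: for each context sentence, compare the model's answer distribution with and without that sentence using Jensen-Shannon divergence, and attribute the response to the sentences with the largest shift. A minimal sketch, assuming a hypothetical answer_dist() in place of a real model:

```python
import numpy as np

def jsd(p: np.ndarray, q: np.ndarray) -> float:
    """Jensen-Shannon divergence between two distributions."""
    m = 0.5 * (p + q)
    kl = lambda a, b: float(np.sum(a * np.log((a + 1e-12) / (b + 1e-12))))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def attribute(sentences, answer_dist):
    """Rank context sentences by how much removing each one shifts
    the answer distribution (higher JSD = more essential)."""
    full = answer_dist(sentences)
    scores = {
        s: jsd(full, answer_dist([t for t in sentences if t != s]))
        for s in sentences
    }
    return sorted(scores.items(), key=lambda kv: -kv[1])

# Toy stand-in: the answer distribution depends on which sentences remain.
def toy_answer_dist(sentences):
    rng = np.random.default_rng(abs(hash(tuple(sentences))) % 2**32)
    p = rng.random(4) + 1e-3
    return p / p.sum()

print(attribute(["Paris is in France.", "The sky is blue."], toy_answer_dist))
```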
arXiv Detail & Related papers (2025-05-22T09:04:03Z) - SAGE: A Framework of Precise Retrieval for RAG [9.889395372896153]
Retrieval-augmented generation (RAG) has demonstrated significant proficiency in conducting question-answering tasks. Existing RAG methods segment the corpus without considering semantics, making it difficult to find relevant context. We introduce a RAG framework (SAGE) to overcome these limitations.
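Semantics-aware segmentation can be sketched simply: split wherever the embedding similarity between adjacent sentences drops, so chunk boundaries follow topic shifts rather than fixed token counts. The embed() stand-in below is an assumption, and this is only one way to realize the idea, not SAGE's exact procedure.

```python
import numpy as np

def embed(text: str, dim: int = 32) -> np.ndarray:
    """Deterministic stand-in embedding; replace with a real encoder."""
    rng = np.random.default_rng(abs(hash(text)) % 2**32)
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

def semantic_chunks(sentences, threshold=0.0):
    """Start a new chunk whenever adjacent sentences are dissimilar."""
    chunks, current = [], [sentences[0]]
    for prev, cur in zip(sentences, sentences[1:]):
        if float(embed(prev) @ embed(cur)) < threshold:
            chunks.append(current)
            current = []
        current.append(cur)
    chunks.append(current)
    return chunks

sents = ["RAG retrieves documents.", "It feeds them to an LLM.",
         "Cats sleep a lot.", "They purr when happy."]
print(semantic_chunks(sents))
```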
arXiv Detail & Related papers (2025-03-03T16:25:58Z) - Harnessing Large Language Models for Knowledge Graph Question Answering via Adaptive Multi-Aspect Retrieval-Augmentation [81.18701211912779]
We introduce an Adaptive Multi-Aspect Retrieval-augmented over KGs (Amar) framework. This method retrieves knowledge including entities, relations, and subgraphs, and converts each piece of retrieved text into prompt embeddings. Our method has achieved state-of-the-art performance on two common datasets.
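One way to read "converts each piece of retrieved text into prompt embeddings" is as soft prompting: each retrieved entity, relation, or linearized subgraph is encoded to a vector and prepended to the LLM's input embeddings. A hedged sketch under that reading (the encoder and shapes are assumptions):

```python
import numpy as np

def encode(text: str, dim: int = 16) -> np.ndarray:
    """Stand-in text encoder producing one soft-prompt vector per snippet."""
    rng = np.random.default_rng(abs(hash(text)) % 2**32)
    return rng.standard_normal(dim)

def build_soft_prompt(retrieved, token_embeddings):
    """Prepend one embedding per retrieved KG snippet (entity, relation,
    or linearized subgraph) to the question's token embeddings."""
    prompt = np.stack([encode(r) for r in retrieved])        # (k, dim)
    return np.concatenate([prompt, token_embeddings], axis=0)

question = np.zeros((5, 16))                 # toy question token embeddings
out = build_soft_prompt(["(Paris, capital_of, France)"], question)
print(out.shape)  # (6, 16)
```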
arXiv Detail & Related papers (2024-12-24T16:38:04Z) - Core Context Aware Transformers for Long Context Language Modeling [50.774702091154204]
We propose a plug-and-play Core Context Aware (CCA) Attention for efficient long-context modeling. Our method automatically focuses on and strengthens the core context while diminishing redundancy during learning. It can replace the self-attention module in existing large language models with minimal fine-tuning cost.
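The blurb suggests attending fully to a small "core" of tokens while summarizing the rest; a rough numpy sketch of that shape (mean-pooling distant tokens into group summaries before attention) follows. The grouping and pooling choices here are assumptions, not the paper's exact mechanism.

```python
import numpy as np

def cca_like_attention(q, k, v, core: int = 4, group: int = 4):
    """Attend fully to the last `core` tokens and to mean-pooled groups
    of the remaining tokens, shrinking the attention span."""
    k_far, v_far = k[:-core], v[:-core]
    n = (len(k_far) // group) * group
    k_pool = k_far[:n].reshape(-1, group, k.shape[-1]).mean(axis=1)
    v_pool = v_far[:n].reshape(-1, group, v.shape[-1]).mean(axis=1)
    K = np.concatenate([k_pool, k[-core:]], axis=0)
    V = np.concatenate([v_pool, v[-core:]], axis=0)
    scores = q @ K.T / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 8))
print(cca_like_attention(x[-1:], x, x).shape)  # (1, 8)
```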
arXiv Detail & Related papers (2024-12-17T01:54:08Z) - SFR-RAG: Towards Contextually Faithful LLMs [57.666165819196486]
Retrieval Augmented Generation (RAG) is a paradigm that integrates external contextual information with large language models (LLMs) to enhance factual accuracy and relevance.
We introduce SFR-RAG, a small LLM that is instruction-tuned with an emphasis on context-grounded generation and hallucination minimization.
We also present ConBench, a new evaluation framework compiling multiple popular and diverse RAG benchmarks.
arXiv Detail & Related papers (2024-09-16T01:08:18Z) - Spatial Semantic Recurrent Mining for Referring Image Segmentation [63.34997546393106]
We propose S²RM to achieve high-quality cross-modality fusion.
It follows a three-stage strategy: distributing language features, spatial semantic recurrent coparsing, and parsed-semantic balancing.
Our proposed method performs favorably against other state-of-the-art algorithms.
arXiv Detail & Related papers (2024-05-15T00:17:48Z) - CEV-LM: Controlled Edit Vector Language Model for Shaping Natural Language Generations [5.148810760938979]
We introduce CEV-LM - a lightweight, semi-autoregressive language model that utilizes constrained edit vectors to control three complementary metrics.
We study an extensive set of state-of-the-art CTG models and find that CEV-LM provides significantly more targeted and precise control of these three metrics.
arXiv Detail & Related papers (2024-02-22T05:07:31Z) - Generation-Augmented Retrieval for Open-domain Question Answering [134.27768711201202]
We propose Generation-Augmented Retrieval (GAR) for answering open-domain questions.
We show that generating diverse contexts for a query is beneficial as fusing their results consistently yields better retrieval accuracy.
GAR achieves state-of-the-art performance on Natural Questions and TriviaQA datasets under the extractive QA setup when equipped with an extractive reader.
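The recipe is straightforward to sketch: generate several expansions of the query (e.g., a plausible answer, a title, a passage), retrieve with each, and fuse the ranked lists. The sketch below uses reciprocal rank fusion as one common fusion choice, not necessarily the paper's; the generator and retriever are stand-ins.

```python
from collections import defaultdict

def fuse_rankings(ranked_lists, k: int = 60):
    """Reciprocal rank fusion over per-expansion retrieval results."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] += 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Toy: three generated contexts retrieve overlapping document lists.
runs = [["d3", "d1", "d7"], ["d1", "d3", "d2"], ["d1", "d9"]]
print(fuse_rankings(runs))   # d1 surfaces first via fusion
```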
arXiv Detail & Related papers (2020-09-17T23:08:01Z) - Probing Linguistic Features of Sentence-Level Representations in Neural Relation Extraction [80.38130122127882]
We introduce 14 probing tasks targeting linguistic properties relevant to neural relation extraction (RE).
We use them to study representations learned by more than 40 different combinations of encoder architectures and linguistic features, trained on two datasets.
We find that the biases induced by the architecture and by the inclusion of linguistic features are clearly expressed in the probing task performance.
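A probing task in this sense is just a lightweight classifier trained on frozen representations; a minimal sketch with scikit-learn, where the random features and the binary "linguistic property" label are illustrative placeholders:

```python
# Minimal probing setup: freeze representations, fit a linear probe, and
# read classification accuracy as evidence the property is encoded.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
reps = rng.standard_normal((200, 64))       # frozen sentence encodings
labels = (reps[:, 0] > 0).astype(int)       # stand-in linguistic property

X_tr, X_te, y_tr, y_te = train_test_split(reps, labels, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("probe accuracy:", probe.score(X_te, y_te))
```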
arXiv Detail & Related papers (2020-04-17T09:17:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.