SPARC-RAG: Adaptive Sequential-Parallel Scaling with Context Management for Retrieval-Augmented Generation
- URL: http://arxiv.org/abs/2602.00083v1
- Date: Thu, 22 Jan 2026 20:18:55 GMT
- Title: SPARC-RAG: Adaptive Sequential-Parallel Scaling with Context Management for Retrieval-Augmented Generation
- Authors: Yuxin Yang, Gangda Deng, Ömer Faruk Akgül, Nima Chitsazan, Yash Govilkar, Akasha Tigalappanavara, Shi-Xiong Zhang, Sambit Sahu, Viktor Prasanna
- Abstract summary: Retrieval-Augmented Generation grounds large language model outputs in external evidence. Recent works scale RAG at inference time along two complementary dimensions. We propose a multi-agent framework that coordinates sequential and parallel inference-time scaling.
- Score: 8.00733338569737
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Retrieval-Augmented Generation (RAG) grounds large language model outputs in external evidence, but still struggles on multi-hop question answering that requires long chains of reasoning. Recent works scale RAG at inference time along two complementary dimensions: sequential depth for iterative refinement and parallel width for coverage expansion. However, naive scaling causes context contamination and scaling inefficiency, leading to diminishing or negative returns despite increased computation. To address these limitations, we propose SPARC-RAG, a multi-agent framework that coordinates sequential and parallel inference-time scaling under a unified context management mechanism. SPARC-RAG employs specialized agents that maintain a shared global context and provide explicit control over the scaling process. It generates targeted, complementary sub-queries for each branch to enable diverse parallel exploration, and explicitly regulates exit decisions based on answer correctness and evidence grounding. To optimize scaling behavior, we further introduce a lightweight fine-tuning method with process-level verifiable preferences, which improves the efficiency of sequential scaling and the effectiveness of parallel scaling. Across single- and multi-hop QA benchmarks, SPARC-RAG consistently outperforms previous RAG baselines, yielding an average +6.2 F1 improvement at lower inference cost.
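The abstract describes an outer sequential loop (depth) that, at each step, fans out complementary sub-queries (width) into a shared global context and exits early based on a verification check. The sketch below is a hypothetical illustration of that control flow, not the authors' implementation: the retriever, generator, and verifier are toy stand-ins, and all function names are assumptions for illustration.

```python
# Toy sketch of SPARC-RAG-style adaptive sequential-parallel scaling.
# Retrieval, generation, and verification are stubbed; the real framework
# uses specialized LLM agents sharing a global context.

from dataclasses import dataclass, field

@dataclass
class GlobalContext:
    """Shared evidence store that all parallel branches read from and write to."""
    evidence: list = field(default_factory=list)

    def add(self, passages):
        # Deduplicate so overlapping branches do not contaminate the context.
        for p in passages:
            if p not in self.evidence:
                self.evidence.append(p)

def toy_retrieve(sub_query):
    # Stand-in retriever: returns one pseudo-passage keyed by the sub-query.
    return [f"passage for: {sub_query}"]

def toy_answer(question, evidence):
    # Stand-in generator agent.
    return f"answer({question}) from {len(evidence)} passages"

def toy_verify(answer, evidence):
    # Stand-in exit check: accept once enough grounded evidence accumulates.
    return len(evidence) >= 3

def sparc_loop(question, width=3, max_depth=4):
    """Sequential depth (outer loop) x parallel width (sub-queries per step)."""
    ctx = GlobalContext()
    answer = None
    for depth in range(max_depth):
        # Parallel width: complementary sub-queries for diverse exploration.
        sub_queries = [f"{question} / hop {depth} / branch {b}" for b in range(width)]
        for sq in sub_queries:
            ctx.add(toy_retrieve(sq))
        answer = toy_answer(question, ctx.evidence)
        # Explicit exit decision based on answer correctness and grounding.
        if toy_verify(answer, ctx.evidence):
            return answer, depth + 1
    return answer, max_depth

if __name__ == "__main__":
    ans, steps = sparc_loop("Who advised the author of X?")
    print(steps, ans)
```

With the toy verifier's threshold of three passages and a width of three, the loop exits after a single sequential step; a stricter verifier would drive additional hops, which is the adaptive depth/width trade-off the paper studies.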
Related papers
- Learning to Share: Selective Memory for Efficient Parallel Agentic Systems [49.78267008828593]
Agentic systems solve complex tasks by coordinating multiple agents that iteratively reason, invoke tools, and exchange intermediate results. Recent approaches deploy multiple agent teams running in parallel to explore diverse reasoning trajectories. We propose Learning to Share (LTS), a learned shared-memory mechanism for parallel agentic frameworks.
arXiv Detail & Related papers (2026-02-05T18:20:21Z) - What If We Allocate Test-Time Compute Adaptively? [2.1713977971908944]
Test-time scaling allocates inference computation uniformly, uses fixed sampling strategies, and applies verification only for reranking. We propose a verifier-guided adaptive framework treating reasoning as iterative trajectory generation and selection. Across datasets, our dynamic, PRM-guided approach consistently outperforms direct test-time scaling.
arXiv Detail & Related papers (2026-02-01T07:30:22Z) - TreePS-RAG: Tree-based Process Supervision for Reinforcement Learning in Agentic RAG [71.06073770344732]
Agentic retrieval-augmented generation (RAG) formulates question answering as a multi-step interaction between reasoning and information retrieval. We present TreePS-RAG, an online, tree-based RL framework for agentic RAG that enables step-wise credit assignment while retaining outcome-only rewards.
arXiv Detail & Related papers (2026-01-11T14:07:30Z) - REFRAG: Rethinking RAG based Decoding [67.4862300145604]
REFRAG is an efficient decoding framework that compresses, senses, and expands to improve latency in RAG applications. We provide rigorous validation of REFRAG across diverse long-context tasks, including RAG, multi-turn conversations, and long document summarization.
arXiv Detail & Related papers (2025-09-01T03:31:44Z) - HedraRAG: Coordinating LLM Generation and Database Retrieval in Heterogeneous RAG Serving [10.130938079844121]
HedraRAG is a runtime system built on a graph-based abstraction that exposes optimization opportunities across stage-level parallelism, intra-request similarity, and inter-request skewness. The resulting execution plans are mapped onto hybrid CPU-GPU pipelines to improve resource utilization and reduce latency.
arXiv Detail & Related papers (2025-07-12T04:42:43Z) - Attributing Response to Context: A Jensen-Shannon Divergence Driven Mechanistic Study of Context Attribution in Retrieval-Augmented Generation [52.3707788779464]
We introduce a novel Jensen-Shannon Divergence driven method to Attribute Response to Context (ARC-JSD). ARC-JSD enables efficient and accurate identification of essential context sentences without additional fine-tuning, gradient calculation, or surrogate modelling. Evaluations on a wide range of RAG benchmarks, such as TyDi QA, Hotpot QA, and Musique, using instruction-tuned LLMs at different scales demonstrate superior accuracy and significant computational efficiency improvements.
arXiv Detail & Related papers (2025-05-22T09:04:03Z) - Accelerating Adaptive Retrieval Augmented Generation via Instruction-Driven Representation Reduction of Retrieval Overlaps [16.84310001807895]
This paper introduces a model-agnostic approach that can be applied to A-RAG methods. Specifically, we use cache access and parallel generation to speed up the prefilling and decoding stages respectively.
arXiv Detail & Related papers (2025-05-19T05:39:38Z) - Chain-of-Retrieval Augmented Generation [91.02950964802454]
This paper introduces an approach for training o1-like RAG models that retrieve and reason over relevant information step by step before generating the final answer. Our proposed method, CoRAG, allows the model to dynamically reformulate the query based on the evolving state.
arXiv Detail & Related papers (2025-01-24T09:12:52Z) - COrAL: Order-Agnostic Language Modeling for Efficient Iterative Refinement [80.18490952057125]
Iterative refinement has emerged as an effective paradigm for enhancing the capabilities of large language models (LLMs) on complex tasks.
We propose Context-Wise Order-Agnostic Language Modeling (COrAL) to overcome these challenges.
Our approach models multiple token dependencies within manageable context windows, enabling the model to perform iterative refinement internally.
arXiv Detail & Related papers (2024-10-12T23:56:19Z) - Inference Scaling for Long-Context Retrieval Augmented Generation [37.15479223789199]
Scaling inference computation has unlocked the potential of long-context large language models (LLMs). In this work, we investigate the combination of multiple strategies beyond simply increasing the quantity of knowledge, including in-context learning and iterative prompting. We show that scaling inference compute on long-context LLMs achieves up to 58.9% gains on benchmark datasets compared to standard RAG.
arXiv Detail & Related papers (2024-10-06T03:42:15Z) - Sample-Efficient "Clustering and Conquer" Procedures for Parallel Large-Scale Ranking and Selection [3.913403111891027]
We modify the commonly used "divide and conquer" framework in parallel computing by adding a correlation-based clustering step. This seemingly simple modification yields an $\mathcal{O}(p)$ reduction in sample complexity for a widely used class of sample-optimal R&S procedures. In large-scale AI applications such as neural architecture search, our methods demonstrate superior performance.
arXiv Detail & Related papers (2024-02-03T15:56:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.