CompactRAG: Reducing LLM Calls and Token Overhead in Multi-Hop Question Answering
- URL: http://arxiv.org/abs/2602.05728v1
- Date: Thu, 05 Feb 2026 14:52:06 GMT
- Title: CompactRAG: Reducing LLM Calls and Token Overhead in Multi-Hop Question Answering
- Authors: Hao Yang, Zhiyu Yang, Xupeng Zhang, Wei Wei, Yunjie Zhang, Lin Yang,
- Abstract summary: Existing multi-hop RAG systems alternate between retrieval and reasoning at each step.<n>We propose CompactRAG, a framework that decouples offline corpus restructuring from online reasoning.<n>Experiments on HotpotQA, 2WikiMultiHopQA, and MuSiQue demonstrate that CompactRAG achieves competitive accuracy while substantially reducing token consumption.
- Score: 15.281365738928415
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Retrieval-augmented generation (RAG) has become a key paradigm for knowledge-intensive question answering. However, existing multi-hop RAG systems remain inefficient, as they alternate between retrieval and reasoning at each step, resulting in repeated LLM calls, high token consumption, and unstable entity grounding across hops. We propose CompactRAG, a simple yet effective framework that decouples offline corpus restructuring from online reasoning. In the offline stage, an LLM reads the corpus once and converts it into an atomic QA knowledge base, which represents knowledge as minimal, fine-grained question-answer pairs. In the online stage, complex queries are decomposed and carefully rewritten to preserve entity consistency, and are resolved through dense retrieval followed by RoBERTa-based answer extraction. Notably, during inference, the LLM is invoked only twice in total - once for sub-question decomposition and once for final answer synthesis - regardless of the number of reasoning hops. Experiments on HotpotQA, 2WikiMultiHopQA, and MuSiQue demonstrate that CompactRAG achieves competitive accuracy while substantially reducing token consumption compared to iterative RAG baselines, highlighting a cost-efficient and practical approach to multi-hop reasoning over large knowledge corpora. The implementation is available at GitHub.
Related papers
- Reasoning in Trees: Improving Retrieval-Augmented Generation for Multi-Hop Question Answering [14.456873356080186]
Reasoning Tree Guided RAG (RT-RAG) is a novel hierarchical framework for complex multi-hop QA.<n>RT-RAG systematically decomposes multi-hop questions into explicit reasoning trees, minimizing inaccurate decomposition.
arXiv Detail & Related papers (2026-01-16T13:02:25Z) - Think Straight, Stop Smart: Structured Reasoning for Efficient Multi-Hop RAG [24.494759581234803]
TSSS (Think Straight, Stop Smart) is a structured multi-hop RAG framework designed for efficiency.<n> TSSS introduces (i) a template-based reasoning that caches recurring prefixes and anchors sub-queries to the main question.<n>On HotpotQA, 2WikiMultiHop, and MuSiQue, TSSS achieves state-of-the-art accuracy and competitive efficiency among RAG-CoT approaches.
arXiv Detail & Related papers (2025-10-22T02:09:23Z) - LLM-guided Hierarchical Retrieval [54.73080745446999]
LATTICE is a hierarchical retrieval framework that enables an LLM to reason over and navigate large corpora with logarithmic search complexity.<n>A central challenge in such LLM-guided search is that the model's relevance judgments are noisy, context-dependent, and unaware of the hierarchy.<n>Our framework achieves state-of-the-art zero-shot performance on the reasoning-intensive BRIGHT benchmark.
arXiv Detail & Related papers (2025-10-15T07:05:17Z) - Evaluating Retrieval-Augmented Generation Systems on Unanswerable, Uncheatable, Realistic, Multi-hop Queries [53.99620546358492]
Real-world use cases often present RAG systems with complex queries for which relevant information is missing from the corpus or is incomplete.<n>Existing RAG benchmarks rarely reflect realistic task complexity for multi-hop or out-of-scope questions.<n>We present the first pipeline for automatic, difficulty-controlled creation of un$underlinec$heatable, $underliner$ealistic, $underlineu$nanswerable, and $underlinem$ulti-hop.
arXiv Detail & Related papers (2025-10-13T21:38:04Z) - Resource-Friendly Dynamic Enhancement Chain for Multi-Hop Question Answering [21.077964610022313]
This work proposes a novel framework called DEC (Dynamic Enhancement Chain)<n> DEC first decomposes complex questions into logically coherent subquestions to form a hallucination-free reasoning chain.<n>It then iteratively refines these subquestions through context-aware rewriting to generate effective query formulations.
arXiv Detail & Related papers (2025-06-21T11:55:27Z) - GRITHopper: Decomposition-Free Multi-Hop Dense Retrieval [52.47514434103737]
We introduce GRITHopper-7B, a novel multi-hop dense retrieval model that achieves state-of-the-art performance.<n> GRITHopper combines generative and representational instruction tuning by integrating causal language modeling with dense retrieval training.<n>We find that incorporating additional context after the retrieval process, referred to as post-retrieval language modeling, enhances dense retrieval performance.
arXiv Detail & Related papers (2025-03-10T16:42:48Z) - BRIEF: Bridging Retrieval and Inference for Multi-hop Reasoning via Compression [91.23933111083389]
Retrieval-augmented generation (RAG) can supplement large language models (LLMs) by integrating external knowledge.<n>This paper presents BRIEF, a lightweight approach that performs query-aware multi-hop reasoning.<n>Based on our synthetic data built entirely by open-source models, BRIEF generates more concise summaries.
arXiv Detail & Related papers (2024-10-20T04:24:16Z) - GenSco: Can Question Decomposition based Passage Alignment improve Question Answering? [1.5776201492893507]
"GenSco" is a novel approach of selecting passages based on the predicted decomposition of the multi-hop questions.
We evaluate on three broadly established multi-hop question answering datasets.
arXiv Detail & Related papers (2024-07-14T15:25:08Z) - FiDeLiS: Faithful Reasoning in Large Language Model for Knowledge Graph Question Answering [46.41364317172677]
Large Language Models (LLMs) are often challenged by generating erroneous or hallucinated responses.<n>We propose a unified framework, FiDeLiS, designed to improve the factuality of LLM responses by anchoring answers to verifiable reasoning steps retrieved from Knowledge Graphs.<n>Our method, as a training-free framework, not only improve the performance but also enhance the factuality and interpretability across different benchmarks.
arXiv Detail & Related papers (2024-05-22T17:56:53Z) - Towards Robust Temporal Reasoning of Large Language Models via a Multi-Hop QA Dataset and Pseudo-Instruction Tuning [73.51314109184197]
It is crucial for large language models (LLMs) to understand the concept of temporal knowledge.
We propose a complex temporal question-answering dataset Complex-TR that focuses on multi-answer and multi-hop temporal reasoning.
arXiv Detail & Related papers (2023-11-16T11:49:29Z) - Logical Message Passing Networks with One-hop Inference on Atomic
Formulas [57.47174363091452]
We propose a framework for complex query answering that decomposes the Knowledge Graph embeddings from neural set operators.
On top of the query graph, we propose the Logical Message Passing Neural Network (LMPNN) that connects the local one-hop inferences on atomic formulas to the global logical reasoning.
Our approach yields the new state-of-the-art neural CQA model.
arXiv Detail & Related papers (2023-01-21T02:34:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.