Think Straight, Stop Smart: Structured Reasoning for Efficient Multi-Hop RAG
- URL: http://arxiv.org/abs/2510.19171v1
- Date: Wed, 22 Oct 2025 02:09:23 GMT
- Title: Think Straight, Stop Smart: Structured Reasoning for Efficient Multi-Hop RAG
- Authors: Jihwan Bang, Juntae Lee, Seunghan Yang, Sungha Choi
- Abstract summary: TSSS (Think Straight, Stop Smart) is a structured multi-hop RAG framework designed for efficiency. TSSS introduces (i) a template-based reasoning that caches recurring prefixes and anchors sub-queries to the main question. On HotpotQA, 2WikiMultiHop, and MuSiQue, TSSS achieves state-of-the-art accuracy and competitive efficiency among RAG-CoT approaches.
- Score: 24.494759581234803
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multi-hop retrieval-augmented generation (RAG) is a promising strategy for complex reasoning, yet existing iterative prompting approaches remain inefficient. They often regenerate predictable token sequences at every step and rely on stochastic stopping, leading to excessive token usage and unstable termination. We propose TSSS (Think Straight, Stop Smart), a structured multi-hop RAG framework designed for efficiency. TSSS introduces (i) a template-based reasoning that caches recurring prefixes and anchors sub-queries to the main question, reducing token generation cost while promoting stable reasoning, and (ii) a retriever-based terminator, which deterministically halts reasoning once additional sub-queries collapse into repetition. This separation of structured reasoning and termination control enables both faster inference and more reliable answers. On HotpotQA, 2WikiMultiHop, and MuSiQue, TSSS achieves state-of-the-art accuracy and competitive efficiency among RAG-CoT approaches, highlighting its effectiveness in efficiency-constrained scenarios such as on-device inference.
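The two components described in the abstract can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration, not the authors' implementation: the template string, the toy word-overlap retriever, the `propose_sub_query` callback standing in for the language model, and the stopping rule itself (halt when every document retrieved for a new sub-query has already been seen). The abstract only states that the terminator is retriever-based and fires when sub-queries collapse into repetition.

```python
from typing import Callable, List, Set

# Hypothetical fixed template. Because this prefix is supplied rather than
# generated, a real system could cache it instead of decoding it every step.
TEMPLATE_PREFIX = "Main question: {question}\nSub-query {step}: "


def retrieve(query: str, corpus: List[str], k: int = 1) -> List[int]:
    """Toy retriever: rank documents by word overlap with the query."""
    q_words = set(query.lower().split())
    ranked = sorted(
        range(len(corpus)),
        key=lambda i: (-len(q_words & set(corpus[i].lower().split())), i),
    )
    return ranked[:k]


def run_multi_hop(
    question: str,
    propose_sub_query: Callable[[str, List[str]], str],
    corpus: List[str],
    max_steps: int = 8,
    k: int = 1,
) -> List[str]:
    """Collect evidence until a sub-query retrieves nothing new (assumed rule)."""
    seen: Set[int] = set()
    evidence: List[str] = []
    for step in range(1, max_steps + 1):
        # Anchor every sub-query to the main question via the template.
        prefix = TEMPLATE_PREFIX.format(question=question, step=step)
        sub_query = propose_sub_query(prefix, evidence)
        doc_ids = retrieve(sub_query, corpus, k)
        # Deterministic stop: all retrieved documents were already collected.
        if set(doc_ids) <= seen:
            break
        for i in doc_ids:
            if i not in seen:
                seen.add(i)
                evidence.append(corpus[i])
    return evidence


if __name__ == "__main__":
    docs = [
        "Paris is the capital of France",
        "France is in Europe",
        "Bananas are yellow",
    ]
    # Scripted sub-queries stand in for model output; the third repeats the first.
    script = ["capital of France", "where is France in Europe", "capital of France"]
    result = run_multi_hop(
        "Which continent contains the capital of France?",
        lambda prefix, ev: script[len(ev)],
        docs,
    )
    print(result)
```

The stopping decision depends only on retrieval results, so it does not rely on the model sampling a stop token. The `max_steps` cap is an extra safeguard added in this sketch.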
Related papers
- Compress the Easy, Explore the Hard: Difficulty-Aware Entropy Regularization for Efficient LLM Reasoning [39.72119774004103]
Chain-of-Thought (CoT) has substantially empowered Large Language Models (LLMs) to tackle complex reasoning tasks. The verbose nature of explicit reasoning steps incurs prohibitive inference latency and computational costs, limiting real-world deployment. We propose Compress responses for Easy questions and Explore Hard ones (CEEH), a difficulty-aware approach to RL-based efficient reasoning.
arXiv Detail & Related papers (2026-02-26T05:47:30Z)
- Constraint-Rectified Training for Efficient Chain-of-Thought [60.52883907721588]
Chain-of-Thought (CoT) has significantly enhanced the reasoning capabilities of Large Language Models (LLMs). While longer reasoning traces can improve answer quality and unlock abilities such as self-correction, they also incur high inference costs and often introduce redundant steps, known as overthinking. Recent research seeks to develop efficient reasoning strategies that balance reasoning length and accuracy.
arXiv Detail & Related papers (2026-02-13T02:13:45Z)
- CompactRAG: Reducing LLM Calls and Token Overhead in Multi-Hop Question Answering [15.281365738928415]
Existing multi-hop RAG systems alternate between retrieval and reasoning at each step. We propose CompactRAG, a framework that decouples offline corpus restructuring from online reasoning. Experiments on HotpotQA, 2WikiMultiHopQA, and MuSiQue demonstrate that CompactRAG achieves competitive accuracy while substantially reducing token consumption.
arXiv Detail & Related papers (2026-02-05T14:52:06Z)
- CoT-Seg: Rethinking Segmentation with Chain-of-Thought Reasoning and Self-Correction [50.67483317563736]
This paper aims to explore a system that can think step-by-step, look up information if needed, generate results, self-evaluate its own results, and refine the results. We introduce CoT-Seg, a training-free framework that rethinks reasoning segmentation by combining chain-of-thought reasoning with self-correction.
arXiv Detail & Related papers (2026-01-24T11:41:54Z)
- Reinforced Efficient Reasoning via Semantically Diverse Exploration [73.41112984160992]
Reinforcement learning with verifiable rewards (RLVR) has proven effective in enhancing the reasoning of large language models (LLMs). We propose reinforced efficient reasoning via semantically diverse explorations, i.e., ROSE, for LLMs. Our method incorporates a semantic-entropy-based branching strategy and an $\varepsilon$-exploration mechanism.
arXiv Detail & Related papers (2026-01-08T15:56:44Z)
- TeaRAG: A Token-Efficient Agentic Retrieval-Augmented Generation Framework [62.66056331998838]
TeaRAG is a token-efficient agentic RAG framework capable of compressing both retrieval content and reasoning steps. Our reward function evaluates the knowledge sufficiency by a knowledge matching mechanism, while penalizing excessive reasoning steps.
arXiv Detail & Related papers (2025-11-07T16:08:34Z)
- Stop-RAG: Value-Based Retrieval Control for Iterative RAG [10.378290102256534]
Iterative retrieval-augmented generation (RAG) enables large language models to answer complex multi-hop questions. Existing methods either use a predetermined number of iterations or rely on confidence proxies that poorly reflect whether more retrieval will actually help. We introduce Stop-RAG, a value-based controller that adaptively decides when to stop retrieving.
arXiv Detail & Related papers (2025-10-16T06:17:38Z)
- Retrieval-of-Thought: Efficient Reasoning via Reusing Thoughts [6.845529733164892]
We propose Retrieval-of-Thought (RoT), which reuses prior reasoning as composable "thought" steps to guide new problems. RoT organizes steps into a thought graph with sequential and semantic edges to enable fast retrieval and flexible recombination. We evaluate RoT on reasoning benchmarks with multiple models, measuring accuracy, token usage, latency, and memory overhead.
arXiv Detail & Related papers (2025-09-26T01:17:35Z)
- SmartThinker: Learning to Compress and Preserve Reasoning by Step-Level Length Control [5.224609066309358]
Large reasoning models (LRMs) have exhibited remarkable reasoning capabilities through inference-time scaling. Previous work has attempted to mitigate this issue by penalizing the overall length of generated samples during reinforcement learning. We propose SmartThinker, a two-stage learnable framework designed to enable fine-grained control over the length of reasoning chains.
arXiv Detail & Related papers (2025-07-06T11:21:47Z)
- ConciseHint: Boosting Efficient Reasoning via Continuous Concise Hints during Generation [74.37307916314407]
We propose a framework dubbed ConciseHint, which continuously encourages the reasoning model to speak concisely. Experiments on the state-of-the-art LRMs, including DeepSeek-R1 and Qwen-3 series, demonstrate that our method can effectively produce concise reasoning.
arXiv Detail & Related papers (2025-06-23T16:20:44Z)
- Fractured Chain-of-Thought Reasoning [61.647243580650446]
We introduce Fractured Sampling, a unified inference-time strategy that interpolates between full CoT and solution-only sampling. We show that Fractured Sampling consistently achieves superior accuracy-cost trade-offs, yielding steep log-linear scaling gains in Pass@k versus token budget.
arXiv Detail & Related papers (2025-05-19T11:30:41Z)
- Credible Plan-Driven RAG Method for Multi-Hop Question Answering [2.5772544412212985]
We propose PAR-RAG (Plan-then-Act-and-Review RAG), a novel framework inspired by the PDCA (Plan-Do-Check-Act) cycle. PAR-RAG selects exemplars matched by the semantic complexity of the current question to guide complexity-aware top-down planning. A dual-verification mechanism evaluates and corrects intermediate errors, ensuring that the reasoning process remains factually grounded.
arXiv Detail & Related papers (2025-04-23T15:03:17Z)
- Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching [64.74765550805024]
Chain-of-Thought prompting elicits step-by-step problem solving, but often at the cost of excessive verbosity in intermediate outputs. We propose Sketch-of-Thought (SoT), a prompting framework that integrates cognitively inspired reasoning paradigms with linguistic constraints. SoT achieves token reductions of up to 84% with minimal accuracy loss across 18 reasoning datasets.
arXiv Detail & Related papers (2025-03-07T06:57:17Z)
- Efficient Reasoning with Hidden Thinking [48.96945580741641]
Chain-of-Thought (CoT) reasoning has become a powerful framework for improving complex problem-solving capabilities. We propose $\textbf{Heima}$ (as hidden llama), an efficient reasoning framework that leverages reasoning CoTs at hidden latent space. The Heima model achieves higher generation efficiency while maintaining or even improving zero-shot task accuracy.
arXiv Detail & Related papers (2025-01-31T15:10:29Z)