SIGHT: Reinforcement Learning with Self-Evidence and Information-Gain Diverse Branching for Search Agent
- URL: http://arxiv.org/abs/2602.11551v1
- Date: Thu, 12 Feb 2026 04:16:55 GMT
- Title: SIGHT: Reinforcement Learning with Self-Evidence and Information-Gain Diverse Branching for Search Agent
- Authors: Wenlin Zhong, Jinluan Yang, Yiquan Wu, Yi Liu, Jianhang Yao, Kun Kuang,
- Abstract summary: SIGHT is a framework that enhances search-based reasoning through Self-Evidence Support and Information-Gain Driven Diverse Branching.<n>SIGHT distills search results into high-fidelity evidence via SES and calculates an Information Gain score to pinpoint pivotal states.<n> Experiments on single-hop and multi-hop QA benchmarks demonstrate that SIGHT significantly outperforms existing approaches.
- Score: 39.43590030917357
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement Learning (RL) has empowered Large Language Models (LLMs) to master autonomous search for complex question answering. However, particularly within multi-turn search scenarios, this interaction introduces a critical challenge: search results often suffer from high redundancy and low signal-to-noise ratios. Consequently, agents easily fall into "Tunnel Vision," where the forced interpretation of early noisy retrievals leads to irreversible error accumulation. To address these challenges, we propose SIGHT, a framework that enhances search-based reasoning through Self-Evidence Support (SES) and Information-Gain Driven Diverse Branching. SIGHT distills search results into high-fidelity evidence via SES and calculates an Information Gain score to pinpoint pivotal states where observations maximally reduce uncertainty. This score guides Dynamic Prompting Interventions - including de-duplication, reflection, or adaptive branching - to spawn new branches with SES. Finally, by integrating SES and correctness rewards via Group Relative Policy Optimization, SIGHT internalizes robust exploration strategies without external verifiers. Experiments on single-hop and multi-hop QA benchmarks demonstrate that SIGHT significantly outperforms existing approaches, particularly in complex reasoning scenarios, using fewer search steps.
Related papers
- Beyond Monolithic Architectures: A Multi-Agent Search and Knowledge Optimization Framework for Agentic Search [56.78490647843876]
Agentic search has emerged as a promising paradigm for complex information seeking by enabling Large Language Models (LLMs) to interleave reasoning with tool use.<n>We propose bfM-ASK, a framework that explicitly decouples agentic search into two complementary roles: Search Behavior Agents, which plan and execute search actions, and Knowledge Management Agents, which aggregate, filter, and maintain a compact internal context.
arXiv Detail & Related papers (2026-01-08T08:13:27Z) - Multi-hop Reasoning via Early Knowledge Alignment [68.28168992785896]
Early Knowledge Alignment (EKA) aims to align Large Language Models with contextually relevant retrieved knowledge.<n>EKA significantly improves retrieval precision, reduces cascading errors, and enhances both performance and efficiency.<n>EKA proves effective as a versatile, training-free inference strategy that scales seamlessly to large models.
arXiv Detail & Related papers (2025-12-23T08:14:44Z) - Thinking Forward and Backward: Multi-Objective Reinforcement Learning for Retrieval-Augmented Reasoning [137.33138614095435]
Retrieval-augmented generation (RAG) has proven to be effective in mitigating hallucinations in large language models.<n>Recent efforts have incorporated search-based interactions into RAG, enabling iterative reasoning with real-time retrieval.<n>We propose Bi-RAR, a novel retrieval-augmented reasoning framework that evaluates each intermediate step jointly in both forward and backward directions.
arXiv Detail & Related papers (2025-11-12T08:29:39Z) - VAR: Visual Attention Reasoning via Structured Search and Backtracking [49.427842994857635]
We introduce Visual Attention Reasoning, a framework that recasts grounded reasoning as a structured search.<n> VAR decomposes the reasoning process into two key stages: traceable evidence grounding and search-based chain-of-thought.<n>We show that our 7B model, VAR-7B, sets a new state-of-the-art on a comprehensive suite of hallucination and safety benchmarks.
arXiv Detail & Related papers (2025-10-21T13:18:44Z) - ARES: Multimodal Adaptive Reasoning via Difficulty-Aware Token-Level Entropy Shaping [54.37497695483689]
We propose ARES, a unified framework for adaptive reasoning that dynamically allocates exploration effort based on task difficulty.<n>Our approach is motivated by two key empirical findings: (i) while single-token entropy is noisy, high window-entropy (HWE) tokens can reliably capture reasoning-critical moments.<n>In the Adaptive Cold-Start stage, we curate multimodal and textual data paired with reasoning traces of length proportional to problem difficulty, equipping the model with initial difficulty awareness.<n>In the second stage, we develop Adaptive Entropy Policy Optimization (AEPO), which uses HWE tokens as exploration triggers
arXiv Detail & Related papers (2025-10-09T17:03:28Z) - RE-Searcher: Robust Agentic Search with Goal-oriented Planning and Self-reflection [55.125987985864896]
We present a systematic analysis that quantifies how environmental complexity induces fragile search behaviors.<n>We propose a simple yet effective approach to instantiate a search agent, RE-Searcher.<n>This combination of goal-oriented planning and self-reflection enables RE-Searcher to resist spurious cues in complex search environments.
arXiv Detail & Related papers (2025-09-30T10:25:27Z) - DynaSearcher: Dynamic Knowledge Graph Augmented Search Agent via Multi-Reward Reinforcement Learning [5.280613615397194]
DynaSearcher is an innovative search agent enhanced by dynamic knowledge graphs and multi-reward reinforcement learning (RL)<n>We employ a multi-reward RL framework for fine-grained control over training objectives such as retrieval accuracy, efficiency, and response quality.<n> Experimental results demonstrate that our approach achieves state-of-the-art answer accuracy on six multi-hop question answering datasets.
arXiv Detail & Related papers (2025-07-23T09:58:31Z) - Search and Refine During Think: Facilitating Knowledge Refinement for Improved Retrieval-Augmented Reasoning [35.35813310224967]
Large language models have demonstrated impressive reasoning capabilities but are inherently limited by their knowledge reservoir.<n>Retrieval-augmented reasoning mitigates this limitation by allowing LLMs to query external resources.<n>We propose AutoRefine, a reinforcement learning framework that adopts a new "search-and-refine-during-think" paradigm.
arXiv Detail & Related papers (2025-05-16T14:11:29Z) - Knowledge-Aware Iterative Retrieval for Multi-Agent Systems [0.0]
We introduce a novel large language model (LLM)-driven agent framework.<n>It iteratively refines queries and filters contextual evidence by leveraging dynamically evolving knowledge.<n>The proposed system supports both competitive and collaborative sharing of updated context.
arXiv Detail & Related papers (2025-03-17T15:27:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.