WebSeer: Training Deeper Search Agents through Reinforcement Learning with Self-Reflection
- URL: http://arxiv.org/abs/2510.18798v1
- Date: Tue, 21 Oct 2025 16:52:00 GMT
- Title: WebSeer: Training Deeper Search Agents through Reinforcement Learning with Self-Reflection
- Authors: Guanzhong He, Zhen Yang, Jinxin Liu, Bin Xu, Lei Hou, Juanzi Li
- Abstract summary: We present WebSeer, a more intelligent search agent trained via reinforcement learning enhanced with a self-reflection mechanism. Our approach substantially extends tool-use chains and improves answer accuracy.
- Score: 51.10348385624784
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Search agents have achieved significant advancements in enabling intelligent information retrieval and decision-making within interactive environments. Although reinforcement learning has been employed to train agentic models capable of more dynamic interactive retrieval, existing methods are limited by shallow tool-use depth and the accumulation of errors over multiple iterative interactions. In this paper, we present WebSeer, a more intelligent search agent trained via reinforcement learning enhanced with a self-reflection mechanism. Specifically, we construct a large dataset annotated with reflection patterns and design a two-stage training framework that unifies cold start and reinforcement learning within the self-reflection paradigm for real-world web-based environments, which enables the model to generate longer and more reflective tool-use trajectories. Our approach substantially extends tool-use chains and improves answer accuracy. Using a single 14B model, we achieve state-of-the-art results on HotpotQA and SimpleQA, with accuracies of 72.3% and 90.0%, respectively, and demonstrate strong generalization to out-of-distribution datasets. The code is available at https://github.com/99hgz/WebSeer
Related papers
- Empowering RepoQA-Agent based on Reinforcement Learning Driven by Monte-carlo Tree Search [70.63903518295785]
We introduce RepoSearch-R1, a novel agentic reinforcement learning framework driven by Monte-carlo Tree Search. Based on RepoSearch-R1, we construct a RepoQA-Agent specifically designed for repository question-answering tasks.
arXiv Detail & Related papers (2025-10-30T09:10:36Z)
- Beyond Turn Limits: Training Deep Search Agents with Dynamic Context Window [88.85901839023803]
DeepMiner is a novel framework that elicits such abilities by introducing high-difficulty training tasks and dynamic context window. We develop DeepMiner-32B, which achieves substantial performance improvements across multiple search agent benchmarks.
arXiv Detail & Related papers (2025-10-09T14:31:39Z)
- TGPO: Tree-Guided Preference Optimization for Robust Web Agent Reinforcement Learning [4.456860697635325]
Training Web Agents with reinforcement learning faces critical challenges including credit assignment misallocation, prohibitively high annotation costs, and reward sparsity. Our framework incorporates a Process Reward Model that automatically generates fine-grained rewards through subgoal progress, redundancy detection, and action verification. Experiments on Online-Mind2Web and our self-constructed C-WebShop datasets demonstrate that TGPO significantly outperforms existing methods.
arXiv Detail & Related papers (2025-09-17T16:58:44Z)
- From Seeing to Experiencing: Scaling Navigation Foundation Models with Reinforcement Learning [59.88543114325153]
We introduce the Seeing-to-Experiencing framework to scale the capability of navigation foundation models with reinforcement learning. S2E combines the strengths of pre-training on videos and post-training through RL. We establish a comprehensive end-to-end evaluation benchmark, NavBench-GS, built on photorealistic 3DGS reconstructions of real-world scenes.
arXiv Detail & Related papers (2025-07-29T17:26:10Z)
- MaskSearch: A Universal Pre-Training Framework to Enhance Agentic Search Capability [106.35604230971396]
Recent advancements in Agent techniques enable Large Language Models (LLMs) to autonomously utilize tools for retrieval, planning, and reasoning. To further enhance the universal search capability of agents, we propose a novel pre-training framework, MaskSearch. In the pre-training stage, we introduce the Retrieval Augmented Mask Prediction (RAMP) task, where the model learns to leverage search tools to fill masked spans. After that, the model is trained on downstream tasks to achieve further improvement.
arXiv Detail & Related papers (2025-05-26T17:58:50Z)
- WebCoT: Enhancing Web Agent Reasoning by Reconstructing Chain-of-Thought in Reflection, Branching, and Rollback [78.55946306325914]
We identify key reasoning skills essential for effective web agents. We reconstruct the agent's reasoning algorithms into chain-of-thought rationales. Our approach yields significant improvements across multiple benchmarks.
arXiv Detail & Related papers (2025-05-26T14:03:37Z)
- Scent of Knowledge: Optimizing Search-Enhanced Reasoning with Information Foraging [11.377241012645994]
InForage is a reinforcement learning framework that formalizes retrieval-augmented reasoning as a dynamic information-seeking process. We construct a human-guided dataset capturing iterative search and reasoning trajectories for complex, real-world web tasks. These results highlight InForage's effectiveness in building robust, adaptive, and efficient reasoning agents.
arXiv Detail & Related papers (2025-05-14T12:13:38Z)
- Automation and Feature Selection Enhancement with Reinforcement Learning (RL) [0.0]
Reinforcement learning integrated with decision tree improves feature knowledge, state representation and selection efficiency. Monte Carlo-based reinforced feature selection (MCRFS), a single-agent feature selection method reduces computational burden. A dual-agent RL framework is also introduced that collectively selects features and instances, capturing the interactions between them.
arXiv Detail & Related papers (2025-03-15T04:30:55Z)
- Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training [18.896813839389893]
We propose an iterative self-training framework, Agent-R, that enables language Agent to Reflect on the fly. Unlike traditional methods that reward or penalize actions based on correctness, Agent-R leverages MCTS to construct training data that recover correct trajectories from erroneous ones. Our findings demonstrate that Agent-R continuously improves the model's ability to recover from errors and enables timely error correction.
arXiv Detail & Related papers (2025-01-20T11:46:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.