Related papers: WebSeer: Training Deeper Search Agents through Reinforcement Learning with Self-Reflection

WebSeer: Training Deeper Search Agents through Reinforcement Learning with Self-Reflection

URL: http://arxiv.org/abs/2510.18798v1
Date: Tue, 21 Oct 2025 16:52:00 GMT
Title: WebSeer: Training Deeper Search Agents through Reinforcement Learning with Self-Reflection
Authors: Guanzhong He, Zhen Yang, Jinxin Liu, Bin Xu, Lei Hou, Juanzi Li,
Abstract summary: We present WebSeer, a more intelligent search agent trained via reinforcement learning enhanced with a self-reflection mechanism.<n>Our approach substantially extends tool-use chains and improves answer accuracy.
Score: 51.10348385624784
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Search agents have achieved significant advancements in enabling intelligent information retrieval and decision-making within interactive environments. Although reinforcement learning has been employed to train agentic models capable of more dynamic interactive retrieval, existing methods are limited by shallow tool-use depth and the accumulation of errors over multiple iterative interactions. In this paper, we present WebSeer, a more intelligent search agent trained via reinforcement learning enhanced with a self-reflection mechanism. Specifically, we construct a large dataset annotated with reflection patterns and design a two-stage training framework that unifies cold start and reinforcement learning within the self-reflection paradigm for real-world web-based environments, which enables the model to generate longer and more reflective tool-use trajectories. Our approach substantially extends tool-use chains and improves answer accuracy. Using a single 14B model, we achieve state-of-the-art results on HotpotQA and SimpleQA, with accuracies of 72.3% and 90.0%, respectively, and demonstrate strong generalization to out-of-distribution datasets. The code is available at https://github.com/99hgz/WebSeer

Related papers

Empowering RepoQA-Agent based on Reinforcement Learning Driven by Monte-carlo Tree Search [70.63903518295785]
We introduce RepoSearch-R1, a novel agentic reinforcement learning framework driven by Monte-carlo Tree Search.<n>Based on RepoSearch-R1, we construct a RepoQA-Agent specifically designed for repository question-answering tasks.
arXiv Detail & Related papers (2025-10-30T09:10:36Z)
Beyond Turn Limits: Training Deep Search Agents with Dynamic Context Window [88.85901839023803]
DeepMiner is a novel framework that elicits such abilities by introducing high-difficulty training tasks and dynamic context window.<n>We develop DeepMiner-32B, which achieves substantial performance improvements across multiple search agent benchmarks.
arXiv Detail & Related papers (2025-10-09T14:31:39Z)
TGPO: Tree-Guided Preference Optimization for Robust Web Agent Reinforcement Learning [4.456860697635325]
Training Web Agents with reinforcement learning faces critical challenges including credit assignment misallocation, prohibitively high annotation costs, and reward sparsity.<n>Our framework incorporates a Process Reward Model that automatically generates fine-grained rewards through subgoal progress, redundancy detection, and action verification.<n>Experiments on Online-Mind2Web and our self-constructed C-WebShop datasets demonstrate that TGPO significantly outperforms existing methods.
arXiv Detail & Related papers (2025-09-17T16:58:44Z)
From Seeing to Experiencing: Scaling Navigation Foundation Models with Reinforcement Learning [59.88543114325153]
We introduce the Seeing-to-Experiencing framework to scale the capability of navigation foundation models with reinforcement learning.<n>S2E combines the strengths of pre-training on videos and post-training through RL.<n>We establish a comprehensive end-to-end evaluation benchmark, NavBench-GS, built on photorealistic 3DGS reconstructions of real-world scenes.
arXiv Detail & Related papers (2025-07-29T17:26:10Z)
MaskSearch: A Universal Pre-Training Framework to Enhance Agentic Search Capability [106.35604230971396]
Recent advancements in Agent techniques enable Large Language Models (LLMs) to autonomously utilize tools for retrieval, planning, and reasoning.<n>To further enhance the universal search capability of agents, we propose a novel pre-training framework, MaskSearch.<n>In the pre-training stage, we introduce the Retrieval Augmented Mask Prediction (RAMP) task, where the model learns to leverage search tools to fill masked spans.<n>After that, the model is trained on downstream tasks to achieve further improvement.
arXiv Detail & Related papers (2025-05-26T17:58:50Z)
WebCoT: Enhancing Web Agent Reasoning by Reconstructing Chain-of-Thought in Reflection, Branching, and Rollback [78.55946306325914]
We identify key reasoning skills essential for effective web agents.<n>We reconstruct the agent's reasoning algorithms into chain-of-thought rationales.<n>Our approach yields significant improvements across multiple benchmarks.
arXiv Detail & Related papers (2025-05-26T14:03:37Z)
Scent of Knowledge: Optimizing Search-Enhanced Reasoning with Information Foraging [11.377241012645994]
InForage is a reinforcement learning framework that formalizes retrieval-augmented reasoning as a dynamic information-seeking process.<n>We construct a human-guided dataset capturing iterative search and reasoning trajectories for complex, real-world web tasks.<n>These results highlight InForage's effectiveness in building robust, adaptive, and efficient reasoning agents.
arXiv Detail & Related papers (2025-05-14T12:13:38Z)
Automation and Feature Selection Enhancement with Reinforcement Learning (RL) [0.0]
Reinforcement learning integrated with decision tree improves feature knowledge, state representation and selection efficiency.<n>Monte Carlo-based reinforced feature selection(MCRFS), a single-agent feature selection method reduces computational burden.<n>A dual-agent RL framework is also introduced that collectively selects features and instances, capturing the interactions between them.
arXiv Detail & Related papers (2025-03-15T04:30:55Z)
Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training [18.896813839389893]
We propose an iterative self-training framework, Agent-R, that enables language Agent to Reflect on the fly.<n>Unlike traditional methods that reward or penalize actions based on correctness, Agent-R leverages MCTS to construct training data that recover correct trajectories from erroneous ones.<n>Our findings demonstrate that Agent-R continuously improves the model's ability to recover from errors and enables timely error correction.
arXiv Detail & Related papers (2025-01-20T11:46:04Z)

This list is automatically generated from the titles and abstracts of the papers in this site.