WebExplorer: Explore and Evolve for Training Long-Horizon Web Agents
- URL: http://arxiv.org/abs/2509.06501v3
- Date: Fri, 26 Sep 2025 02:31:59 GMT
- Title: WebExplorer: Explore and Evolve for Training Long-Horizon Web Agents
- Authors: Junteng Liu, Yunji Li, Chi Zhang, Jingyang Li, Aili Chen, Ke Ji, Weiyu Cheng, Zijia Wu, Chengyu Du, Qidi Xu, Jiayuan Song, Zhengmao Zhu, Wenhu Chen, Pengyu Zhao, Junxian He
- Abstract summary: We introduce WebExplorer: a systematic data generation approach using model-based exploration and iterative, long-to-short query evolution. Our model supports 128K context length and up to 100 tool calling turns, enabling long-horizon problem solving. As an 8B-sized model, WebExplorer-8B is able to effectively search over an average of 16 turns after RL training.
- Score: 57.203515352080295
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The paradigm of Large Language Models (LLMs) has increasingly shifted toward agentic applications, where web browsing capabilities are fundamental for retrieving information from diverse online sources. However, existing open-source web agents either demonstrate limited information-seeking abilities on complex tasks or lack transparent implementations. In this work, we identify that the key challenge lies in the scarcity of challenging data for information seeking. To address this limitation, we introduce WebExplorer: a systematic data generation approach using model-based exploration and iterative, long-to-short query evolution. This method creates challenging query-answer pairs that require multi-step reasoning and complex web navigation. By leveraging our curated high-quality dataset, we successfully develop the advanced web agent WebExplorer-8B through supervised fine-tuning followed by reinforcement learning. Our model supports 128K context length and up to 100 tool calling turns, enabling long-horizon problem solving. Across diverse information-seeking benchmarks, WebExplorer-8B achieves state-of-the-art performance at its scale. Notably, as an 8B-sized model, WebExplorer-8B is able to effectively search over an average of 16 turns after RL training, achieving higher accuracy than WebSailor-72B on BrowseComp-en/zh and attaining the best performance among models up to 100B parameters on WebWalkerQA and FRAMES. Beyond these information-seeking tasks, our model also achieves strong generalization on the HLE benchmark even though it is only trained on knowledge-intensive QA data. These results highlight our approach as a practical path toward long-horizon web agents.
Related papers
- WebSailor-V2: Bridging the Chasm to Proprietary Agents via Synthetic Data and Scalable Reinforcement Learning [73.91893534088798]
WebSailor is a complete post-training methodology designed to instill this crucial capability. Our approach involves generating novel, high-uncertainty tasks through structured sampling and information obfuscation. WebSailor significantly outperforms all open-source agents in complex information-seeking tasks.
arXiv Detail & Related papers (2025-09-16T17:57:03Z)
- DeepDive: Advancing Deep Search Agents with Knowledge Graphs and Multi-Turn RL [60.47878242100153]
We present DeepDive to advance deep search agents. We propose a strategy to automatically synthesize complex, difficult, and hard-to-find questions from open knowledge graphs. We apply end-to-end multi-turn reinforcement learning to enhance LLMs' long-horizon reasoning with deep search.
arXiv Detail & Related papers (2025-09-12T17:52:35Z)
- WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent [68.3311163530321]
Web agents such as Deep Research have demonstrated cognitive abilities, capable of solving highly challenging information-seeking problems. This makes multimodal Deep Research highly challenging, as such agents require much stronger reasoning abilities in perception, logic, and knowledge. We introduce WebWatcher, a multi-modal Agent for Deep Research equipped with enhanced visual-language reasoning capabilities.
arXiv Detail & Related papers (2025-08-07T18:03:50Z)
- WebSailor: Navigating Super-human Reasoning for Web Agent [72.5231321118689]
WebSailor is a complete post-training methodology designed to instill this crucial capability. Our approach involves generating novel, high-uncertainty tasks through structured sampling and information obfuscation. WebSailor significantly outperforms all open-source agents in complex information-seeking tasks.
arXiv Detail & Related papers (2025-07-03T12:59:07Z)
- Pangu DeepDiver: Adaptive Search Intensity Scaling via Open-Web Reinforcement Learning [79.26661332815465]
Information seeking demands iterative evidence gathering and reflective reasoning, yet large language models (LLMs) still struggle with it in open-web question answering. Existing methods rely on static prompting rules or training with Wikipedia-based corpora and retrieval environments. We introduce WebPuzzle, the first dataset designed to foster information-seeking behavior in open-world internet environments.
arXiv Detail & Related papers (2025-05-30T08:15:39Z)
- Explorer: Scaling Exploration-driven Web Trajectory Synthesis for Multimodal Web Agents [16.161877699225986]
We develop a scalable recipe to synthesize the largest and most diverse trajectory-level dataset to date. This dataset contains over 94K successful multimodal web trajectories, spanning 49K unique URLs, 720K screenshots, and 33M web elements. We demonstrate strong performance on both offline and online web agent benchmarks such as Mind2Web-Live, Multimodal-Mind2Web, and MiniWob++.
arXiv Detail & Related papers (2025-02-17T02:13:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.