Related papers: WebDancer: Towards Autonomous Information Seeking Agency

WebDancer: Towards Autonomous Information Seeking Agency

URL: http://arxiv.org/abs/2505.22648v2
Date: Sun, 29 Jun 2025 17:33:01 GMT
Title: WebDancer: Towards Autonomous Information Seeking Agency
Authors: Jialong Wu, Baixuan Li, Runnan Fang, Wenbiao Yin, Liwen Zhang, Zhengwei Tao, Dingchu Zhang, Zekun Xi, Gang Fu, Yong Jiang, Pengjun Xie, Fei Huang, Jingren Zhou,
Abstract summary: Recent progress in agentic systems underscores the potential for autonomous multi-step research.<n>We present a cohesive paradigm for building end-to-end agentic information seeking agents from a data-centric and training-stage perspective.<n>We instantiate this framework in a web agent based on the ReAct, WebDancer.
Score: 69.33360019344083
License: http://creativecommons.org/publicdomain/zero/1.0/
Abstract: Addressing intricate real-world problems necessitates in-depth information seeking and multi-step reasoning. Recent progress in agentic systems, exemplified by Deep Research, underscores the potential for autonomous multi-step research. In this work, we present a cohesive paradigm for building end-to-end agentic information seeking agents from a data-centric and training-stage perspective. Our approach consists of four key stages: (1) browsing data construction, (2) trajectories sampling, (3) supervised fine-tuning for effective cold start, and (4) reinforcement learning for enhanced generalisation. We instantiate this framework in a web agent based on the ReAct, WebDancer. Empirical evaluations on the challenging information seeking benchmarks, GAIA and WebWalkerQA, demonstrate the strong performance of WebDancer, achieving considerable results and highlighting the efficacy of our training paradigm. Further analysis of agent training provides valuable insights and actionable, systematic pathways for developing more capable agentic models. The codes and demo will be released in https://github.com/Alibaba-NLP/WebAgent.

Related papers

Explore to Evolve: Scaling Evolved Aggregation Logic via Proactive Online Exploration for Deep Research Agents [70.77400371166922]
Deep research web agents need to rigorously analyze and aggregate knowledge for insightful research.<n>We propose an Explore to Evolve paradigm to scalably construct verifiable training data for web agents.<n>Based on an open-source agent framework, SmolAgents, we collect supervised fine-tuning trajectories to develop a series of foundation models.
arXiv Detail & Related papers (2025-10-16T08:37:42Z)
Beneficial Reasoning Behaviors in Agentic Search and Effective Post-training to Obtain Them [23.986035712600657]
We propose a reasoning-driven pipeline to study effective reasoning behavior patterns in agentic search.<n>We identify four beneficial reasoning behaviors: Information Verification, Authority Evaluation, Adaptive Search, and Error Recovery.<n>We show that behavior priming yields over 35% gains in Llama3.2-3B and Qwen3-1.7B compared to directly training agentic search models with RL.
arXiv Detail & Related papers (2025-10-08T00:20:35Z)
WebResearcher: Unleashing unbounded reasoning capability in Long-Horizon Agents [72.28593628378991]
WebResearcher is an iterative deep-research paradigm that reformulates deep research as a Markov Decision Process.<n>WebResearcher achieves state-of-the-art performance, even surpassing frontier proprietary systems.
arXiv Detail & Related papers (2025-09-16T17:57:17Z)
SFR-DeepResearch: Towards Effective Reinforcement Learning for Autonomously Reasoning Single Agents [93.26456498576181]
This paper focuses on the development of native Autonomous Single-Agent models for Deep Research.<n>Our best variant SFR-DR-20B achieves up to 28.7% on Humanity's Last Exam benchmark.
arXiv Detail & Related papers (2025-09-08T02:07:09Z)
Cognitive Kernel-Pro: A Framework for Deep Research Agents and Agent Foundation Models Training [67.895981259683]
General AI Agents are increasingly recognized as foundational frameworks for the next generation of artificial intelligence.<n>Current agent systems are either closed-source or heavily reliant on a variety of paid APIs and proprietary tools.<n>We present Cognitive Kernel-Pro, a fully open-source and (to the maximum extent) free multi-module agent framework.
arXiv Detail & Related papers (2025-08-01T08:11:31Z)
Mind2Web 2: Evaluating Agentic Search with Agent-as-a-Judge [34.672897171399775]
Agentic search systems autonomously browse the web, synthesize information, and return comprehensive citation-backed answers.<n>Mind2Web 2 is a benchmark of 130 realistic, high-quality, and long-horizon tasks constructed with over 1000 hours of human labor.<n>Our method constructs task-specific judge agents based on a tree-structured design to automatically assess both answer correctness and source attribution.
arXiv Detail & Related papers (2025-06-26T17:32:50Z)
From Web Search towards Agentic Deep Research: Incentivizing Search with Reasoning Agents [96.65646344634524]
Large Language Models (LLMs), endowed with reasoning and agentic capabilities, are ushering in a new paradigm termed Agentic Deep Research.<n>We trace the evolution from static web search to interactive, agent-based systems that plan, explore, and learn.<n>We demonstrate that Agentic Deep Research not only significantly outperforms existing approaches, but is also poised to become the dominant paradigm for future information seeking.
arXiv Detail & Related papers (2025-06-23T17:27:19Z)
WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning [37.89715280583421]
WebAgent-R1 is an end-to-end multi-turn reinforcement learning framework for training web agents.<n>Experiments on the WebArena-Lite benchmark demonstrate the effectiveness of WebAgent-R1, boosting the task success rate of Qwen-2.5-3B from 6.1% to 33.9%.<n>In-depth analyses reveal the effectiveness of the thinking-based prompting strategy and test-time scaling through increased interactions for web tasks.
arXiv Detail & Related papers (2025-05-22T09:07:43Z)
DeepResearcher: Scaling Deep Research via Reinforcement Learning in Real-world Environments [20.498100965239818]
We introduce DeepResearcher, the first comprehensive framework for end-to-end training of LLM-based deep research agents.<n>Unlike RAG-based approaches that assume all necessary information exists within a fixed corpus, our method trains agents to navigate the noisy, unstructured, and dynamic nature of the open web.<n>Extensive experiments on open-domain research tasks demonstrate that DeepResearcher achieves substantial improvements of up to 28.9 points over prompt engineering-based baselines.
arXiv Detail & Related papers (2025-04-04T04:41:28Z)
Proposer-Agent-Evaluator(PAE): Autonomous Skill Discovery For Foundation Model Internet Agents [64.75036903373712]
Proposer-Agent-Evaluator is a learning system that enables foundation model agents to autonomously discover and practice skills in the wild.<n>At the heart of PAE is a context-aware task proposer that autonomously proposes tasks for the agent to practice with context information.<n>The success evaluation serves as the reward signal for the agent to refine its policies through RL.
arXiv Detail & Related papers (2024-12-17T18:59:50Z)
AgentTrek: Agent Trajectory Synthesis via Guiding Replay with Web Tutorials [53.376263056033046]
Existing approaches rely on expensive human annotation, making them unsustainable at scale.<n>We propose AgentTrek, a scalable data synthesis pipeline that generates web agent trajectories by leveraging publicly available tutorials.<n>Our fully automated approach significantly reduces data collection costs, achieving a cost of just $0.55 per high-quality trajectory without human annotators.
arXiv Detail & Related papers (2024-12-12T18:59:27Z)
ExACT: Teaching AI Agents to Explore with Reflective-MCTS and Exploratory Learning [78.42927884000673]
ExACT is an approach to combine test-time search and self-learning to build o1-like models for agentic applications.<n>We first introduce Reflective Monte Carlo Tree Search (R-MCTS), a novel test time algorithm designed to enhance AI agents' ability to explore decision space on the fly.<n>Next, we introduce Exploratory Learning, a novel learning strategy to teach agents to search at inference time without relying on any external search algorithms.
arXiv Detail & Related papers (2024-10-02T21:42:35Z)
Retrieval-Augmented Reinforcement Learning [63.32076191982944]
We train a network to map a dataset of past experiences to optimal behavior. The retrieval process is trained to retrieve information from the dataset that may be useful in the current context. We show that retrieval-augmented R2D2 learns significantly faster than the baseline R2D2 agent and achieves higher scores.
arXiv Detail & Related papers (2022-02-17T02:44:05Z)

This list is automatically generated from the titles and abstracts of the papers in this site.