Related papers: WebEvolver: Enhancing Web Agent Self-Improvement with Coevolving World Model

WebEvolver: Enhancing Web Agent Self-Improvement with Coevolving World Model

URL: http://arxiv.org/abs/2504.21024v1
Date: Wed, 23 Apr 2025 02:54:31 GMT
Title: WebEvolver: Enhancing Web Agent Self-Improvement with Coevolving World Model
Authors: Tianqing Fang, Hongming Zhang, Zhisong Zhang, Kaixin Ma, Wenhao Yu, Haitao Mi, Dong Yu,
Abstract summary: Self-evolving agents are trained on trajectories sampled autonomously based on their own policies.<n>We propose a novel framework that introduces a co-evolving World Model LLM.<n>This world model predicts the next observation based on the current observation and action within the web environment.
Score: 55.276852838877346
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Agent self-improvement, where the backbone Large Language Model (LLM) of the agent are trained on trajectories sampled autonomously based on their own policies, has emerged as a promising approach for enhancing performance. Recent advancements, particularly in web environments, face a critical limitation: their performance will reach a stagnation point during autonomous learning cycles, hindering further improvement. We argue that this stems from limited exploration of the web environment and insufficient exploitation of pre-trained web knowledge in LLMs. To improve the performance of self-improvement, we propose a novel framework that introduces a co-evolving World Model LLM. This world model predicts the next observation based on the current observation and action within the web environment. Leveraging LLMs' pretrained knowledge of abundant web content, the World Model serves dual roles: (1) as a virtual web server generating self-instructed training data to continuously refine the agent's policy, and (2) as an imagination engine during inference, enabling look-ahead simulation to guide action selection for the agent LLM. Experiments in real-world web environments (Mind2Web-Live, WebVoyager, and GAIA-web) show a 10% performance gain over existing self-evolving agents, demonstrating the efficacy and generalizability of our approach, without using any distillation from more powerful close-sourced models. Our work establishes the necessity of integrating world models into autonomous agent frameworks to unlock sustained adaptability.

Related papers

SimuRA: Towards General Goal-Oriented Agent via Simulative Reasoning Architecture with LLM-Based World Model [88.04128601981145]
We introduce SimuRA, a goal-oriented architecture for generalized agentic reasoning.<n>modelname overcomes the limitations of autoregressive reasoning by introducing a world model for planning via simulation.<n>World-model-based planning, in particular, shows consistent advantage of up to 124% over autoregressive planning.
arXiv Detail & Related papers (2025-07-31T17:57:20Z)
WebSynthesis: World-Model-Guided MCTS for Efficient WebUI-Trajectory Synthesis [34.998277998052444]
We propose WebSynthesis, a novel framework for trajectory synthesis and training.<n>We show that an agent trained using WebSynthesis on a small-scale synthetic dataset achieves performance comparable to or even surpassing that of models trained on large-scale real-world data.
arXiv Detail & Related papers (2025-07-06T12:31:10Z)
WebCoT: Enhancing Web Agent Reasoning by Reconstructing Chain-of-Thought in Reflection, Branching, and Rollback [74.82886755416949]
We identify key reasoning skills essential for effective web agents.<n>We reconstruct the agent's reasoning algorithms into chain-of-thought rationales.<n>Our approach yields significant improvements across multiple benchmarks.
arXiv Detail & Related papers (2025-05-26T14:03:37Z)
Disentangled World Models: Learning to Transfer Semantic Knowledge from Distracting Videos for Reinforcement Learning [93.58897637077001]
This paper tries to learn and understand underlying semantic variations from distracting videos via offline-to-online latent distillation and flexible disentanglement constraints.<n>We pretrain the action-free video prediction model offline with disentanglement regularization to extract semantic knowledge from distracting videos.<n>For finetuning in the online environment, we exploit the knowledge from the pretrained model and introduce a disentanglement constraint to the world model.
arXiv Detail & Related papers (2025-03-11T13:50:22Z)
Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents [22.608219492706876]
We propose a model-based planning framework for web agents that employs a world model to simulate and deliberate over the outcome of each candidate action before committing to one.<n> Empirical results demonstrate that WebDreamer achieves substantial performance improvements over reactive baselines.<n>Our trained world model, Dreamer-7B, performs comparable to GPT-4o, highlighting the potential of specialized world models for efficient and effective planning in complex web environments.
arXiv Detail & Related papers (2024-11-10T18:50:51Z)
OpenWebVoyager: Building Multimodal Web Agents via Iterative Real-World Exploration, Feedback and Optimization [66.22117723598872]
We introduce an open-source framework designed to facilitate the development of multimodal web agent. We first train the base model with imitation learning to gain the basic abilities. We then let the agent explore the open web and collect feedback on its trajectories.
arXiv Detail & Related papers (2024-10-25T15:01:27Z)
AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents [52.13695464678006]
This study enhances an LLM-based web agent by simply refining its observation and action space. AgentOccam surpasses the previous state-of-the-art and concurrent work by 9.8 (+29.4%) and 5.9 (+15.8%) absolute points respectively.
arXiv Detail & Related papers (2024-10-17T17:50:38Z)
Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation [25.26545170310844]
We present a World-model-augmented (WMA) web agent, which simulates the outcomes of its actions for better decision-making.<n>Experiments on WebArena and Mind2Web show that our world models improve agents' policy selection without training.
arXiv Detail & Related papers (2024-10-17T05:37:00Z)
Large Language Models Can Self-Improve At Web Agent Tasks [37.17001438055515]
Large language models (LLMs) have recently demonstrated some capability to navigate novel environments as agents in a zero-shot or few-shot fashion. We explore the extent to which LLMs can self-improve their performance as agents in long-horizon tasks in a complex environment using the WebArena benchmark. We achieve a 31% improvement in task completion rate over the base model on the WebArena benchmark through a self-improvement procedure.
arXiv Detail & Related papers (2024-05-30T17:52:36Z)
SELF: Self-Evolution with Language Feedback [68.6673019284853]
'SELF' (Self-Evolution with Language Feedback) is a novel approach to advance large language models. It enables LLMs to self-improve through self-reflection, akin to human learning processes. Our experiments in mathematics and general tasks demonstrate that SELF can enhance the capabilities of LLMs without human intervention.
arXiv Detail & Related papers (2023-10-01T00:52:24Z)

This list is automatically generated from the titles and abstracts of the papers in this site.