WebCoach: Self-Evolving Web Agents with Cross-Session Memory Guidance
- URL: http://arxiv.org/abs/2511.12997v1
- Date: Mon, 17 Nov 2025 05:38:50 GMT
- Title: WebCoach: Self-Evolving Web Agents with Cross-Session Memory Guidance
- Authors: Genglin Liu, Shijie Geng, Sha Li, Hejie Cui, Sarah Zhang, Xin Liu, Tianyi Liu,
- Abstract summary: WebCoach is a model-agnostic self-evolving framework that equips web browsing agents with persistent cross-session memory. WebCoach achieves self-evolution by continuously curating episodic memory from new navigation trajectories. Evaluations on the WebVoyager benchmark demonstrate that WebCoach consistently improves the performance of browser-use agents.
- Score: 29.57207599604568
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multimodal LLM-powered agents have recently demonstrated impressive capabilities in web navigation, enabling agents to complete complex browsing tasks across diverse domains. However, current agents struggle with repetitive errors and lack the ability to learn from past experiences across sessions, limiting their long-term robustness and sample efficiency. We introduce WebCoach, a model-agnostic self-evolving framework that equips web browsing agents with persistent cross-session memory, enabling improved long-term planning, reflection, and continual learning without retraining. WebCoach consists of three key components: (1) a WebCondenser, which standardizes raw navigation logs into concise summaries; (2) an External Memory Store, which organizes complete trajectories as episodic experiences; and (3) a Coach, which retrieves relevant experiences based on similarity and recency, and decides whether to inject task-specific advice into the agent via runtime hooks. This design empowers web agents to access long-term memory beyond their native context window, improving robustness in complex browsing tasks. Moreover, WebCoach achieves self-evolution by continuously curating episodic memory from new navigation trajectories, enabling agents to improve over time without retraining. Evaluations on the WebVoyager benchmark demonstrate that WebCoach consistently improves the performance of browser-use agents across three different LLM backbones. With a 38B model, it increases task success rates from 47% to 61% while reducing or maintaining the average number of steps. Notably, smaller base models with WebCoach achieve performance comparable to the same web agent using GPT-4o.
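The three components named in the abstract (a condenser, an episodic memory store, and a coach that retrieves by similarity and recency before deciding whether to inject advice) can be illustrated with a small sketch. This is not the authors' implementation: the `Episode` schema, the bag-of-words cosine similarity, the exponential recency decay with a one-day half-life, and the 0.3 injection threshold are all assumptions chosen only to make the idea concrete.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Episode:
    """One condensed trajectory (hypothetical schema, not the paper's)."""
    task: str
    summary: str
    success: bool
    timestamp: float  # seconds on an arbitrary clock


def condense(task, raw_steps, success, timestamp):
    """Stand-in for a condenser: keep only the first and last raw steps."""
    kept = raw_steps if len(raw_steps) <= 2 else [raw_steps[0], raw_steps[-1]]
    return Episode(task, " -> ".join(kept), success, timestamp)


def cosine(a, b):
    """Bag-of-words cosine similarity (assumed; a real system might use embeddings)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(
        sum(v * v for v in vb.values())
    )
    return dot / norm if norm else 0.0


class MemoryStore:
    """Keeps whole episodes so they can be reused across sessions."""

    def __init__(self):
        self.episodes = []

    def add(self, episode):
        self.episodes.append(episode)

    def retrieve(self, task, now, k=2, half_life=86400.0):
        """Rank episodes by similarity times an assumed exponential recency decay."""
        def score(ep):
            recency = 0.5 ** ((now - ep.timestamp) / half_life)
            return cosine(task, ep.task) * recency

        scored = sorted(((score(ep), ep) for ep in self.episodes),
                        key=lambda pair: pair[0], reverse=True)
        return scored[:k]


def coach(store, task, now, threshold=0.3):
    """Return advice text, or None when no stored episode is relevant enough."""
    hits = [(s, ep) for s, ep in store.retrieve(task, now) if s >= threshold]
    if not hits:
        return None  # the coach stays silent rather than injecting weak advice
    lines = []
    for _, ep in hits:
        outcome = "succeeded" if ep.success else "failed"
        lines.append(f"A similar task {outcome} before: {ep.summary}")
    return "\n".join(lines)
```

In this sketch the caller would pass the returned string to the agent as extra context before its next step, and would call `store.add(condense(...))` after each finished session so that the memory grows without any retraining.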
Related papers
- TimeWarp: Evaluating Web Agents by Revisiting the Past [7.017865728670461]
We introduce TimeWarp, a benchmark that emulates the evolving web using containerized environments that vary in UI, design, and layout. Our experiments reveal web agents' vulnerability to changes and the limitations of behavior cloning (BC) on single-version trajectories. We propose TimeTraj, a simple yet effective algorithm that uses plan distillation to collect trajectories across multiple versions.
Date: 2026-03-05T08:43:06Z
- See and Remember: A Multimodal Agent for Web Traversal [19.326814654711296]
V-GEMS is a robust multimodal agent architecture for web navigation. Our agent integrates visual grounding to resolve ambiguous interactive elements and introduces an explicit memory stack with state tracking. Experiments show V-GEMS significantly dominates the WebWalker baseline, achieving a substantial 28.7% performance gain.
Date: 2026-03-03T05:55:05Z
- It's a TRAP! Task-Redirecting Agent Persuasion Benchmark for Web Agents [52.81924177620322]
Web-based agents powered by large language models are increasingly used for tasks such as email management or professional networking. Their reliance on dynamic web content makes them vulnerable to prompt injection attacks: adversarial instructions hidden in interface elements that persuade the agent to divert from its original task. We introduce the Task-Redirecting Agent Persuasion Benchmark (TRAP), an evaluation for studying how persuasion techniques misguide autonomous web agents on realistic tasks.
Date: 2025-12-29T01:09:10Z
- Branch-and-Browse: Efficient and Controllable Web Exploration with Tree-Structured Reasoning and Action Memory [69.49061918994882]
Branch-and-Browse is a fine-grained web agent framework that unifies structured reasoning-acting, contextual memory, and efficient execution. On the WebArena benchmark, Branch-and-Browse achieves a task success rate of 35.8% and reduces execution time by up to 40.4% relative to state-of-the-art methods.
Date: 2025-10-18T00:45:37Z
- WebChoreArena: Evaluating Web Browsing Agents on Realistic Tedious Web Tasks [31.201406205897143]
We introduce WebChoreArena, a new fully reproducible benchmark comprising 532 carefully curated tasks. WebChoreArena is built on top of the fully reproducible and widely adopted four WebArena simulation environments. Our experimental results demonstrate that as LLMs evolve, significant improvements in performance are observed on WebChoreArena.
Date: 2025-06-02T17:59:45Z
- WebCoT: Enhancing Web Agent Reasoning by Reconstructing Chain-of-Thought in Reflection, Branching, and Rollback [78.55946306325914]
We identify key reasoning skills essential for effective web agents. We reconstruct the agent's reasoning algorithms into chain-of-thought rationales. Our approach yields significant improvements across multiple benchmarks.
Date: 2025-05-26T14:03:37Z
- WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning [36.47273215142354]
WebAgent-R1 is an end-to-end multi-turn reinforcement learning framework for training web agents. Experiments on the WebArena-Lite benchmark demonstrate the effectiveness of WebAgent-R1, boosting the task success rate of Qwen-2.5-3B from 6.1% to 33.9%. In-depth analyses reveal the effectiveness of the thinking-based prompting strategy and test-time scaling through increased interactions for web tasks.
Date: 2025-05-22T09:07:43Z
- WebRollback: Enhancing Web Agents with Explicit Rollback Mechanisms [52.942566473658054]
We enhance web agents with an explicit rollback mechanism, enabling the agent to revert back to a previous state in its navigation trajectory. This mechanism gives the model the flexibility to directly control the search process, leading to an effective and efficient web navigation method.
Date: 2025-04-16T05:41:20Z
- R2D2: Remembering, Replaying and Dynamic Decision Making with a Reflective Agentic Memory [53.94879482534949]
Current models often struggle with efficient navigation and action execution due to limited visibility and understanding of web structures. Our proposed R2D2 framework addresses these challenges by integrating two paradigms: Remember and Reflect. Our findings suggest that a combination of memory-enhanced navigation and reflective learning promisingly advances the capabilities of web agents.
Date: 2025-01-21T20:21:58Z
- Multimodal Web Navigation with Instruction-Finetuned Foundation Models [99.14209521903854]
We study data-driven offline training for web agents with vision-language foundation models. We propose an instruction-following multimodal agent, WebGUM, that observes both webpage screenshots and HTML pages. We empirically demonstrate this recipe improves the agent's ability of grounded multimodal perception, HTML comprehension, and multi-step reasoning.
Date: 2023-05-19T17:44:34Z
This list is automatically generated from the titles and abstracts of the papers on this site.