WebCoT: Enhancing Web Agent Reasoning by Reconstructing Chain-of-Thought in Reflection, Branching, and Rollback
- URL: http://arxiv.org/abs/2505.20013v1
- Date: Mon, 26 May 2025 14:03:37 GMT
- Title: WebCoT: Enhancing Web Agent Reasoning by Reconstructing Chain-of-Thought in Reflection, Branching, and Rollback
- Authors: Minda Hu, Tianqing Fang, Jianshu Zhang, Junyu Ma, Zhisong Zhang, Jingyan Zhou, Hongming Zhang, Haitao Mi, Dong Yu, Irwin King
- Abstract summary: We identify key reasoning skills essential for effective web agents. We reconstruct the agent's reasoning algorithms into chain-of-thought rationales. Our approach yields significant improvements across multiple benchmarks.
- Score: 74.82886755416949
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Web agents powered by Large Language Models (LLMs) show promise for next-generation AI, but their limited reasoning in uncertain, dynamic web environments hinders robust deployment. In this paper, we identify key reasoning skills essential for effective web agents, i.e., reflection & lookahead, branching, and rollback, and curate trajectory data that exemplifies these abilities by reconstructing the agent's (inference-time) reasoning algorithms into chain-of-thought rationales. We conduct experiments in the agent self-improving benchmark, OpenWebVoyager, and demonstrate that distilling salient reasoning patterns into the backbone LLM via simple fine-tuning can substantially enhance its performance. Our approach yields significant improvements across multiple benchmarks, including WebVoyager, Mind2web-live, and SimpleQA (web search), highlighting the potential of targeted reasoning skill enhancement for web agents.
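The three skills named in the abstract (reflection & lookahead, branching, and rollback) can be illustrated with a toy search that writes each decision down as a rationale line. This is only a minimal sketch under stated assumptions, not the paper's data-curation pipeline: the `TOY_SITE` graph, the `Trace` class, the `explore` function, and the tag names are all hypothetical, and the toy site is assumed to be acyclic.

```python
from dataclasses import dataclass, field

# Hypothetical toy "site": each page maps action names to the page they open.
TOY_SITE = {
    "home": {"click_blog": "blog", "click_docs": "docs"},
    "blog": {},                     # dead end
    "docs": {"click_api": "api"},
    "api": {},                      # goal page in the usage below
}


@dataclass
class Trace:
    """Collects rationale lines in the order the search produces them."""
    steps: list = field(default_factory=list)

    def note(self, kind, text):
        self.steps.append(f"[{kind}] {text}")


def explore(page, goal, trace, path=()):
    """Depth-first search that logs branching, lookahead, rollback, reflection."""
    path = path + (page,)
    if page == goal:
        trace.note("reflection", f"{page} satisfies the task; stop.")
        return list(path)
    actions = TOY_SITE[page]
    if actions:
        trace.note("branching", f"on {page}, candidates: {', '.join(actions)}")
    for action, nxt in actions.items():
        trace.note("lookahead", f"{action} should lead to {nxt}")
        found = explore(nxt, goal, trace, path)
        if found:
            return found
        # The branch failed, so the agent returns to the earlier page.
        trace.note("rollback", f"{nxt} was a dead end; return to {page}")
    return None


trace = Trace()
route = explore("home", "api", trace)
print(route)
print("\n".join(trace.steps))
```

The printed trace reads as a linear rationale even though it came from a tree search, which is the general idea of turning an inference-time search procedure into chain-of-thought text.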
Related papers
- WebArXiv: Evaluating Multimodal Agents on Time-Invariant arXiv Tasks [27.091938524991534]
  We introduce WebArXiv, a benchmark for evaluating autonomous web agents. WebArXiv consists of 275 web-based tasks grounded in the arXiv platform. We propose a lightweight dynamic reflection mechanism that allows agents to selectively retrieve relevant past steps.
  arXiv Detail & Related papers (2025-07-01T16:43:57Z)
- WebEvolver: Enhancing Web Agent Self-Improvement with Coevolving World Model [55.276852838877346]
  Self-evolving agents are trained on trajectories sampled autonomously based on their own policies. We propose a novel framework that introduces a co-evolving World Model LLM. This world model predicts the next observation based on the current observation and action within the web environment.
  arXiv Detail & Related papers (2025-04-23T02:54:31Z)
- Enhancing Web Agents with Explicit Rollback Mechanisms [55.276852838877346]
  We enhance web agents with an explicit rollback mechanism, enabling the agent to revert to a previous state in its navigation trajectory. This mechanism gives the model the flexibility to directly control the search process, leading to an effective and efficient web navigation method.
  arXiv Detail & Related papers (2025-04-16T05:41:20Z)
- Why Are Web AI Agents More Vulnerable Than Standalone LLMs? A Security Analysis [35.57217841344101]
  This study investigates the underlying factors that contribute to the increased vulnerability of Web AI agents. We identify three critical factors that amplify the vulnerability of Web AI agents: (1) embedding user goals into the system prompt, (2) multi-step action generation, and (3) observational capabilities.
  arXiv Detail & Related papers (2025-02-27T18:56:26Z)
- Agentic Reasoning: A Streamlined Framework for Enhancing LLM Reasoning with Agentic Tools [19.70178343422698]
  We introduce Agentic Reasoning, a framework that enhances large language model (LLM) reasoning by integrating external tool-using agents. A key innovation in our framework is the Mind-Map agent, which constructs a structured knowledge graph to store reasoning context. When deployed on DeepSeek-R1, our method achieves a new state-of-the-art (SOTA) among public models.
  arXiv Detail & Related papers (2025-02-07T04:08:46Z)
- R2D2: Remembering, Reflecting and Dynamic Decision Making for Web Agents [53.94879482534949]
  Current models often struggle with efficient navigation and action execution due to limited visibility and understanding of web structures. Our proposed R2D2 framework addresses these challenges by integrating two paradigms: Remember and Reflect. Our findings suggest that a combination of memory-enhanced navigation and reflective learning promisingly advances the capabilities of web agents.
  arXiv Detail & Related papers (2025-01-21T20:21:58Z)
- OpenWebVoyager: Building Multimodal Web Agents via Iterative Real-World Exploration, Feedback and Optimization [66.22117723598872]
  We introduce an open-source framework designed to facilitate the development of multimodal web agents. We first train the base model with imitation learning to gain the basic abilities. We then let the agent explore the open web and collect feedback on its trajectories.
  arXiv Detail & Related papers (2024-10-25T15:01:27Z)
- AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents [52.13695464678006]
  This study enhances an LLM-based web agent by simply refining its observation and action space. AgentOccam surpasses the previous state-of-the-art and concurrent work by 9.8 (+29.4%) and 5.9 (+15.8%) absolute points respectively.
  arXiv Detail & Related papers (2024-10-17T17:50:38Z)
- Gödel Agent: A Self-Referential Agent Framework for Recursive Self-Improvement [117.94654815220404]
  Gödel Agent is a self-evolving framework inspired by the Gödel machine. Gödel Agent can achieve continuous self-improvement, surpassing manually crafted agents in performance, efficiency, and generalizability.
  arXiv Detail & Related papers (2024-10-06T10:49:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.