Related papers: Scaling Web Agent Training through Automatic Data Generation and Fine-grained Evaluation

Scaling Web Agent Training through Automatic Data Generation and Fine-grained Evaluation

URL: http://arxiv.org/abs/2602.12544v1
Date: Fri, 13 Feb 2026 02:52:18 GMT
Title: Scaling Web Agent Training through Automatic Data Generation and Fine-grained Evaluation
Authors: Lajanugen Logeswaran, Jaekyeom Kim, Sungryull Sohn, Creighton Glasscock, Honglak Lee,
Abstract summary: We present a scalable pipeline for automatically generating high-quality training data for web agents.<n>We introduce a novel constraint-based evaluation framework that provides fine-grained assessment of progress towards task completion.
Score: 54.945281159783896
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We present a scalable pipeline for automatically generating high-quality training data for web agents. In particular, a major challenge in identifying high-quality training instances is trajectory evaluation - quantifying how much progress was made towards task completion. We introduce a novel constraint-based evaluation framework that provides fine-grained assessment of progress towards task completion. This enables us to leverage partially successful trajectories, which significantly expands the amount of usable training data. We evaluate our method on a new benchmark we propose called BookingArena, which consists of complex booking tasks across 20 popular websites, and demonstrate that our distilled student model outperforms open-source approaches and matches or exceeds commercial systems, while being a significantly smaller model. Our work addresses the challenge of efficiently creating diverse, realistic web interaction datasets and provides a systematic evaluation methodology for complex structured web tasks.

Related papers

WebFactory: Automated Compression of Foundational Language Intelligence into Grounded Web Agents [20.85611634311147]
We introduce WebFactory, a novel, fully automated closed-loop reinforcement learning pipeline for GUI agents.<n>Our agent demonstrates exceptional data efficiency and generalization.<n>This work presents a scalable and cost-effective paradigm for transforming passive internet knowledge into active, grounded intelligence.
arXiv Detail & Related papers (2026-03-05T10:51:34Z)
EmbodiedBrain: Expanding Performance Boundaries of Task Planning for Embodied Intelligence [17.644658293987955]
Embodied AI agents are capable of robust spatial perception, effective task planning, and adaptive execution in physical environments.<n>Current large language models (LLMs) and multimodal LLMs (MLLMs) for embodied tasks suffer from key limitations.<n>We propose EmbodiedBrain, a novel vision-language foundation model available in both 7B and 32B parameter sizes.
arXiv Detail & Related papers (2025-10-23T14:05:55Z)
Synthesizing Agentic Data for Web Agents with Progressive Difficulty Enhancement Mechanisms [81.90219895125178]
Web-based 'deep research' agents aim to solve complex question - answering tasks through long-horizon interactions with online tools.<n>These tasks remain challenging, as the underlying language models are often not optimized for long-horizon reasoning.<n>We introduce a two-pronged data synthesis pipeline that generates question - answer pairs by progressively increasing complexity.
arXiv Detail & Related papers (2025-10-15T06:34:46Z)
WebSailor-V2: Bridging the Chasm to Proprietary Agents via Synthetic Data and Scalable Reinforcement Learning [73.91893534088798]
WebSailor is a complete post-training methodology designed to instill this crucial capability.<n>Our approach involves generating novel, high-uncertainty tasks through structured sampling and information obfuscation.<n>WebSailor significantly outperforms all open-source agents in complex information-seeking tasks.
arXiv Detail & Related papers (2025-09-16T17:57:03Z)
WebSailor: Navigating Super-human Reasoning for Web Agent [72.5231321118689]
WebSailor is a complete post-training methodology designed to instill this crucial capability.<n>Our approach involves generating novel, high-uncertainty tasks through structured sampling and information obfuscation.<n>WebSailor significantly outperforms all opensource agents in complex information-seeking tasks.
arXiv Detail & Related papers (2025-07-03T12:59:07Z)
AgentTrek: Agent Trajectory Synthesis via Guiding Replay with Web Tutorials [53.376263056033046]
Existing approaches rely on expensive human annotation, making them unsustainable at scale.<n>We propose AgentTrek, a scalable data synthesis pipeline that generates web agent trajectories by leveraging publicly available tutorials.<n>Our fully automated approach significantly reduces data collection costs, achieving a cost of just $0.55 per high-quality trajectory without human annotators.
arXiv Detail & Related papers (2024-12-12T18:59:27Z)
Large Language Models Can Self-Improve At Web Agent Tasks [37.17001438055515]
Large language models (LLMs) have recently demonstrated some capability to navigate novel environments as agents in a zero-shot or few-shot fashion. We explore the extent to which LLMs can self-improve their performance as agents in long-horizon tasks in a complex environment using the WebArena benchmark. We achieve a 31% improvement in task completion rate over the base model on the WebArena benchmark through a self-improvement procedure.
arXiv Detail & Related papers (2024-05-30T17:52:36Z)
Nemo: Guiding and Contextualizing Weak Supervision for Interactive Data Programming [77.38174112525168]
We present Nemo, an end-to-end interactive Supervision system that improves overall productivity of WS learning pipeline by an average 20% (and up to 47% in one task) compared to the prevailing WS supervision approach.
arXiv Detail & Related papers (2022-03-02T19:57:32Z)
Automated Robustness with Adversarial Training as a Post-Processing Step [5.55549775099824]
This work explores the efficacy of a simple post processing step in yielding robust deep learning model. We adopt adversarial training as a post-processing step for optimised network architectures obtained from a neural architecture search algorithm.
arXiv Detail & Related papers (2021-09-06T15:17:08Z)

This list is automatically generated from the titles and abstracts of the papers in this site.