WebChain: A Large-Scale Human-Annotated Dataset of Real-World Web Interaction Traces
- URL: http://arxiv.org/abs/2603.05295v1
- Date: Thu, 05 Mar 2026 15:37:34 GMT
- Authors: Sicheng Fan, Rui Wan, Yifei Leng, Gaoning Liang, Li Ling, Yanyi Shang, Dehan Kong,
- Abstract summary: WebChain is the largest open-source dataset of human-annotated trajectories on real-world websites. Our work provides the data and insights necessary to build and rigorously evaluate the next generation of scalable web agents.
- Score: 5.150606279179606
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce WebChain, the largest open-source dataset of human-annotated trajectories on real-world websites, designed to accelerate reproducible research in web agents. It contains 31,725 trajectories and 318k steps, featuring a core Triple Alignment of visual, structural, and action data to provide rich, multi-modal supervision. The data is collected via a scalable pipeline that ensures coverage of complex, high-value tasks often missed by synthetic methods. Leveraging this dataset, we propose a Dual Mid-Training recipe that decouples spatial grounding from planning, achieving state-of-the-art performance on our proposed WebChainBench and other public GUI benchmarks. Our work provides the data and insights necessary to build and rigorously evaluate the next generation of scalable web agents.
Related papers
- Scaling Web Agent Training through Automatic Data Generation and Fine-grained Evaluation [54.945281159783896]
We present a scalable pipeline for automatically generating high-quality training data for web agents. We introduce a novel constraint-based evaluation framework that provides fine-grained assessment of progress towards task completion.
arXiv Detail & Related papers (2026-02-13T02:52:18Z)
- CrediBench: Building Web-Scale Network Datasets for Information Integrity [27.562742270396086]
CrediBench is a large-scale data processing pipeline for constructing temporal web graphs. Our approach captures the dynamic evolution of general misinformation domains. From our experiments on this graph snapshot, we demonstrate the strength of both structural and webpage content signals for learning credibility scores.
arXiv Detail & Related papers (2025-09-27T14:42:48Z)
- WebDS: An End-to-End Benchmark for Web-based Data Science [59.270670758607494]
WebDS is the first end-to-end web-based data science benchmark. It comprises 870 web-based data science tasks across 29 diverse websites. WebDS sets the stage for significant advances in the development of practically useful LLM-based data science.
arXiv Detail & Related papers (2025-08-02T06:39:59Z)
- Empowering Bridge Digital Twins by Bridging the Data Gap with a Unified Synthesis Framework [6.238251307666132]
This paper proposes a systematic framework for generating 3D bridge data. It can automatically generate point clouds featuring component-level instance annotations, high-fidelity color, and precise normal vectors. Experiments demonstrate that a PointNet++ model trained with our synthetic data achieves a mean Intersection over Union (mIoU) of 84.2% in real-world bridge semantic segmentation.
arXiv Detail & Related papers (2025-07-08T09:34:55Z)
- AgentTrek: Agent Trajectory Synthesis via Guiding Replay with Web Tutorials [53.376263056033046]
Existing approaches rely on expensive human annotation, making them unsustainable at scale. We propose AgentTrek, a scalable data synthesis pipeline that generates web agent trajectories by leveraging publicly available tutorials. Our fully automated approach significantly reduces data collection costs, achieving a cost of just $0.55 per high-quality trajectory without human annotators.
arXiv Detail & Related papers (2024-12-12T18:59:27Z)
- Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding [50.448520056844885]
We propose a generative Bayesian network to produce diverse synthetic scenes with real-world patterns.
A series of experiments robustly display our method's consistent superiority over existing state-of-the-art pre-training approaches.
arXiv Detail & Related papers (2024-06-17T07:43:53Z)
- GROWN+UP: A Graph Representation Of a Webpage Network Utilizing Pre-training [0.2538209532048866]
We introduce an agnostic deep graph neural network feature extractor that can ingest webpage structures, pre-train self-supervised on massive unlabeled data, and fine-tune to arbitrary tasks on webpages effectually.
We show that our pre-trained model achieves state-of-the-art results using multiple datasets on two very different benchmarks: webpage boilerplate removal and genre classification.
arXiv Detail & Related papers (2022-08-03T13:37:27Z)
- Webly Supervised Fine-Grained Recognition: Benchmark Datasets and An Approach [115.91099791629104]
We construct two new benchmark webly supervised fine-grained datasets, WebFG-496 and WebiNat-5089.
WebiNat-5089 contains 5089 sub-categories and more than 1.1 million web training images, making it the largest webly supervised fine-grained dataset ever.
As a minor contribution, we also propose a novel webly supervised method (termed "Peer-learning") for benchmarking these datasets.
arXiv Detail & Related papers (2021-08-05T06:28:32Z)
- Where2Act: From Pixels to Actions for Articulated 3D Objects [54.19638599501286]
We extract highly localized actionable information related to elementary actions such as pushing or pulling for articulated objects with movable parts.
We propose a learning-from-interaction framework with an online data sampling strategy that allows us to train the network in simulation.
Our learned models even transfer to real-world data.
arXiv Detail & Related papers (2021-01-07T18:56:38Z)