WebGraphEval: Multi-Turn Trajectory Evaluation for Web Agents using Graph Representation
- URL: http://arxiv.org/abs/2510.19205v1
- Date: Wed, 22 Oct 2025 03:29:25 GMT
- Title: WebGraphEval: Multi-Turn Trajectory Evaluation for Web Agents using Graph Representation
- Authors: Yaoyao Qian, Yuanli Wang, Jinda Zhang, Yun Zong, Meixu Chen, Hanhan Zhou, Jindan Huang, Yifan Zeng, Xinyu Hu, Chan Hee Song, Danqing Zhang
- Abstract summary: We present WebGraphEval, a framework that abstracts trajectories from multiple agents into a unified, weighted action graph. We show that WebGraphEval captures cross-model regularities, highlights redundancy and inefficiency, and identifies critical decision points overlooked by outcome-based metrics.
- Score: 13.14840279219976
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Current evaluation of web agents largely reduces to binary success metrics or conformity to a single reference trajectory, ignoring the structural diversity present in benchmark datasets. We present WebGraphEval, a framework that abstracts trajectories from multiple agents into a unified, weighted action graph. This representation is directly compatible with benchmarks such as WebArena, leveraging leaderboard runs and newly collected trajectories without modifying environments. The framework canonically encodes actions, merges recurring behaviors, and applies structural analyses including reward propagation and success-weighted edge statistics. Evaluations across thousands of trajectories from six web agents show that the graph abstraction captures cross-model regularities, highlights redundancy and inefficiency, and identifies critical decision points overlooked by outcome-based metrics. By framing web interaction as graph-structured data, WebGraphEval establishes a general methodology for multi-path, cross-agent, and efficiency-aware evaluation of web agents.
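The pipeline the abstract describes (canonical action encoding, merging of recurring behaviors, success-weighted edge statistics, reward propagation) can be illustrated with a minimal sketch. This is not the authors' implementation: the `canonicalize` rule, the `<start>`/`<end>` sentinel nodes, the binary terminal reward, and the discount factor `gamma` are all assumptions made for illustration.

```python
from collections import defaultdict


def canonicalize(action: str) -> str:
    """Hypothetical canonical form: lowercase with whitespace collapsed."""
    return " ".join(action.lower().split())


def build_action_graph(trajectories, gamma=0.9):
    """Merge trajectories into one weighted action graph.

    trajectories: iterable of (list_of_action_strings, success_bool).
    Returns (edges, values): per-edge counts and success rates, and a
    per-node average of the discounted terminal reward (assumed scheme).
    """
    edge_count = defaultdict(int)
    edge_success = defaultdict(int)
    node_return = defaultdict(float)
    node_visits = defaultdict(int)

    for actions, success in trajectories:
        # Identical canonical actions from different agents share a node.
        path = ["<start>"] + [canonicalize(a) for a in actions] + ["<end>"]
        reward = 1.0 if success else 0.0

        for src, dst in zip(path, path[1:]):
            edge_count[(src, dst)] += 1
            edge_success[(src, dst)] += int(success)

        # Propagate the terminal reward backward, discounted by the
        # number of steps remaining until the end of the trajectory.
        last = len(path) - 1
        for i, node in enumerate(path):
            node_return[node] += reward * gamma ** (last - i)
            node_visits[node] += 1

    edges = {
        e: {"count": c, "success_rate": edge_success[e] / c}
        for e, c in edge_count.items()
    }
    values = {n: node_return[n] / node_visits[n] for n in node_visits}
    return edges, values


# Toy trajectories (made up for illustration, not from the paper).
trajectories = [
    (["Click  Search", "type query", "click result"], True),
    (["click search", "type query", "type query", "click result"], False),
    (["click search", "click result"], True),
]
edges, values = build_action_graph(trajectories, gamma=0.5)
```

In this toy run, the repeated `type query` step appears as a self-loop whose success rate is zero, which is the kind of redundancy signal a binary success metric alone would not expose.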
Related papers
- ChartAgent: A Chart Understanding Framework with Tool Integrated Reasoning [26.725654222717335]
We introduce ChartAgent, a chart understanding framework grounded in Tool-Integrated Reasoning. Inspired by human cognition, ChartAgent decomposes complex chart analysis into a sequence of observable, replayable steps. We show that ChartAgent substantially improves under sparse annotation settings.
arXiv Detail & Related papers (2025-12-16T03:17:04Z)
- Resource-Aware Neural Network Pruning Using Graph-based Reinforcement Learning [0.8890833546984916]
This paper presents a novel approach to neural network pruning by integrating a graph-based observation space into an AutoML framework. Our framework transforms the pruning process by introducing a graph representation of the target neural network. For the action space we transition from continuous pruning ratios to fine-grained binary action spaces.
arXiv Detail & Related papers (2025-09-04T15:05:05Z)
- WebArXiv: Evaluating Multimodal Agents on Time-Invariant arXiv Tasks [7.4706262500758385]
We introduce WebArXiv, a benchmark for evaluating autonomous web agents. WebArXiv consists of 275 web-based tasks grounded in the arXiv platform. We propose a lightweight dynamic reflection mechanism that allows agents to selectively retrieve relevant past steps.
arXiv Detail & Related papers (2025-07-01T16:43:57Z)
- Rethinking Link Prediction for Directed Graphs [73.36395969796804]
Link prediction for directed graphs is a crucial task with diverse real-world applications. Recent advances in embedding methods and Graph Neural Networks (GNNs) have shown promising improvements. We propose a unified framework to assess the expressiveness of existing methods, highlighting the impact of dual embeddings and decoder design on directed link prediction performance.
arXiv Detail & Related papers (2025-02-08T23:51:05Z)
- Scalable Weibull Graph Attention Autoencoder for Modeling Document Networks [50.42343781348247]
We develop a graph Poisson factor analysis (GPFA) which provides analytic conditional posteriors to improve the inference accuracy.
We also extend GPFA to a multi-stochastic-layer version named graph Poisson gamma belief network (GPGBN) to capture the hierarchical document relationships at multiple semantic levels.
Our models can extract high-quality hierarchical latent document representations and achieve promising performance on various graph analytic tasks.
arXiv Detail & Related papers (2024-10-13T02:22:14Z)
- Representing Web Applications As Knowledge Graphs [0.0]
The proposed method models each node as a structured representation of the application's current state, with edges reflecting user-initiated actions or transitions.
This structured representation enables a more comprehensive and functional understanding of web applications, offering valuable insights for downstream tasks such as automated testing and behavior analysis.
arXiv Detail & Related papers (2024-10-06T02:50:41Z)
- T-GAE: Transferable Graph Autoencoder for Network Alignment [79.89704126746204]
T-GAE is a graph autoencoder framework that leverages transferability and stability of GNNs to achieve efficient network alignment without retraining.
Our experiments demonstrate that T-GAE outperforms the state-of-the-art optimization method and the best GNN approach by up to 38.7% and 50.8%, respectively.
arXiv Detail & Related papers (2023-10-05T02:58:29Z)
- Temporal Graph Network Embedding with Causal Anonymous Walks Representations [54.05212871508062]
We propose a novel approach for dynamic network representation learning based on Temporal Graph Network.
For evaluation, we provide a benchmark pipeline for the evaluation of temporal network embeddings.
We show the applicability and superior performance of our model in the real-world downstream graph machine learning task provided by one of the top European banks.
arXiv Detail & Related papers (2021-08-19T15:39:52Z)
- A Robust and Generalized Framework for Adversarial Graph Embedding [73.37228022428663]
We propose a robust framework for adversarial graph embedding, named AGE.
AGE generates the fake neighbor nodes as the enhanced negative samples from the implicit distribution.
Based on this framework, we propose three models to handle three types of graph data.
arXiv Detail & Related papers (2021-05-22T07:05:48Z)
- Mutually exciting point process graphs for modelling dynamic networks [0.0]
A new class of models for dynamic networks is proposed, called mutually exciting point process graphs (MEG).
MEG is a scalable network-wide statistical model for point processes with dyadic marks, which can be used for anomaly detection.
The model is tested on simulated graphs and real world computer network datasets, demonstrating excellent performance.
arXiv Detail & Related papers (2021-02-11T10:14:55Z)
- Learning the Implicit Semantic Representation on Graph-Structured Data [57.670106959061634]
Existing representation learning methods in graph convolutional networks are mainly designed by describing the neighborhood of each node as a perceptual whole.
We propose Semantic Graph Convolutional Networks (SGCN), which explore the implicit semantics by learning latent semantic paths in graphs.
arXiv Detail & Related papers (2021-01-16T16:18:43Z)
- A Graph-based Interactive Reasoning for Human-Object Interaction Detection [71.50535113279551]
We present a novel graph-based interactive reasoning model called Interactive Graph (abbr. in-Graph) to infer HOIs.
We construct a new framework to assemble in-Graph models for detecting HOIs, namely in-GraphNet.
Our framework is end-to-end trainable and free from costly annotations like human pose.
arXiv Detail & Related papers (2020-07-14T09:29:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.