Web Verbs: Typed Abstractions for Reliable Task Composition on the Agentic Web
- URL: http://arxiv.org/abs/2602.17245v1
- Date: Thu, 19 Feb 2026 10:50:52 GMT
- Title: Web Verbs: Typed Abstractions for Reliable Task Composition on the Agentic Web
- Authors: Linxi Jiang, Rui Xi, Zhijie Liu, Shuo Chen, Zhiqiang Lin, Suman Nath,
- Abstract summary: Current web agents operate on low-level primitives such as clicks and keystrokes. We argue that the agentic web also requires a semantic layer for web actions. We propose Web Verbs, a web-scale set of typed, semantically documented functions.
- Score: 17.537750923987762
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Web is evolving from a medium that humans browse to an environment where software agents act on behalf of users. Advances in large language models (LLMs) make natural language a practical interface for goal-directed tasks, yet most current web agents operate on low-level primitives such as clicks and keystrokes. These operations are brittle, inefficient, and difficult to verify. Complementing content-oriented efforts such as NLWeb's semantic layer for retrieval, we argue that the agentic web also requires a semantic layer for web actions. We propose Web Verbs, a web-scale set of typed, semantically documented functions that expose site capabilities through a uniform interface, whether implemented through APIs or robust client-side workflows. These verbs serve as stable and composable units that agents can discover, select, and synthesize into concise programs. This abstraction unifies API-based and browser-based paradigms, enabling LLMs to synthesize reliable and auditable workflows with explicit control and data flow. Verbs can carry preconditions, postconditions, policy tags, and logging support, which improves reliability by providing stable interfaces, efficiency by reducing dozens of steps into a few function calls, and verifiability through typed contracts and checkable traces. We present our vision, a proof-of-concept implementation, and representative case studies that demonstrate concise and robust execution compared to existing agents. Finally, we outline a roadmap for standardization to make verbs deployable and trustworthy at web scale.
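The abstract describes verbs as typed functions that can carry preconditions, postconditions, policy tags, and logging, and that agents compose into short programs. The following is a minimal Python sketch of that idea, not the paper's implementation: the `Verb` class, the `search_flights` and `book_flight` verbs, the `requires_user_consent` tag, and the stubbed flight catalog are all hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Generic, Optional, TypeVar

I = TypeVar("I")
O = TypeVar("O")


@dataclass
class Verb(Generic[I, O]):
    """Hypothetical typed verb: one site capability behind a uniform call."""
    name: str
    doc: str
    run: Callable[[I], O]  # could wrap an API call or a client-side workflow
    pre: Callable[[I], bool] = lambda _arg: True          # precondition on the input
    post: Callable[[I, O], bool] = lambda _arg, _out: True  # postcondition on the result
    policy_tags: frozenset = frozenset()

    def __call__(self, arg: I, trace: list) -> O:
        if not self.pre(arg):
            raise ValueError(f"{self.name}: precondition failed for {arg!r}")
        out = self.run(arg)
        if not self.post(arg, out):
            raise RuntimeError(f"{self.name}: postcondition failed")
        # Append a checkable record of the call to the caller-supplied trace.
        trace.append({"verb": self.name, "input": arg, "output": out,
                      "tags": sorted(self.policy_tags)})
        return out


@dataclass(frozen=True)
class FlightQuery:
    origin: str
    destination: str
    max_price: float


@dataclass(frozen=True)
class Flight:
    flight_id: str
    price: float


def _search_stub(q: FlightQuery) -> list:
    # Stand-in for a real site integration (assumption: fixed catalog).
    catalog = [Flight("F1", 120.0), Flight("F2", 300.0)]
    return [f for f in catalog if f.price <= q.max_price]


search_flights: Verb = Verb(
    name="search_flights",
    doc="Return flights between two different airports under a price cap.",
    run=_search_stub,
    pre=lambda q: q.origin != q.destination and q.max_price > 0,
    post=lambda q, flights: all(f.price <= q.max_price for f in flights),
)

book_flight: Verb = Verb(
    name="book_flight",
    doc="Book one flight and return a confirmation code.",
    run=lambda f: f"CONFIRMED-{f.flight_id}",
    post=lambda f, code: code.endswith(f.flight_id),
    policy_tags=frozenset({"requires_user_consent"}),
)


def cheapest_booking(q: FlightQuery, trace: list) -> Optional[str]:
    """A short composed program: explicit control flow over two verbs."""
    flights = search_flights(q, trace)
    if not flights:
        return None
    best = min(flights, key=lambda f: f.price)
    return book_flight(best, trace)


if __name__ == "__main__":
    trace: list = []
    print(cheapest_booking(FlightQuery("SEA", "SFO", 200.0), trace))
    for entry in trace:
        print(entry["verb"], entry["tags"])
```

In this sketch the whole task is two verb calls with ordinary control flow between them, and the trace list records each call with its policy tags so the run can be audited afterwards. A real system would presumably attach richer contracts and enforce the tags rather than only recording them.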
Related papers
- EmbeWebAgent: Embedding Web Agents into Any Customized UI [3.034887612600091]
We present EmbeWebAgent, a framework for embedding agents directly into existing UIs. It supports mixed-granularity actions ranging from primitives to higher-level composites. Our demo shows minimal retrofitting effort and robust multi-step behaviors grounded in a live UI setting.
arXiv Detail & Related papers (2026-02-16T15:59:56Z)
- Nested Browser-Use Learning for Agentic Information Seeking [60.775556172513014]
Information-seeking (IS) agents have achieved strong performance across a range of wide and deep search tasks, yet their tool use remains largely restricted to API-level snippet retrieval and URL-based page fetching. We propose Nested Browser-Use Learning (NestBrowse), which introduces a minimal and complete browser-action framework that decouples interaction control from page exploration through a nested structure.
arXiv Detail & Related papers (2025-12-29T17:59:14Z)
- Building the Web for Agents: A Declarative Framework for Agent-Web Interaction [0.7116403133334644]
We introduce VOIX, a web-native framework that enables websites to expose reliable, auditable, and privacy-preserving capabilities for AI agents. VOIX introduces <tool> and <context> tags, allowing developers to explicitly define available actions and relevant state. We evaluated the framework's practicality, learnability, and expressiveness in a three-day hackathon study with 16 developers.
arXiv Detail & Related papers (2025-11-14T13:23:34Z)
- Affordance Representation and Recognition for Autonomous Agents [64.39018305018904]
This paper introduces a pattern language for world modeling from structured data. The DOM Transduction Pattern addresses the challenge of web page complexity. The Hypermedia Affordances Recognition Pattern enables the agent to dynamically enrich its world model.
arXiv Detail & Related papers (2025-10-28T14:27:28Z)
- WALT: Web Agents that Learn Tools [66.73502484310121]
WALT is a framework that reverse-engineers latent website functionality into reusable invocable tools. Rather than hypothesizing ad-hoc skills, WALT exposes robust implementations of automations already designed into websites. On VisualWebArena and WebArena, WALT achieves higher success with fewer steps and less LLM-dependent reasoning.
arXiv Detail & Related papers (2025-10-01T23:41:47Z)
- Less is More: Empowering GUI Agent with Context-Aware Simplification [62.02157661751793]
We propose a context-aware framework for building an efficient and effective GUI Agent, termed SimpAgent. With the above components, SimpAgent reduces FLOPs by 27% and achieves superior GUI navigation performance.
arXiv Detail & Related papers (2025-07-04T17:37:15Z)
- Beyond Syntax: Action Semantics Learning for App Agents [60.56331102288794]
Action Semantics Learning (ASL) is a learning framework where the learning objective is capturing the semantics of the ground truth actions. ASL significantly improves the accuracy and generalisation of App agents over existing methods.
arXiv Detail & Related papers (2025-06-21T12:08:19Z)
- WebNav: An Intelligent Agent for Voice-Controlled Web Navigation [0.0]
WebNav is a novel agent for multi-modal web navigation. The system combines vision-based context from screenshots with a dynamic DOM-labeling browser extension.
arXiv Detail & Related papers (2025-03-18T02:33:27Z)
- PAFFA: Premeditated Actions For Fast Agents [19.576180667174366]
We introduce PAFFA, a method that makes LLMs faster and more accurate in completing tasks on the internet using a novel inference-time technique. PAFFA drastically reduces inference-time tokens by 87% while maintaining robust performance. Unravel's ability to update its action library based on explorations allows generalization and adaptation to unseen websites.
arXiv Detail & Related papers (2024-12-10T22:51:31Z)
- DynaSaur: Large Language Agents Beyond Predefined Actions [126.98162266986554]
Existing LLM agent systems typically select actions from a fixed and predefined set at every step. We propose an LLM agent framework that can dynamically create and compose actions as needed. In this framework, the agent interacts with its environment by generating and executing programs written in a general-purpose programming language.
arXiv Detail & Related papers (2024-11-04T02:08:59Z)
- AppAgent v2: Advanced Agent for Flexible Mobile Interactions [57.98933460388985]
This work introduces a novel LLM-based multimodal agent framework for mobile devices. Our agent constructs a flexible action space that enhances adaptability across various applications. Our results demonstrate the framework's superior performance, confirming its effectiveness in real-world scenarios.
arXiv Detail & Related papers (2024-08-05T06:31:39Z)
- Frontend Diffusion: Exploring Intent-Based User Interfaces through Abstract-to-Detailed Task Transitions [1.845645938093348]
We introduce Frontend Diffusion, an end-to-end tool that generates high-quality websites from user sketches.
We demonstrate the potential of task transitions to reduce human intervention and communication costs in complex tasks.
arXiv Detail & Related papers (2024-07-16T20:24:35Z)