Related papers: Web Verbs: Typed Abstractions for Reliable Task Composition on the Agentic Web

Web Verbs: Typed Abstractions for Reliable Task Composition on the Agentic Web

URL: http://arxiv.org/abs/2602.17245v1
Date: Thu, 19 Feb 2026 10:50:52 GMT
Title: Web Verbs: Typed Abstractions for Reliable Task Composition on the Agentic Web
Authors: Linxi Jiang, Rui Xi, Zhijie Liu, Shuo Chen, Zhiqiang Lin, Suman Nath,
Abstract summary: Current web agents operate on low-level primitives such as clicks and keystrokes.<n>We argue that the agentic web also requires a semantic layer for web actions.<n>We propose textbfWeb Verbs, a web-scale set of typed, semantically documented functions.
Score: 17.537750923987762
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The Web is evolving from a medium that humans browse to an environment where software agents act on behalf of users. Advances in large language models (LLMs) make natural language a practical interface for goal-directed tasks, yet most current web agents operate on low-level primitives such as clicks and keystrokes. These operations are brittle, inefficient, and difficult to verify. Complementing content-oriented efforts such as NLWeb's semantic layer for retrieval, we argue that the agentic web also requires a semantic layer for web actions. We propose \textbf{Web Verbs}, a web-scale set of typed, semantically documented functions that expose site capabilities through a uniform interface, whether implemented through APIs or robust client-side workflows. These verbs serve as stable and composable units that agents can discover, select, and synthesize into concise programs. This abstraction unifies API-based and browser-based paradigms, enabling LLMs to synthesize reliable and auditable workflows with explicit control and data flow. Verbs can carry preconditions, postconditions, policy tags, and logging support, which improves \textbf{reliability} by providing stable interfaces, \textbf{efficiency} by reducing dozens of steps into a few function calls, and \textbf{verifiability} through typed contracts and checkable traces. We present our vision, a proof-of-concept implementation, and representative case studies that demonstrate concise and robust execution compared to existing agents. Finally, we outline a roadmap for standardization to make verbs deployable and trustworthy at web scale.

Related papers

EmbeWebAgent: Embedding Web Agents into Any Customized UI [3.034887612600091]
We present EmbeWebAgent, a framework for embedding agents directly into existing UIs.<n>It supports mixed-granularity actions ranging from primitives to higher-level composites.<n>Our demo shows minimal retrofitting effort and robust multi-step behaviors grounded in a live UI setting.
arXiv Detail & Related papers (2026-02-16T15:59:56Z)
Nested Browser-Use Learning for Agentic Information Seeking [60.775556172513014]
Information-seeking (IS) agents have achieved strong performance across a range of wide and deep search tasks, yet their tool use remains largely restricted to API-level snippet retrieval and URL-based page fetching.<n>We propose Nested Browser-Use Learning (NestBrowse), which introduces a minimal and complete browser-action framework that decouples interaction control from page exploration through a nested structure.
arXiv Detail & Related papers (2025-12-29T17:59:14Z)
Building the Web for Agents: A Declarative Framework for Agent-Web Interaction [0.7116403133334644]
We introduce VOIX, a web-native framework that enables websites to expose reliable, auditable, and privacy-preserving capabilities for AI agents.<n> VOIX introduces tool> and context> tags, allowing developers to explicitly define available actions and relevant state.<n>We evaluated the framework's practicality, learnability, and expressiveness in a three-day hackathon study with 16 developers.
arXiv Detail & Related papers (2025-11-14T13:23:34Z)
Affordance Representation and Recognition for Autonomous Agents [64.39018305018904]
This paper introduces a pattern language for world modeling from structured data.<n>The DOM Transduction Pattern addresses the challenge of web page complexity.<n>The Hypermedia Affordances Recognition Pattern enables the agent to dynamically enrich its world model.
arXiv Detail & Related papers (2025-10-28T14:27:28Z)
WALT: Web Agents that Learn Tools [66.73502484310121]
WALT is a framework that reverse-engineers latent website functionality into reusable invocable tools.<n>Rather than hypothesizing ad-hoc skills, WALT exposes robust implementations of automations already designed into websites.<n>On VisualWebArena and WebArena, WALT achieves higher success with fewer steps and less LLM-dependent reasoning.
arXiv Detail & Related papers (2025-10-01T23:41:47Z)
Less is More: Empowering GUI Agent with Context-Aware Simplification [62.02157661751793]
We propose a context-aware framework for building an efficient and effective GUI Agent, termed SimpAgent.<n>With the above components, SimpAgent reduces 27% FLOPs and achieves superior GUI navigation performances.
arXiv Detail & Related papers (2025-07-04T17:37:15Z)
Beyond Syntax: Action Semantics Learning for App Agents [60.56331102288794]
Action Semantics Learning (ASL) is a learning framework where the learning objective is capturing the semantics of the ground truth actions.<n>ASL significantly improves the accuracy and generalisation of App agents over existing methods.
arXiv Detail & Related papers (2025-06-21T12:08:19Z)
WebNav: An Intelligent Agent for Voice-Controlled Web Navigation [0.0]
WebNav is a novel agent for multi-modal web navigation.<n>System combines vision-based context from screenshots with a dynamic DOM-labeling browser extension.
arXiv Detail & Related papers (2025-03-18T02:33:27Z)
PAFFA: Premeditated Actions For Fast Agents [19.576180667174366]
We introduce PAFFA, a method that makes LLMs faster and more accurate in completing tasks on the internet using a novel inference-time technique.<n>PAFFA drastically reduces inference time tokens by 87% while maintaining robust performance.<n>Unravel's ability to update its action library based on explorations allows generalization and adaptation to unseen websites.
arXiv Detail & Related papers (2024-12-10T22:51:31Z)
DynaSaur: Large Language Agents Beyond Predefined Actions [126.98162266986554]
Existing LLM agent systems typically select actions from a fixed and predefined set at every step.<n>We propose an LLM agent framework that can dynamically create and compose actions as needed.<n>In this framework, the agent interacts with its environment by generating and executing programs written in a general-purpose programming language.
arXiv Detail & Related papers (2024-11-04T02:08:59Z)
AppAgent v2: Advanced Agent for Flexible Mobile Interactions [57.98933460388985]
This work introduces a novel LLM-based multimodal agent framework for mobile devices.<n>Our agent constructs a flexible action space that enhances adaptability across various applications.<n>Our results demonstrate the framework's superior performance, confirming its effectiveness in real-world scenarios.
arXiv Detail & Related papers (2024-08-05T06:31:39Z)
Frontend Diffusion: Exploring Intent-Based User Interfaces through Abstract-to-Detailed Task Transitions [1.845645938093348]
We introduce Frontend Diffusion, an end-to-end tool that generates high-quality websites from user sketches. We demonstrate the potential of task transitions to reduce human intervention and communication costs in complex tasks.
arXiv Detail & Related papers (2024-07-16T20:24:35Z)

This list is automatically generated from the titles and abstracts of the papers in this site.