Related papers: Building the Web for Agents: A Declarative Framework for Agent-Web Interaction

URL: http://arxiv.org/abs/2511.11287v1
Date: Fri, 14 Nov 2025 13:23:34 GMT
Title: Building the Web for Agents: A Declarative Framework for Agent-Web Interaction
Authors: Sven Schultze, Meike Verena Kietzmann, Nils-Lucas Schönfeld, Ruth Stock-Homburg,
Abstract summary: We introduce VOIX, a web-native framework that enables websites to expose reliable, auditable, and privacy-preserving capabilities for AI agents. VOIX introduces <tool> and <context> tags, allowing developers to explicitly define available actions and relevant state. We evaluated the framework's practicality, learnability, and expressiveness in a three-day hackathon study with 16 developers.
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The increasing deployment of autonomous AI agents on the web is hampered by a fundamental misalignment: agents must infer affordances from human-oriented user interfaces, leading to brittle, inefficient, and insecure interactions. To address this, we introduce VOIX, a web-native framework that enables websites to expose reliable, auditable, and privacy-preserving capabilities for AI agents through simple, declarative HTML elements. VOIX introduces <tool> and <context> tags, allowing developers to explicitly define available actions and relevant state, thereby creating a clear, machine-readable contract for agent behavior. This approach shifts control to the website developer while preserving user privacy by disconnecting the conversational interactions from the website. We evaluated the framework's practicality, learnability, and expressiveness in a three-day hackathon study with 16 developers. The results demonstrate that participants, regardless of prior experience, were able to rapidly build diverse and functional agent-enabled web applications. Ultimately, this work provides a foundational mechanism for realizing the Agentic Web, enabling a future of seamless and secure human-AI collaboration on the web.

Related papers

Avenir-Web: Human-Experience-Imitating Multimodal Web Agents with Mixture of Grounding Experts
Avenir-Web is a web agent that achieves a new open-source state of the art on the Online-Mind2Web benchmark in real-world deployment. We evaluate Avenir-Web on Online-Mind2Web, a rigorous benchmark of live and user-centered web tasks.
Date: 2026-02-02T18:50:07Z
It's a TRAP! Task-Redirecting Agent Persuasion Benchmark for Web Agents
Web-based agents powered by large language models are increasingly used for tasks such as email management or professional networking. Their reliance on dynamic web content makes them vulnerable to prompt injection attacks: adversarial instructions hidden in interface elements that persuade the agent to divert from its original task. We introduce the Task-Redirecting Agent Persuasion Benchmark (TRAP), an evaluation for studying how persuasion techniques misguide autonomous web agents on realistic tasks.
Date: 2025-12-29T01:09:10Z
Permission Manifests for Web Agents
The rise of Large Language Model (LLM)-based web agents represents a significant shift in automated interactions with the web. Without a way to specify what interactions are and are not allowed, website owners increasingly rely on blanket blocking and CAPTCHAs. We introduce agent-permissions, a robots.txt-style interface manifest in which websites specify allowed interactions, complemented by API references.
Date: 2025-12-07T17:45:01Z
Agentic Web: Weaving the Next Web with AI Agents
The emergence of AI agents powered by large language models (LLMs) marks a pivotal shift toward the Agentic Web. In this paradigm, agents interact directly with one another to plan, coordinate, and execute complex tasks on behalf of users. We present a structured framework for understanding and building the Agentic Web.
Date: 2025-07-28T17:58:12Z
Build the web for agents, not agents for the web
We introduce the concept of an Agentic Web Interface (AWI), an interface specifically designed for agents to navigate a website. We establish six guiding principles for AWI design, emphasizing safety, efficiency, and standardization. This reframing aims to overcome fundamental limitations of existing interfaces, paving the way for more efficient, reliable, and transparent web agent design.
Date: 2025-06-12T17:53:58Z
WebCoT: Enhancing Web Agent Reasoning by Reconstructing Chain-of-Thought in Reflection, Branching, and Rollback
We identify key reasoning skills essential for effective web agents. We reconstruct the agent's reasoning algorithms into chain-of-thought rationales. Our approach yields significant improvements across multiple benchmarks.
Date: 2025-05-26T14:03:37Z
Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence
Existing multi-agent frameworks often struggle with integrating diverse capable third-party agents. We propose the Internet of Agents (IoA), a novel framework that addresses these limitations. IoA introduces an agent integration protocol, an instant-messaging-like architecture design, and dynamic mechanisms for agent teaming and conversation flow control.
Date: 2024-07-09T17:33:24Z
WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models
Existing web agents typically only handle one input modality and are evaluated only in simplified web simulators or static web snapshots. We introduce WebVoyager, an innovative Large Multimodal Model (LMM) powered web agent that can complete user instructions end-to-end by interacting with real-world websites. We show that WebVoyager achieves a 59.1% task success rate on our benchmark, significantly surpassing the performance of both GPT-4 (All Tools) and the WebVoyager (text-only) setups.
Date: 2024-01-25T03:33:18Z
Signifiers as a First-class Abstraction in Hypermedia Multi-Agent Systems
We build on concepts and methods from Affordance Theory and Human-Computer Interaction to introduce signifiers as a first-class abstraction in Web-based Multi-Agent Systems. We define a formal model for the contextual exposure of signifiers in hypermedia environments that aims to drive affordance exploitation.
Date: 2023-02-14T10:54:46Z

This list is automatically generated from the titles and abstracts of the papers on this site.