webMCP: Efficient AI-Native Client-Side Interaction for Agent-Ready Web Design
- URL: http://arxiv.org/abs/2508.09171v1
- Date: Wed, 06 Aug 2025 23:02:36 GMT
- Title: webMCP: Efficient AI-Native Client-Side Interaction for Agent-Ready Web Design
- Authors: D. Perera
- Abstract summary: Current AI agents create significant barriers for users by requiring extensive processing to understand web pages. This paper introduces webMCP, a client-side standard that embeds structured interaction metadata directly into web pages. webMCP reduces processing requirements by 67.6% while maintaining 97.9% task success rates. Users experience significantly lower costs (34-63% reduction) and faster response times across diverse web interactions.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Current AI agents create significant barriers for users by requiring extensive processing to understand web pages, making AI-assisted web interaction slow and expensive. This paper introduces webMCP (Web Machine Context & Procedure), a client-side standard that embeds structured interaction metadata directly into web pages, enabling more efficient human-AI collaboration on existing websites. webMCP transforms how AI agents understand web interfaces by providing explicit mappings between page elements and user actions. Instead of processing entire HTML documents, agents can access pre-structured interaction data, dramatically reducing computational overhead while maintaining task accuracy. A comprehensive evaluation across 1,890 real API calls spanning online shopping, authentication, and content management scenarios demonstrates webMCP reduces processing requirements by 67.6% while maintaining 97.9% task success rates compared to 98.8% for traditional approaches. Users experience significantly lower costs (34-63% reduction) and faster response times across diverse web interactions. Statistical analysis confirms these improvements are highly significant across multiple AI models. An independent WordPress deployment study validates practical applicability, showing consistent improvements across real-world content management workflows. webMCP requires no server-side modifications, making it deployable across millions of existing websites without technical barriers. These results establish webMCP as a viable solution for making AI web assistance more accessible and sustainable, addressing the critical gap between user interaction needs and AI computational requirements in production environments.
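The abstract does not specify the concrete metadata format, but the core idea of pre-structured, client-side interaction data can be sketched roughly as follows. The JSON schema, field names, and the `extract_actions` helper are illustrative assumptions for this sketch, not the published webMCP standard.

```python
import json

# Hypothetical webMCP-style block, as it might be embedded in a page
# inside something like <script type="application/webmcp+json">.
# The schema and field names below are illustrative guesses, not the
# published webMCP standard.
EMBEDDED_METADATA = """
{
  "actions": [
    {"id": "add-to-cart", "selector": "#buy-btn",
     "method": "click", "description": "Add the item to the cart"},
    {"id": "set-quantity", "selector": "input[name=qty]",
     "method": "fill", "description": "Set the purchase quantity"}
  ]
}
"""

def extract_actions(metadata_json: str) -> dict:
    """Map action ids to their interaction records.

    An agent consumes this compact mapping instead of parsing the
    full HTML document, which is where the processing savings
    reported in the abstract would come from.
    """
    data = json.loads(metadata_json)
    return {action["id"]: action for action in data["actions"]}

actions = extract_actions(EMBEDDED_METADATA)
print(actions["add-to-cart"]["selector"])  # prints #buy-btn
```

Because the metadata rides along in the page itself, a sketch like this needs no server-side API: the agent reads a few hundred bytes of structured actions rather than the entire DOM.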
Related papers
- Modeling Distinct Human Interaction in Web Agents [59.600507469754575]
We introduce the task of modeling human intervention to support collaborative web task execution. We identify four distinct patterns of user interaction with agents -- hands-off supervision, hands-on oversight, collaborative task-solving, and full user takeover. We deploy these intervention-aware models in live web navigation agents and evaluate them in a user study, finding a 26.5% increase in user-rated agent usefulness.
arXiv Detail & Related papers (2026-02-19T18:11:28Z)
- Avenir-Web: Human-Experience-Imitating Multimodal Web Agents with Mixture of Grounding Experts [59.68272935616536]
Avenir-Web is a web agent that achieves a new open-source state of the art on the Online-Mind2Web benchmark in real-world deployment. We evaluate Avenir-Web on Online-Mind2Web, a rigorous benchmark of live and user-centered web tasks.
arXiv Detail & Related papers (2026-02-02T18:50:07Z)
- Permission Manifests for Web Agents [30.22217505383227]
The rise of Large Language Model (LLM)-based web agents represents a significant shift in automated interactions with the web. Without a way to specify which interactions are and are not allowed, website owners increasingly rely on blanket blocking and CAPTCHAs. We introduce agent-permissions, a robots.txt-style interface manifest in which websites specify allowed interactions, complemented by API references.
arXiv Detail & Related papers (2025-12-07T17:45:01Z)
- Building the Web for Agents: A Declarative Framework for Agent-Web Interaction [0.7116403133334644]
We introduce VOIX, a web-native framework that enables websites to expose reliable, auditable, and privacy-preserving capabilities for AI agents. VOIX introduces <tool> and <context> tags, allowing developers to explicitly define available actions and relevant state. We evaluated the framework's practicality, learnability, and expressiveness in a three-day hackathon study with 16 developers.
arXiv Detail & Related papers (2025-11-14T13:23:34Z)
- Affordance Representation and Recognition for Autonomous Agents [64.39018305018904]
This paper introduces a pattern language for world modeling from structured data. The DOM Transduction Pattern addresses the challenge of web page complexity. The Hypermedia Affordances Recognition Pattern enables the agent to dynamically enrich its world model.
arXiv Detail & Related papers (2025-10-28T14:27:28Z)
- Build the web for agents, not agents for the web [27.969222950526703]
We introduce the concept of an Agentic Web Interface (AWI), an interface specifically designed for agents to navigate a website. We establish six guiding principles for AWI design, emphasizing safety, efficiency, and standardization. This reframing aims to overcome fundamental limitations of existing interfaces, paving the way for more efficient, reliable, and transparent web agent design.
arXiv Detail & Related papers (2025-06-12T17:53:58Z)
- WebGames: Challenging General-Purpose Web-Browsing AI Agents [11.320069795732058]
WebGames is a comprehensive benchmark suite designed to evaluate general-purpose web-browsing AI agents. We evaluate leading vision-language models including GPT-4o, Claude Computer-Use, Gemini-1.5-Pro, and Qwen2-VL against human performance. Results reveal a substantial capability gap, with the best AI system achieving only a 43.1% success rate compared to human performance of 95.7%.
arXiv Detail & Related papers (2025-02-25T16:45:08Z)
- R2D2: Remembering, Replaying and Dynamic Decision Making with a Reflective Agentic Memory [53.94879482534949]
Current models often struggle with efficient navigation and action execution due to limited visibility and understanding of web structures. Our proposed R2D2 framework addresses these challenges by integrating two paradigms: Remember and Reflect. Our findings suggest that a combination of memory-enhanced navigation and reflective learning promisingly advances the capabilities of web agents.
arXiv Detail & Related papers (2025-01-21T20:21:58Z)
- TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks [52.46737975742287]
We introduce TheAgentCompany, a benchmark for evaluating AI agents that interact with the world in ways similar to those of a digital worker. We find that the most competitive agent can complete 30% of tasks autonomously. This paints a nuanced picture of task automation with LM agents in a setting simulating a real workplace.
arXiv Detail & Related papers (2024-12-18T18:55:40Z)
- PAFFA: Premeditated Actions For Fast Agents [19.576180667174366]
We introduce PAFFA, a method that makes LLMs faster and more accurate in completing tasks on the internet using a novel inference-time technique. PAFFA drastically reduces inference-time tokens by 87% while maintaining robust performance. Unravel's ability to update its action library based on explorations allows generalization and adaptation to unseen websites.
arXiv Detail & Related papers (2024-12-10T22:51:31Z)
- Interaction2Code: Benchmarking MLLM-based Interactive Webpage Code Generation from Interactive Prototyping [57.024913536420264]
Multimodal Large Language Models (MLLMs) have demonstrated remarkable performance on the design-to-code task. We present the first systematic investigation of MLLMs in generating interactive webpages.
arXiv Detail & Related papers (2024-11-05T17:40:03Z)
- WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models [65.18602126334716]
Existing web agents typically only handle one input modality and are evaluated only in simplified web simulators or static web snapshots.
We introduce WebVoyager, an innovative Large Multimodal Model (LMM) powered web agent that can complete user instructions end-to-end by interacting with real-world websites.
We show that WebVoyager achieves a 59.1% task success rate on our benchmark, significantly surpassing the performance of both GPT-4 (All Tools) and the WebVoyager (text-only) setups.
arXiv Detail & Related papers (2024-01-25T03:33:18Z)
- MobileAgent: enhancing mobile control via human-machine interaction and SOP integration [0.0]
Large Language Models (LLMs) are now capable of automating mobile device operations for users.
Privacy concerns related to personalized user data arise during mobile operations, requiring user confirmation.
We have designed interactive tasks between agents and humans to identify sensitive information and align with personalized user needs.
Our approach is evaluated on the new device control benchmark AitW, which encompasses 30K unique instructions across multi-step tasks.
arXiv Detail & Related papers (2024-01-04T03:44:42Z)
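Several entries above converge on the same mechanism: pages declaring agent-usable capabilities in dedicated markup, such as the tool and context tags in the VOIX entry. As a rough illustration of how an agent might harvest such declarations, here is a minimal scanner; the attribute names and sample markup are illustrative assumptions, not any framework's documented API.

```python
from html.parser import HTMLParser

# Sketch of harvesting tag-based capability declarations from a page.
# The attribute names (name, description) and the sample markup are
# illustrative assumptions, not a documented API.
PAGE = """
<tool name="book_table" description="Reserve a table"></tool>
<context name="open_slots">18:00, 19:30</context>
"""

class CapabilityScanner(HTMLParser):
    """Collect declared tools and context entries from page markup."""

    def __init__(self):
        super().__init__()
        self.tools = []
        self.contexts = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes attributes as (name, value) pairs;
        # unknown tags like <tool> are parsed like any other element.
        if tag == "tool":
            self.tools.append(dict(attrs))
        elif tag == "context":
            self.contexts.append(dict(attrs))

scanner = CapabilityScanner()
scanner.feed(PAGE)
print(scanner.tools[0]["name"])  # prints book_table
```

The appeal of this pattern across the listed papers is that the declarations live in the page itself, so an agent can enumerate available actions without parsing or screenshotting the full interface.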
This list is automatically generated from the titles and abstracts of the papers in this site.