EmbeWebAgent: Embedding Web Agents into Any Customized UI
- URL: http://arxiv.org/abs/2602.14865v1
- Date: Mon, 16 Feb 2026 15:59:56 GMT
- Title: EmbeWebAgent: Embedding Web Agents into Any Customized UI
- Authors: Chenyang Ma, Clyde Fare, Matthew Wilson, Dave Braines,
- Abstract summary: We present EmbeWebAgent, a framework for embedding agents directly into existing UIs. It supports mixed-granularity actions ranging from primitives to higher-level composites. Our demo shows minimal retrofitting effort and robust multi-step behaviors grounded in a live UI setting.
- Score: 3.034887612600091
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Most web agents operate at the human interface level, observing screenshots or raw DOM trees without application-level access, which limits robustness and action expressiveness. In enterprise settings, however, explicit control of both the frontend and backend is available. We present EmbeWebAgent, a framework for embedding agents directly into existing UIs using lightweight frontend hooks (curated ARIA and URL-based observations, and a per-page function registry exposed via a WebSocket) and a reusable backend workflow that performs reasoning and takes actions. EmbeWebAgent is stack-agnostic (e.g., React or Angular), supports mixed-granularity actions ranging from GUI primitives to higher-level composites, and orchestrates navigation, manipulation, and domain-specific analytics via MCP tools. Our demo shows minimal retrofitting effort and robust multi-step behaviors grounded in a live UI setting. Live Demo: https://youtu.be/Cy06Ljee1JQ
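The per-page function registry described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class `PageRegistry`, the action names `click` and `apply_filter`, the JSON message shape, and the `/reports` page are all hypothetical, and the WebSocket transport is replaced by a plain string passed to `dispatch` so the example runs on its own. It only shows how one registry might expose both a GUI primitive and a higher-level composite action to a backend.

```python
import json
from typing import Any, Callable, Dict


class PageRegistry:
    """Hypothetical per-page registry mapping action names to callables."""

    def __init__(self, page_url: str) -> None:
        self.page_url = page_url
        self._actions: Dict[str, Callable[..., Any]] = {}

    def register(self, name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
        """Decorator that exposes a function to the agent under `name`."""
        def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
            self._actions[name] = fn
            return fn
        return decorator

    def describe(self) -> Dict[str, Any]:
        """Observation for the backend: current URL plus available action names."""
        return {"url": self.page_url, "actions": sorted(self._actions)}

    def dispatch(self, message: str) -> str:
        """Handle one JSON request, as it might arrive over a WebSocket."""
        request = json.loads(message)
        fn = self._actions.get(request.get("action"))
        if fn is None:
            return json.dumps({"ok": False, "error": "unknown action"})
        result = fn(**request.get("args", {}))
        return json.dumps({"ok": True, "result": result})


# Simulated UI state standing in for a real page (assumption for this sketch).
state: Dict[str, Any] = {"filters": {}, "clicked": []}
registry = PageRegistry("/reports")


@registry.register("click")
def click(aria_label: str) -> str:
    """GUI primitive: record a click on the element with this ARIA label."""
    state["clicked"].append(aria_label)
    return f"clicked {aria_label}"


@registry.register("apply_filter")
def apply_filter(field: str, value: str) -> Dict[str, str]:
    """Higher-level composite: set a filter directly instead of clicking through menus."""
    state["filters"][field] = value
    return dict(state["filters"])


if __name__ == "__main__":
    print(registry.describe())
    request = {"action": "apply_filter", "args": {"field": "region", "value": "EMEA"}}
    print(registry.dispatch(json.dumps(request)))
```

In a real deployment the registry would live in the frontend and the messages would travel over the WebSocket the abstract mentions; the sketch keeps both sides in one process purely for readability.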
Related papers
- Web Verbs: Typed Abstractions for Reliable Task Composition on the Agentic Web [17.537750923987762]
Current web agents operate on low-level primitives such as clicks and keystrokes. We argue that the agentic web also requires a semantic layer for web actions. We propose Web Verbs, a web-scale set of typed, semantically documented functions.
arXiv Detail & Related papers (2026-02-19T10:50:52Z)
- Nested Browser-Use Learning for Agentic Information Seeking [60.775556172513014]
Information-seeking (IS) agents have achieved strong performance across a range of wide and deep search tasks, yet their tool use remains largely restricted to API-level snippet retrieval and URL-based page fetching. We propose Nested Browser-Use Learning (NestBrowse), which introduces a minimal and complete browser-action framework that decouples interaction control from page exploration through a nested structure.
arXiv Detail & Related papers (2025-12-29T17:59:14Z)
- Permission Manifests for Web Agents [30.22217505383227]
The rise of Large Language Model (LLM)-based web agents represents a significant shift in automated interactions with the web. Without a way to specify what interactions are and are not allowed, website owners increasingly rely on blanket blocking and CAPTCHAs. We introduce agent-permissions, a robots.txt-style interfaces manifest where websites specify allowed interactions, complemented by API references.
arXiv Detail & Related papers (2025-12-07T17:45:01Z)
- WebVIA: A Web-based Vision-Language Agentic Framework for Interactive and Verifiable UI-to-Code Generation [30.193562985137813]
We propose WebVIA, the first agentic framework for interactive UI-to-Code generation and validation. The framework comprises three components: 1) an exploration agent to capture multi-state UI screenshots; 2) a UI2Code model that generates executable interactive code; 3) a validation module that verifies the interactivity.
arXiv Detail & Related papers (2025-11-09T06:58:52Z)
- BrowserAgent: Building Web Agents with Human-Inspired Web Browsing Actions [48.194688161526756]
BrowserAgent operates directly on raw web pages via Playwright through a set of predefined browser actions. We introduce an explicit memory mechanism to store key conclusions across steps, further enhancing the model's reasoning capabilities.
arXiv Detail & Related papers (2025-10-12T15:43:37Z)
- WALT: Web Agents that Learn Tools [66.73502484310121]
WALT is a framework that reverse-engineers latent website functionality into reusable invocable tools. Rather than hypothesizing ad-hoc skills, WALT exposes robust implementations of automations already designed into websites. On VisualWebArena and WebArena, WALT achieves higher success with fewer steps and less LLM-dependent reasoning.
arXiv Detail & Related papers (2025-10-01T23:41:47Z)
- UFO2: The Desktop AgentOS [60.317812905300336]
UFO2 is a multiagent AgentOS for Windows desktops that elevates CUAs into practical, system-level automation. We evaluate UFO2 across over 20 real-world Windows applications, demonstrating substantial improvements in robustness and execution accuracy over prior CUAs. Our results show that deep OS integration unlocks a scalable path toward reliable, user-aligned desktop automation.
arXiv Detail & Related papers (2025-04-20T13:04:43Z)
- Infogent: An Agent-Based Framework for Web Information Aggregation [59.67710556177564]
We introduce Infogent, a novel framework for web information aggregation.
Experiments on different information access settings demonstrate Infogent beats an existing SOTA multi-agent search framework by 7%.
arXiv Detail & Related papers (2024-10-24T18:01:28Z)
- Beyond Browsing: API-Based Web Agents [58.39129004543844]
API-Based Agents outperform web Browsing Agents in experiments on WebArena. Hybrid Agents outperform both others nearly uniformly across tasks. Results strongly suggest that when APIs are available, they present an attractive alternative to relying on web browsing alone.
arXiv Detail & Related papers (2024-10-21T19:46:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.