Avenir-Web: Human-Experience-Imitating Multimodal Web Agents with Mixture of Grounding Experts
- URL: http://arxiv.org/abs/2602.02468v1
- Date: Mon, 02 Feb 2026 18:50:07 GMT
- Title: Avenir-Web: Human-Experience-Imitating Multimodal Web Agents with Mixture of Grounding Experts
- Authors: Aiden Yiliu Li, Xinyue Hao, Shilong Liu, Mengdi Wang
- Abstract summary: Avenir-Web is a web agent that achieves a new open-source state of the art on the Online-Mind2Web benchmark in real-world deployment. We evaluate Avenir-Web on Online-Mind2Web, a rigorous benchmark of live and user-centered web tasks.
- Score: 59.68272935616536
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite advances in multimodal large language models, autonomous web agents still struggle to reliably execute long-horizon tasks on complex and dynamic web interfaces. Existing agents often suffer from inaccurate element grounding, the absence of site-specific procedural knowledge, and unstable long-term task tracking and memory, particularly when operating over complex Document Object Model structures. To address these limitations, we introduce Avenir-Web, a web agent that achieves a new open-source state of the art on the Online-Mind2Web benchmark in real-world deployment. Avenir-Web leverages a Mixture of Grounding Experts, Experience-Imitation Planning for incorporating procedural priors, and a task-tracking checklist combined with adaptive memory to enable robust and seamless interaction across diverse user interface paradigms. We evaluate Avenir-Web on Online-Mind2Web, a rigorous benchmark of live and user-centered web tasks. Our results demonstrate that Avenir-Web significantly surpasses prior open-source agents and attains performance parity with top-tier proprietary models, thereby establishing a new open-source state of the art for reliable web agents on live websites.
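The abstract names a Mixture of Grounding Experts but gives no implementation details. The toy Python sketch below illustrates one plausible dispatch pattern under stated assumptions: several grounding experts each score a candidate element for a query, and a router picks the most confident answer. The expert names (`dom_expert`, `text_expert`), the page representation, and the confidence values are illustrative assumptions, not the paper's actual design.

```python
# Hypothetical sketch of a mixture-of-grounding-experts dispatch loop.
# Expert names, page structure, and confidence values are illustrative
# assumptions, not Avenir-Web's actual implementation.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Grounding:
    element_id: str
    confidence: float  # expert's self-reported confidence in [0, 1]

def dom_expert(query: str, page: dict) -> Grounding:
    # Match the query against DOM accessibility labels (toy containment check).
    for eid, label in page["labels"].items():
        if query.lower() in label.lower():
            return Grounding(eid, 0.9)
    return Grounding("", 0.0)

def text_expert(query: str, page: dict) -> Grounding:
    # Fall back to visible-text search, reported at lower confidence.
    for eid, text in page["texts"].items():
        if query.lower() in text.lower():
            return Grounding(eid, 0.6)
    return Grounding("", 0.0)

def ground(query: str, page: dict,
           experts: list[Callable[[str, dict], Grounding]]) -> Grounding:
    # Route to whichever expert is most confident about this query.
    return max((e(query, page) for e in experts),
               key=lambda g: g.confidence)

page = {
    "labels": {"btn-1": "Submit order"},
    "texts": {"btn-1": "Submit", "link-2": "View cart"},
}
best = ground("view cart", page, [dom_expert, text_expert])
print(best.element_id)  # link-2, grounded by the text expert
```

In a real agent the experts would be learned models over screenshots and DOM trees rather than string matches; the point of the sketch is only the routing step, where disagreement among experts is resolved by confidence.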
Related papers
- Building the Web for Agents: A Declarative Framework for Agent-Web Interaction [0.7116403133334644]
We introduce VOIX, a web-native framework that enables websites to expose reliable, auditable, and privacy-preserving capabilities for AI agents. VOIX introduces <tool> and <context> tags, allowing developers to explicitly define available actions and relevant state. We evaluated the framework's practicality, learnability, and expressiveness in a three-day hackathon study with 16 developers.
arXiv Detail & Related papers (2025-11-14T13:23:34Z)
- Agentic Web: Weaving the Next Web with AI Agents [109.13815627467514]
The emergence of AI agents powered by large language models (LLMs) marks a pivotal shift toward the Agentic Web. In this paradigm, agents interact directly with one another to plan, coordinate, and execute complex tasks on behalf of users. We present a structured framework for understanding and building the Agentic Web.
arXiv Detail & Related papers (2025-07-28T17:58:12Z)
- Embodied Web Agents: Bridging Physical-Digital Realms for Integrated Agent Intelligence [109.32705135051486]
Embodied Web Agents is a novel paradigm for AI agents that fluidly bridges physical embodiment and web-scale reasoning. We release the Embodied Web Agents Benchmark, which encompasses a diverse suite of tasks. Results reveal significant performance gaps between state-of-the-art AI systems and human capabilities.
arXiv Detail & Related papers (2025-06-18T17:58:17Z)
- Build the web for agents, not agents for the web [27.969222950526703]
We introduce the concept of an Agentic Web Interface (AWI), an interface specifically designed for agents to navigate a website. We establish six guiding principles for AWI design, emphasizing safety, efficiency, and standardization. This reframing aims to overcome fundamental limitations of existing interfaces, paving the way for more efficient, reliable, and transparent web agent design.
arXiv Detail & Related papers (2025-06-12T17:53:58Z)
- Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence [79.5316642687565]
Existing multi-agent frameworks often struggle with integrating diverse capable third-party agents.
We propose the Internet of Agents (IoA), a novel framework that addresses these limitations.
IoA introduces an agent integration protocol, an instant-messaging-like architecture design, and dynamic mechanisms for agent teaming and conversation flow control.
arXiv Detail & Related papers (2024-07-09T17:33:24Z)
- On the Multi-turn Instruction Following for Conversational Web Agents [83.51251174629084]
We introduce a new task of Conversational Web Navigation, which necessitates sophisticated interactions that span multiple turns with both the users and the environment.
We propose a novel framework, named self-reflective memory-augmented planning (Self-MAP), which employs memory utilization and self-reflection techniques.
arXiv Detail & Related papers (2024-02-23T02:18:12Z)
- WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models [65.18602126334716]
Existing web agents typically only handle one input modality and are evaluated only in simplified web simulators or static web snapshots.
We introduce WebVoyager, an innovative Large Multimodal Model (LMM) powered web agent that can complete user instructions end-to-end by interacting with real-world websites.
We show that WebVoyager achieves a 59.1% task success rate on our benchmark, significantly surpassing the performance of both GPT-4 (All Tools) and the WebVoyager (text-only) setups.
arXiv Detail & Related papers (2024-01-25T03:33:18Z)
- Signifiers as a First-class Abstraction in Hypermedia Multi-Agent Systems [0.6595290783361959]
We build on concepts and methods from Affordance Theory and Human-Computer Interaction to introduce signifiers as a first-class abstraction in Web-based Multi-Agent Systems.
We define a formal model for the contextual exposure of signifiers in hypermedia environments that aims to drive affordance exploitation.
arXiv Detail & Related papers (2023-02-14T10:54:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.