Related papers: Beyond Browsing: API-Based Web Agents

Beyond Browsing: API-Based Web Agents

URL: http://arxiv.org/abs/2410.16464v1
Date: Mon, 21 Oct 2024 19:46:06 GMT
Title: Beyond Browsing: API-Based Web Agents
Authors: Yueqi Song, Frank Xu, Shuyan Zhou, Graham Neubig,
Abstract summary: API-based agents outperform web browsing agents in experiments on WebArena. Hybrid Agents out-perform both others nearly uniformly across tasks. Results strongly suggest that when APIs are available, they present an attractive alternative to relying on web browsing alone.
Score: 58.39129004543844
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Web browsers are a portal to the internet, where much of human activity is undertaken. Thus, there has been significant research work in AI agents that interact with the internet through web browsing. However, there is also another interface designed specifically for machine interaction with online content: application programming interfaces (APIs). In this paper we ask -- what if we were to take tasks traditionally tackled by browsing agents, and give AI agents access to APIs? To do so, we propose two varieties of agents: (1) an API-calling agent that attempts to perform online tasks through APIs only, similar to traditional coding agents, and (2) a Hybrid Agent that can interact with online data through both web browsing and APIs. In experiments on WebArena, a widely-used and realistic benchmark for web navigation tasks, we find that API-based agents outperform web browsing agents. Hybrid Agents out-perform both others nearly uniformly across tasks, resulting in a more than 20.0% absolute improvement over web browsing alone, achieving a success rate of 35.8%, achiving the SOTA performance among task-agnostic agents. These results strongly suggest that when APIs are available, they present an attractive alternative to relying on web browsing alone.

Related papers

WebLists: Extracting Structured Information From Complex Interactive Websites Using Executable LLM Agents [1.6673034682613495]
We present WebLists, a benchmark of 200 data-extraction tasks across four common business and enterprise use-cases. We show that both LLMs with search capabilities and SOTA web agents struggle with these tasks, with a recall of 3% and 31%, respectively. We propose BardeenAgent, a novel framework that enables web agents to convert their execution into repeatable programs, and replay them at scale across pages with similar structure. On the WebLists benchmark, BardeenAgent achieves 66% recall overall, more than doubling the performance of SOTA web agents, and reducing cost per output row by 3x.
arXiv Detail & Related papers (2025-04-17T06:16:40Z)
CowPilot: A Framework for Autonomous and Human-Agent Collaborative Web Navigation [70.3224918173672]
CowPilot is a framework supporting autonomous as well as human-agent collaborative web navigation. It reduces the number of steps humans need to perform by allowing agents to propose next steps, while users are able to pause, reject, or take alternative actions. CowPilot can serve as a useful tool for data collection and agent evaluation across websites.
arXiv Detail & Related papers (2025-01-28T00:56:53Z)
Infogent: An Agent-Based Framework for Web Information Aggregation [59.67710556177564]
We introduce Infogent, a novel framework for web information aggregation. Experiments on different information access settings demonstrate Infogent beats an existing SOTA multi-agent search framework by 7%.
arXiv Detail & Related papers (2024-10-24T18:01:28Z)
AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents [52.13695464678006]
This study enhances an LLM-based web agent by simply refining its observation and action space. AgentOccam surpasses the previous state-of-the-art and concurrent work by 9.8 (+29.4%) and 5.9 (+15.8%) absolute points respectively.
arXiv Detail & Related papers (2024-10-17T17:50:38Z)
Turn Every Application into an Agent: Towards Efficient Human-Agent-Computer Interaction with API-First LLM-Based Agents [40.86728610906313]
AXIS is a novel LLM-based agents framework that prioritizes actions through application programming interfaces (APIs) over user interface actions. Our experiments on Office Word demonstrate that AXIS reduces task completion time by 65%-70% and cognitive workload by 38%-53%, while maintaining accuracy of 97%-98% compare to humans. It also explores the possibility of turning every applications into agents, paving the way towards an agent-centric operating system (Agent OS)
arXiv Detail & Related papers (2024-09-25T17:58:08Z)
Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence [79.5316642687565]
Existing multi-agent frameworks often struggle with integrating diverse capable third-party agents. We propose the Internet of Agents (IoA), a novel framework that addresses these limitations. IoA introduces an agent integration protocol, an instant-messaging-like architecture design, and dynamic mechanisms for agent teaming and conversation flow control.
arXiv Detail & Related papers (2024-07-09T17:33:24Z)
Tree Search for Language Model Agents [69.43007235771383]
We propose an inference-time search algorithm for LM agents to perform exploration and multi-step planning in interactive web environments. Our approach is a form of best-first tree search that operates within the actual environment space. It is the first tree search algorithm for LM agents that shows effectiveness on realistic web tasks.
arXiv Detail & Related papers (2024-07-01T17:07:55Z)
CAAP: Context-Aware Action Planning Prompting to Solve Computer Tasks with Front-End UI Only [21.054681757006385]
We propose an agent that perceives its environment solely through screenshot images. By leveraging the reasoning capability of the Large Language Models, we eliminate the need for large-scale human demonstration data. Agent achieves an average success rate of 94.5% on MiniWoB++ and an average task score of 62.3 on WebShop.
arXiv Detail & Related papers (2024-06-11T05:21:20Z)
Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration [52.25473993987409]
We propose Mobile-Agent-v2, a multi-agent architecture for mobile device operation assistance. The architecture comprises three agents: planning agent, decision agent, and reflection agent. We show that Mobile-Agent-v2 achieves over a 30% improvement in task completion compared to the single-agent architecture.
arXiv Detail & Related papers (2024-06-03T05:50:00Z)
WebSuite: Systematically Evaluating Why Web Agents Fail [2.200477647229223]
We describe WebSuite, the first diagnostic benchmark for generalist web agents. This benchmark suite consists of both individual tasks, such as clicking a button, and end-to-end tasks, such as adding an item to a cart. We evaluate two popular generalist web agents, one text-based and one multimodal, and identify unique weaknesses for each agent.
arXiv Detail & Related papers (2024-06-01T00:32:26Z)
MMInA: Benchmarking Multihop Multimodal Internet Agents [36.173995299002]
We present MMInA, a multihop and multimodal benchmark to evaluate the embodied agents for compositional Internet tasks. Our data includes 1,050 human-written tasks covering various domains such as shopping and travel. Our method significantly improved both the single-hop and multihop web browsing abilities of agents.
arXiv Detail & Related papers (2024-04-15T17:59:50Z)

This list is automatically generated from the titles and abstracts of the papers in this site.