PAFFA: Premeditated Actions For Fast Agents
- URL: http://arxiv.org/abs/2412.07958v1
- Date: Tue, 10 Dec 2024 22:51:31 GMT
- Title: PAFFA: Premeditated Actions For Fast Agents
- Authors: Shambhavi Krishna, Zheng Chen, Vaibhav Kumar, Xiaojiang Huang, Yingjie Li, Fan Yang, Xiang Li
- Abstract summary: PAFFA is a framework designed to enhance web interaction capabilities through an Action API Library of reusable, verified browser interaction functions.
It reduces inference calls by 87% while maintaining robust performance even as website structures evolve.
This framework accelerates multi-page task execution and offers a scalable solution to advance autonomous web agent research.
- Abstract: Modern AI assistants have made significant progress in natural language understanding and API/tool integration, with emerging efforts to incorporate diverse interfaces (such as Web interfaces) for enhanced scalability and functionality. However, current approaches that heavily rely on repeated LLM-driven HTML parsing are computationally expensive and error-prone, particularly when handling dynamic web interfaces and multi-step tasks. To overcome these challenges, we introduce PAFFA (Premeditated Actions For Fast Agents), a framework designed to enhance web interaction capabilities through an Action API Library of reusable, verified browser interaction functions. By pre-computing interaction patterns and employing two core methodologies - "Dist-Map" for task-agnostic element distillation and "Unravel" for incremental page-wise exploration - PAFFA reduces inference calls by 87% while maintaining robust performance even as website structures evolve. This framework accelerates multi-page task execution and offers a scalable solution to advance autonomous web agent research.
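The core idea described in the abstract — replacing repeated LLM-driven HTML parsing with a library of pre-computed, verified browser interaction functions — can be illustrated with a minimal sketch. The example below is not PAFFA's actual implementation; the action name, URL, CSS selectors, and the Playwright-based helpers are assumptions made purely for illustration of the pattern.

```python
# Minimal sketch of an "Action API Library": reusable, parameterized browser
# interaction functions that an agent can invoke without re-parsing HTML.
# NOTE: the action name, URL, and selectors below are hypothetical examples,
# not PAFFA's actual library; only the overall pattern follows the abstract.
from playwright.sync_api import sync_playwright


def search_site(page, query: str) -> None:
    """Hypothetical pre-verified action: run a search on an example site."""
    page.goto("https://example.com")          # placeholder URL
    page.fill("input[name='q']", query)       # selector verified once, then reused
    page.click("button[type='submit']")
    page.wait_for_selector("#results")        # wait for the results to render


# The library maps action names to verified functions; the LLM only selects
# an action and fills in its parameters instead of parsing raw HTML each step,
# which is where the reduction in inference calls would come from.
ACTION_API_LIBRARY = {
    "search_site": search_site,
}


def run_action(name: str, **kwargs) -> None:
    """Dispatch a single library action inside a fresh browser session."""
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        ACTION_API_LIBRARY[name](page, **kwargs)
        browser.close()


if __name__ == "__main__":
    run_action("search_site", query="cheap flights to Tokyo")
```

In this reading, methodologies such as "Dist-Map" and "Unravel" would be responsible for building and maintaining entries of this kind ahead of time; the sketch only shows how a pre-built action might be stored and dispatched.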
Related papers
- R2D2: Remembering, Reflecting and Dynamic Decision Making for Web Agents [53.94879482534949]
Current models often struggle with efficient navigation and action execution due to limited visibility and understanding of web structures.
Our proposed R2D2 framework addresses these challenges by integrating two paradigms: Remember and Reflect.
Our findings suggest that a combination of memory-enhanced navigation and reflective learning promisingly advances the capabilities of web agents.
arXiv Detail & Related papers (2025-01-21T20:21:58Z)
- Mobile-Agent-E: Self-Evolving Mobile Assistant for Complex Tasks [85.48034185086169]
Mobile-Agent-E is a hierarchical multi-agent framework capable of self-evolution through past experience.
Mobile-Agent-E achieves a 22% absolute improvement over previous state-of-the-art approaches.
arXiv Detail & Related papers (2025-01-20T20:35:46Z)
- Interaction2Code: Benchmarking MLLM-based Interactive Webpage Code Generation from Interactive Prototyping [57.024913536420264]
Multimodal Large Language Models (MLLMs) have demonstrated remarkable performance on the design-to-code task.
We present the first systematic investigation of MLLMs in generating interactive webpages.
arXiv Detail & Related papers (2024-11-05T17:40:03Z)
- Beyond Browsing: API-Based Web Agents [58.39129004543844]
API-based agents outperform web browsing agents in experiments on WebArena.
Hybrid agents outperform both nearly uniformly across tasks.
Results strongly suggest that when APIs are available, they present an attractive alternative to relying on web browsing alone.
arXiv Detail & Related papers (2024-10-21T19:46:06Z)
- Steward: Natural Language Web Automation [19.301371856154965]
Large language models (LLMs) have demonstrated exceptional capabilities in serving as the foundation for AI assistants.
We introduce Steward, a novel LLM-powered web automation tool designed to serve as a cost-effective, scalable, end-to-end solution for automating web interactions.
We discuss various design and implementation challenges, including state representation, action sequence selection, system responsiveness, detecting task completion, and caching implementation.
arXiv Detail & Related papers (2024-09-23T18:06:32Z)
- AppAgent v2: Advanced Agent for Flexible Mobile Interactions [46.789563920416626]
This work introduces a novel LLM-based multimodal agent framework for mobile devices.
Our agent constructs a flexible action space that enhances adaptability across various applications.
Our results demonstrate the framework's superior performance, confirming its effectiveness in real-world scenarios.
arXiv Detail & Related papers (2024-08-05T06:31:39Z)
- Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence [79.5316642687565]
Existing multi-agent frameworks often struggle with integrating diverse capable third-party agents.
We propose the Internet of Agents (IoA), a novel framework that addresses these limitations.
IoA introduces an agent integration protocol, an instant-messaging-like architecture design, and dynamic mechanisms for agent teaming and conversation flow control.
arXiv Detail & Related papers (2024-07-09T17:33:24Z)
- On the Multi-turn Instruction Following for Conversational Web Agents [83.51251174629084]
We introduce a new task of Conversational Web Navigation, which necessitates sophisticated interactions that span multiple turns with both the users and the environment.
We propose a novel framework, named self-reflective memory-augmented planning (Self-MAP), which employs memory utilization and self-reflection techniques.
arXiv Detail & Related papers (2024-02-23T02:18:12Z)
- You Only Look at Screens: Multimodal Chain-of-Action Agents [37.118034745972956]
Auto-GUI is a multimodal solution that directly interacts with the interface.
We propose a chain-of-action technique to help the agent decide what action to execute.
We evaluate our approach on a new device-control benchmark, AITW, with 30K unique instructions.
arXiv Detail & Related papers (2023-09-20T16:12:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.