Agent Workflow Memory
- URL: http://arxiv.org/abs/2409.07429v1
- Date: Wed, 11 Sep 2024 17:21:00 GMT
- Title: Agent Workflow Memory
- Authors: Zora Zhiruo Wang, Jiayuan Mao, Daniel Fried, Graham Neubig
- Abstract summary: We introduce Agent Workflow Memory (AWM), a method for inducing commonly reused routines, i.e., workflows.
AWM substantially improves the baseline results by 24.6% and 51.1% relative success rate on Mind2Web and WebArena, respectively.
Online AWM robustly generalizes in cross-task, website, and domain evaluations.
- Score: 71.81385627556398
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Despite the potential of language model-based agents to solve real-world tasks such as web navigation, current methods still struggle with long-horizon tasks with complex action trajectories. In contrast, humans can flexibly solve complex tasks by learning reusable task workflows from past experiences and using them to guide future actions. To build agents that can similarly benefit from this process, we introduce Agent Workflow Memory (AWM), a method for inducing commonly reused routines, i.e., workflows, and selectively providing workflows to the agent to guide subsequent generations. AWM flexibly applies to both offline and online scenarios, where agents induce workflows from training examples beforehand or from test queries on the fly. We experiment on two major web navigation benchmarks -- Mind2Web and WebArena -- that collectively cover 1000+ tasks from 200+ domains across travel, shopping, and social media, among others. AWM substantially improves the baseline results by 24.6% and 51.1% relative success rate on Mind2Web and WebArena while reducing the number of steps taken to solve WebArena tasks successfully. Furthermore, online AWM robustly generalizes in cross-task, website, and domain evaluations, surpassing baselines from 8.9 to 14.0 absolute points as train-test task distribution gaps widen.
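The online setting described in the abstract (inducing workflows from test queries on the fly and feeding them back to the agent) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `induce_workflows`, `run_online`, `toy_agent`, and the success check are hypothetical, and the induction step here is a simple frequency count over action sub-sequences standing in for whatever induction procedure the paper actually uses.

```python
from collections import Counter
from typing import Callable, List, Tuple

Trajectory = List[str]      # a sequence of abstract action names (assumed representation)
Workflow = Tuple[str, ...]  # a reusable contiguous sub-sequence of actions


def induce_workflows(trajectories: List[Trajectory], length: int = 2,
                     min_count: int = 2) -> List[Workflow]:
    """Toy induction: keep action n-grams that occur in at least min_count trajectories."""
    counts: Counter = Counter()
    for traj in trajectories:
        # Count each n-gram at most once per trajectory.
        seen = {tuple(traj[i:i + length]) for i in range(len(traj) - length + 1)}
        counts.update(seen)
    return sorted(wf for wf, c in counts.items() if c >= min_count)


def run_online(tasks: List[str],
               agent: Callable[[str, List[Workflow]], Trajectory],
               is_success: Callable[[str, Trajectory], bool]) -> List[Workflow]:
    """Process tasks one by one, re-inducing workflow memory after each success."""
    memory: List[Workflow] = []
    successes: List[Trajectory] = []
    for task in tasks:
        traj = agent(task, memory)  # the agent sees the current workflow memory
        if is_success(task, traj):
            successes.append(traj)
            memory = induce_workflows(successes)
    return memory


# Hypothetical stand-in for an LM-based agent: it ignores memory and
# always follows the same search routine before a task-specific step.
def toy_agent(task: str, memory: List[Workflow]) -> Trajectory:
    return ["open_site", "type_query", "click_search", f"select_{task}"]


memory = run_online(["flight", "hotel"], toy_agent, lambda task, traj: True)
print(memory)
```

In the offline setting, the same induction step would instead be run once over training examples before any test task is attempted.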
Related papers
- Flow: A Modular Approach to Automated Agentic Workflow Generation [53.073598156915615]
Multi-agent frameworks powered by large language models (LLMs) have demonstrated great success in automated planning and task execution.
However, the effective adjustment of agentic workflows during execution has not been well studied.
arXiv Detail & Related papers (2025-01-14T04:35:37Z)
- AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents [52.13695464678006]
This study enhances an LLM-based web agent by simply refining its observation and action space.
AgentOccam surpasses the previous state-of-the-art and concurrent work by 9.8 (+29.4%) and 5.9 (+15.8%) absolute points respectively.
arXiv Detail & Related papers (2024-10-17T17:50:38Z)
- Benchmarking Agentic Workflow Generation [80.74757493266057]
We introduce WorFBench, a unified workflow generation benchmark with multi-faceted scenarios and intricate graph workflow structures.
We also present WorFEval, a systemic evaluation protocol utilizing subsequence and subgraph matching algorithms.
We observe that the generated workflows can enhance downstream tasks, enabling them to achieve superior performance with less time during inference.
arXiv Detail & Related papers (2024-10-10T12:41:19Z)
- Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows? [73.81908518992161]
We introduce Spider2-V, the first multimodal agent benchmark focusing on professional data science and engineering.
Spider2-V features real-world tasks in authentic computer environments and incorporates 20 enterprise-level professional applications.
These tasks evaluate the ability of a multimodal agent to perform data-related tasks by writing code and managing the GUI in enterprise data software systems.
arXiv Detail & Related papers (2024-07-15T17:54:37Z)
- Couler: Unified Machine Learning Workflow Optimization in Cloud [6.769259207650922]
Couler is a system designed for unified ML workflow optimization in the cloud.
We integrate Large Language Models (LLMs) into workflow generation, and provide a unified programming interface for various workflow engines.
Couler has successfully improved the CPU/Memory utilization by more than 15% and the workflow completion rate by around 17%.
arXiv Detail & Related papers (2024-03-12T12:47:32Z)
- WebArena: A Realistic Web Environment for Building Autonomous Agents [92.3291458543633]
We build an environment for language-guided agents that is highly realistic and reproducible.
We focus on agents that perform tasks on the web, and create an environment with fully functional websites from four common domains.
We release a set of benchmark tasks focusing on evaluating the functional correctness of task completions.
arXiv Detail & Related papers (2023-07-25T22:59:32Z)