Related papers: DynaSaur: Large Language Agents Beyond Predefined Actions

DynaSaur: Large Language Agents Beyond Predefined Actions

URL: http://arxiv.org/abs/2411.01747v1
Date: Mon, 04 Nov 2024 02:08:59 GMT
Title: DynaSaur: Large Language Agents Beyond Predefined Actions
Authors: Dang Nguyen, Viet Dac Lai, Seunghyun Yoon, Ryan A. Rossi, Handong Zhao, Ruiyi Zhang, Puneet Mathur, Nedim Lipka, Yu Wang, Trung Bui, Franck Dernoncourt, Tianyi Zhou,
Abstract summary: Existing LLM agent systems typically select actions from a fixed and predefined set at every step. We propose an LLM agent framework that enables the dynamic creation and composition of actions in an online manner. Our experiments on the GAIA benchmark demonstrate that this framework offers significantly greater flexibility and outperforms previous methods.
Score: 108.75187263724838
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Existing LLM agent systems typically select actions from a fixed and predefined set at every step. While this approach is effective in closed, narrowly-scoped environments, we argue that it presents two major challenges when deploying LLM agents in real-world scenarios: (1) selecting from a fixed set of actions significantly restricts the planning and acting capabilities of LLM agents, and (2) this approach requires substantial human effort to enumerate and implement all possible actions, which becomes impractical in complex environments with a vast number of potential actions. In this work, we propose an LLM agent framework that enables the dynamic creation and composition of actions in an online manner. In this framework, the agent interacts with the environment by generating and executing programs written in a general-purpose programming language at each step. Furthermore, generated actions are accumulated over time for future reuse. Our extensive experiments on the GAIA benchmark demonstrate that this framework offers significantly greater flexibility and outperforms previous methods. Notably, it allows an LLM agent to recover in scenarios where no relevant action exists in the predefined set or when existing actions fail due to unforeseen edge cases. At the time of writing, we hold the top position on the GAIA public leaderboard. Our code can be found in \href{https://github.com/adobe-research/dynasaur}{https://github.com/adobe-research/dynasaur}.

Related papers

SynWorld: Virtual Scenario Synthesis for Agentic Action Knowledge Refinement [81.30121762971473]
SynWorld is a framework that allows agents to autonomously explore environments, optimize, and enhance their understanding of actions. Our experiments demonstrate that SynWorld is an effective and general approach to learning action knowledge in new environments.
arXiv Detail & Related papers (2025-04-04T16:10:57Z)
Talk Structurally, Act Hierarchically: A Collaborative Framework for LLM Multi-Agent Systems [10.67359331022116]
textitTalk Structurally, Act Hierarchically (TalkHier) is a novel framework that introduces a structured communication protocol for context-rich exchanges. textitTalkHier surpasses various types of SoTA, including inference scaling model (OpenAI-o1), open-source multi-agent models (e.g., AgentVerse)
arXiv Detail & Related papers (2025-02-16T12:26:58Z)
AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents [52.13695464678006]
This study enhances an LLM-based web agent by simply refining its observation and action space. AgentOccam surpasses the previous state-of-the-art and concurrent work by 9.8 (+29.4%) and 5.9 (+15.8%) absolute points respectively.
arXiv Detail & Related papers (2024-10-17T17:50:38Z)
Language Models can Infer Action Semantics for Symbolic Planners from Environment Feedback [26.03718733867297]
We propose Predicting Semantics of Actions with Language Models (PSALM) PSALM learns action semantics by leveraging the strengths of both symbolic planners and Large Language Models (LLMs) Experiments show PSALM boosts plan success rate from 36.4% (on Claude-3.5) to 100%, and explores the environment more efficiently than prior work to infer ground truth domain action semantics.
arXiv Detail & Related papers (2024-06-04T21:29:56Z)
Executable Code Actions Elicit Better LLM Agents [76.95566120678787]
This work proposes to use Python code to consolidate Large Language Model (LLM) agents' actions into a unified action space (CodeAct) integrated with a Python interpreter, CodeAct can execute code actions and dynamically revise prior actions or emit new actions upon new observations through multi-turn interactions. The encouraging performance of CodeAct motivates us to build an open-source LLM agent that interacts with environments by executing interpretable code and collaborates with users using natural language.
arXiv Detail & Related papers (2024-02-01T21:38:58Z)
Formally Specifying the High-Level Behavior of LLM-Based Agents [24.645319505305316]
LLMs have emerged as promising tools for solving challenging problems without the need for task-specific finetuned models. Currently, the design and implementation of such agents is ad hoc, as the wide variety of tasks that LLM-based agents may be applied to naturally means there can be no one-size-fits-all approach to agent design. We propose a minimalistic generation framework that simplifies the process of building agents.
arXiv Detail & Related papers (2023-10-12T17:24:15Z)
LASER: LLM Agent with State-Space Exploration for Web Navigation [57.802977310392755]
Large language models (LLMs) have been successfully adapted for interactive decision-making tasks like web navigation. Previous methods implicitly assume a forward-only execution mode for the model, where they only provide oracle trajectories as in-context examples. We propose to model the interactive task as state space exploration, where the LLM agent transitions among a pre-defined set of states by performing actions to complete the task.
arXiv Detail & Related papers (2023-09-15T05:44:08Z)
AgentBench: Evaluating LLMs as Agents [88.45506148281379]
Large Language Models (LLMs) are becoming increasingly smart and autonomous, targeting real-world pragmatic missions beyond traditional NLP tasks. We present AgentBench, a benchmark that currently consists of 8 distinct environments to assess LLM-as-Agent's reasoning and decision-making abilities.
arXiv Detail & Related papers (2023-08-07T16:08:11Z)
Generating Executable Action Plans with Environmentally-Aware Language Models [4.162663632560141]
Large Language Models (LLMs) trained using massive text datasets have recently shown promise in generating action plans for robotic agents. We propose an approach to generate environmentally-aware action plans that agents are better able to execute.
arXiv Detail & Related papers (2022-10-10T18:56:57Z)
Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents [111.33545170562337]
We investigate the possibility of grounding high-level tasks, expressed in natural language, to a chosen set of actionable steps. We find that if pre-trained LMs are large enough and prompted appropriately, they can effectively decompose high-level tasks into low-level plans. We propose a procedure that conditions on existing demonstrations and semantically translates the plans to admissible actions.
arXiv Detail & Related papers (2022-01-18T18:59:45Z)

This list is automatically generated from the titles and abstracts of the papers in this site.