Related papers: Agent models: Internalizing Chain-of-Action Generation into Reasoning models


URL: http://arxiv.org/abs/2503.06580v1
Date: Sun, 09 Mar 2025 12:19:47 GMT
Title: Agent models: Internalizing Chain-of-Action Generation into Reasoning models
Authors: Yuxiang Zhang, Yuqi Yang, Jiangming Shu, Xinyan Wen, Jitao Sang
Abstract summary: We position Large Agent Models (LAMs) that internalize the generation of Chain-of-Action (CoA). Our proposed AutoCoA framework combines supervised fine-tuning (SFT) and reinforcement learning (RL). Main components include step-level action triggering, trajectory-level CoA optimization, and an internal world model to reduce real-environment interaction costs.
Score: 15.954047804223379
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Traditional agentic workflows rely on external prompts to manage interactions with tools and the environment, which limits the autonomy of reasoning models. We position Large Agent Models (LAMs) that internalize the generation of Chain-of-Action (CoA), enabling the model to autonomously decide when and how to use external tools. Our proposed AutoCoA framework combines supervised fine-tuning (SFT) and reinforcement learning (RL), allowing the model to seamlessly switch between reasoning and action while efficiently managing environment interactions. Main components include step-level action triggering, trajectory-level CoA optimization, and an internal world model to reduce real-environment interaction costs. Evaluations on open-domain QA tasks demonstrate that AutoCoA-trained agent models significantly outperform ReAct-based workflows in task completion, especially in tasks that require long-term reasoning and multi-step actions. Code and dataset are available at https://github.com/ADaM-BJTU/AutoCoA
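To make "internalized" action generation concrete, here is a minimal, hypothetical sketch of a chain-of-action loop. The model decides on its own when to act by emitting an action marker inside its output, rather than being prompted into a tool call at every turn, and an optional stand-in world model can answer actions with simulated observations instead of real tool calls. The `<action>`/`<observation>` markers, the function names, and the toy policy are illustrative assumptions; this is not the AutoCoA implementation, and it omits all SFT/RL training.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical action markers; AutoCoA's real action format may differ.
ACTION_START = "<action>"
ACTION_END = "</action>"


@dataclass
class Trajectory:
    steps: List[str] = field(default_factory=list)

    def text(self) -> str:
        return "\n".join(self.steps)


def run_chain_of_action(
    policy: Callable[[str], str],
    real_tool: Callable[[str], str],
    world_model: Callable[[str], str],
    question: str,
    use_world_model: bool = False,
    max_steps: int = 8,
) -> Trajectory:
    """Let the model interleave reasoning and tool calls until it stops acting."""
    traj = Trajectory([question])
    for _ in range(max_steps):
        step = policy(traj.text())
        traj.steps.append(step)
        # The model itself triggers an action by emitting the marker;
        # no external prompt forces a tool call at this step.
        if ACTION_START not in step:
            break  # no action requested: this step is treated as the answer
        query = step.split(ACTION_START, 1)[1].split(ACTION_END, 1)[0].strip()
        # Optionally answer with a simulated observation from a world model,
        # which avoids a (costly) real-environment interaction.
        env = world_model if use_world_model else real_tool
        traj.steps.append(f"<observation>{env(query)}</observation>")
    return traj


def toy_policy(context: str) -> str:
    """Stand-in for a reasoning model: search once, then answer."""
    if "<observation>" not in context:
        return "I need the capital of France. <action>search: capital of France</action>"
    return "Answer: Paris"


if __name__ == "__main__":
    result = run_chain_of_action(
        toy_policy,
        real_tool=lambda q: "Paris is the capital of France.",
        world_model=lambda q: "(simulated) Paris",
        question="What is the capital of France?",
    )
    print(result.text())
```

The `use_world_model` switch only illustrates where simulated observations could replace real ones; how such a world model is trained is outside this sketch.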

Related papers

Digi-Q: Learning Q-Value Functions for Training Device-Control Agents [73.60512136881279]
Digi-Q trains VLM-based action-value Q-functions which are then used to extract the agent policy. Digi-Q outperforms several prior methods on user-scale device control tasks in Android-in-the-Wild.
arXiv Detail & Related papers (2025-02-13T18:55:14Z)
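One generic way to extract a policy from a learned Q-function is a best-of-N rule: propose several candidate actions (for example, sampled from a base policy), score each with Q(s, a), and keep the highest-scoring one. The sketch below assumes that rule and replaces the VLM-based Q-function with a toy lookup table; the function name, screen names, and Q values are hypothetical and are not taken from Digi-Q.

```python
from typing import Callable, Dict, Sequence, Tuple, TypeVar

State = TypeVar("State")
Action = TypeVar("Action")


def best_of_n_action(
    state: State,
    candidates: Sequence[Action],
    q_value: Callable[[State, Action], float],
) -> Action:
    """Return the candidate action with the highest estimated Q(state, action)."""
    if not candidates:
        raise ValueError("need at least one candidate action")
    return max(candidates, key=lambda a: q_value(state, a))


# Toy stand-in for a learned Q-function (illustrative numbers only).
Q_TABLE: Dict[Tuple[str, str], float] = {
    ("settings_screen", "tap_wifi"): 0.8,
    ("settings_screen", "scroll_down"): 0.3,
    ("settings_screen", "press_back"): 0.1,
}


if __name__ == "__main__":
    choice = best_of_n_action(
        "settings_screen",
        ["scroll_down", "tap_wifi", "press_back"],
        lambda s, a: Q_TABLE[(s, a)],
    )
    print(choice)  # prints "tap_wifi"
```

In a training loop, the selected action could serve as a target for updating the policy; that step is not shown here.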
CoMAL: Collaborative Multi-Agent Large Language Models for Mixed-Autonomy Traffic [11.682456863110767]
CoMAL is a framework designed to address the mixed-autonomy traffic problem by collaboration among autonomous vehicles to optimize traffic flow. CoMAL is built upon large language models, operating in an interactive traffic simulation environment.
arXiv Detail & Related papers (2024-10-18T10:53:44Z)
MOSS: Enabling Code-Driven Evolution and Context Management for AI Agents [7.4159044558995335]
We introduce MOSS (llM-oriented Operating System Simulation), a novel framework that integrates code generation with a dynamic context management system. At its core, the framework employs an Inversion of Control container in conjunction with decorators to enforce the least knowledge principle. We show how this framework can enhance the efficiency and capabilities of agent development and highlight its advantages in moving towards Turing-complete agents.
arXiv Detail & Related papers (2024-09-24T14:30:21Z)
xLAM: A Family of Large Action Models to Empower AI Agent Systems [111.5719694445345]
We release xLAM, a series of large action models designed for AI agent tasks. xLAM consistently delivers exceptional performance across multiple agent ability benchmarks.
arXiv Detail & Related papers (2024-09-05T03:22:22Z)
AutoAct: Automatic Agent Learning from Scratch for QA via Self-Planning [54.47116888545878]
AutoAct is an automatic agent learning framework for QA. It does not rely on large-scale annotated data or synthetic planning trajectories from closed-source models.
arXiv Detail & Related papers (2024-01-10T16:57:24Z)
Interactive Autonomous Navigation with Internal State Inference and Interactivity Estimation [58.21683603243387]
We propose three auxiliary tasks with relational-temporal reasoning and integrate them into the standard Deep Learning framework. These auxiliary tasks provide additional supervision signals to infer the behavior patterns of other interactive agents. Our approach achieves robust and state-of-the-art performance in terms of standard evaluation metrics.
arXiv Detail & Related papers (2023-11-27T18:57:42Z)
You Only Look at Screens: Multimodal Chain-of-Action Agents [37.118034745972956]
Auto-GUI is a multimodal solution that directly interacts with the interface. We propose a chain-of-action technique to help the agent decide what action to execute. We evaluate our approach on a new device-control benchmark AITW with 30K unique instructions.
arXiv Detail & Related papers (2023-09-20T16:12:32Z)
Chain-of-Skills: A Configurable Model for Open-domain Question Answering [79.8644260578301]
The retrieval model is an indispensable component for real-world knowledge-intensive tasks. Recent work focuses on customized methods, limiting model transferability and scalability. We propose a modular retriever where individual modules correspond to key skills that can be reused across datasets.
arXiv Detail & Related papers (2023-05-04T20:19:39Z)
Self-Consistent Models and Values [42.53364554418915]
Learned models of the environment provide reinforcement learning (RL) agents with flexible ways of making predictions about the environment. In this work, we investigate a way of augmenting model-based RL by additionally encouraging a learned model and value function to be jointly self-consistent. Our approach differs from classic planning methods such as Dyna, which only update values to be consistent with the model.
arXiv Detail & Related papers (2021-10-25T12:09:42Z)
AutoFIS: Automatic Feature Interaction Selection in Factorization Models for Click-Through Rate Prediction [75.16836697734995]
We propose a two-stage algorithm called Automatic Feature Interaction Selection (AutoFIS). AutoFIS can automatically identify important feature interactions for factorization models at a computational cost equivalent to training the target model to convergence. AutoFIS has been deployed onto the training platform of Huawei App Store recommendation service.
arXiv Detail & Related papers (2020-03-25T06:53:54Z)
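A minimal sketch of what "selecting feature interactions" in a factorization model can look like, assuming a gate-based formulation: each pairwise interaction <v_i, v_j> in a factorization machine is scaled by a learnable gate during a search stage, and pairs whose gate ends up at (or near) zero are dropped before retraining. The gate values below are fixed toy numbers; the sparsity-inducing optimization that would learn them is not shown, and the function names are hypothetical rather than AutoFIS code.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Pair = Tuple[int, int]


def dot(u: List[float], v: List[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def gated_interaction_term(emb: List[List[float]], gates: Dict[Pair, float]) -> float:
    """Second-order term: sum over pairs i < j of gate_ij * <v_i, v_j>.

    Each field embedding v_i is assumed to already include its feature value
    (as for one-hot categorical fields, where x_i = 1). Missing gates count as 0.
    """
    return sum(
        gates.get((i, j), 0.0) * dot(emb[i], emb[j])
        for i, j in combinations(range(len(emb)), 2)
    )


def select_interactions(gates: Dict[Pair, float], threshold: float = 0.0) -> Set[Pair]:
    """Search-stage output: keep pairs whose learned gate is non-negligible."""
    return {pair for pair, g in gates.items() if abs(g) > threshold}


if __name__ == "__main__":
    emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    gates = {(0, 1): 0.5, (0, 2): 0.0, (1, 2): 2.0}  # toy "learned" gates
    kept = select_interactions(gates)
    print(sorted(kept))                        # [(0, 1), (1, 2)]
    print(gated_interaction_term(emb, gates))  # 2.0
```

Pruning pair (0, 2) removes its term entirely, so the retraining stage only has to fit the surviving interactions.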

This list is automatically generated from the titles and abstracts of the papers on this site.