Related papers: ActionStudio: A Lightweight Framework for Data and Training of Large Action Models

URL: http://arxiv.org/abs/2503.22673v3
Date: Thu, 17 Jul 2025 01:19:22 GMT
Title: ActionStudio: A Lightweight Framework for Data and Training of Large Action Models
Authors: Jianguo Zhang, Thai Hoang, Ming Zhu, Zuxin Liu, Shiyu Wang, Tulika Awalgaonkar, Akshara Prabhakar, Haolin Chen, Weiran Yao, Zhiwei Liu, Juntao Tan, Juan Carlos Niebles, Shelby Heinecke, Huan Wang, Silvio Savarese, Caiming Xiong
Abstract summary: ActionStudio is a lightweight and scalable data and training framework designed for large action models. Our trained models yield top performances across public and realistic agent benchmarks. We open-source the ActionStudio framework and release actionstudio-98k, a curated dataset of 98k high-quality trajectories.
Score: 88.90834854360641
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Large Action models are essential for enabling autonomous agents to perform complex tasks. However, training such models remains challenging due to the diversity of agent environments and the complexity of noisy agentic data. Existing infrastructure offers limited support for scalable, agent-specific fine-tuning and standardized agent data processing. We introduce ActionStudio, a lightweight and extensible data and training framework designed for large action models. ActionStudio unifies diverse agent trajectories using our proposed Unified Format 2.0, supports a range of training workflows with optimized multi-node distributed setup, and integrates robust preprocessing and real-time verification tools. ActionStudio demonstrates up to 9x higher throughput compared to existing agentic training frameworks, and our trained models yield top performances across public and realistic agent benchmarks. To support the broader research community, we open-source the ActionStudio framework and release actionstudio-98k, a curated dataset of 98k high-quality trajectories. Code: https://github.com/SalesforceAIResearch/xLAM.

Related papers

LAM SIMULATOR: Advancing Data Generation for Large Action Model Training via Online Exploration and Trajectory Feedback [121.78866929908871]
Large Action Models (LAMs) for AI Agents offer incredible potential but face challenges due to the need for high-quality training data. We present LAM SIMULATOR, a comprehensive framework designed for online exploration of agentic tasks with high-quality feedback. Our framework features a dynamic task query generator, an extensive collection of tools, and an interactive environment where Large Language Model (LLM) Agents can call tools and receive real-time feedback.
arXiv Detail & Related papers (2025-06-02T22:36:02Z)
Command A: An Enterprise-Ready Large Language Model [180.18356391290172]
Command A is an agent-optimised and multilingual-capable model. It offers best-in-class Retrieval Augmented Generation capabilities.
arXiv Detail & Related papers (2025-04-01T12:08:07Z)
Perspectives for Direct Interpretability in Multi-Agent Deep Reinforcement Learning [0.41783829807634765]
Multi-Agent Deep Reinforcement Learning (MADRL) was proven efficient in solving complex problems in robotics or games. This paper advocates for direct interpretability, generating post hoc explanations directly from trained models. We explore modern methods, including relevance backpropagation, knowledge edition, model steering, activation patching, sparse autoencoders and circuit discovery.
arXiv Detail & Related papers (2025-02-02T09:15:27Z)
TrajAgent: An LLM-based Agent Framework for Automated Trajectory Modeling via Collaboration of Large and Small Models [10.86175727790196]
Trajectory modeling has widespread applications in areas such as life services, urban transportation, and public administration. We propose TrajAgent, a framework to facilitate robust and efficient trajectory modeling through automation modeling.
arXiv Detail & Related papers (2024-10-27T13:51:09Z)
Flex: End-to-End Text-Instructed Visual Navigation with Foundation Models [59.892436892964376]
We investigate the minimal data requirements and architectural adaptations necessary to achieve robust closed-loop performance with vision-based control policies. Our findings are synthesized in Flex (Fly-lexically), a framework that uses pre-trained Vision Language Models (VLMs) as frozen patch-wise feature extractors. We demonstrate the effectiveness of this approach on quadrotor fly-to-target tasks, where agents trained via behavior cloning successfully generalize to real-world scenes.
arXiv Detail & Related papers (2024-10-16T19:59:31Z)
AgentSquare: Automatic LLM Agent Search in Modular Design Space [16.659969168343082]
Large Language Models (LLMs) have led to a rapid growth of agentic systems capable of handling a wide range of complex tasks. We introduce a new research problem: Modularized LLM Agent Search (MoLAS).
arXiv Detail & Related papers (2024-10-08T15:52:42Z)
xLAM: A Family of Large Action Models to Empower AI Agent Systems [111.5719694445345]
We release xLAM, a series of large action models designed for AI agent tasks. xLAM consistently delivers exceptional performance across multiple agent ability benchmarks.
arXiv Detail & Related papers (2024-09-05T03:22:22Z)
Very Large-Scale Multi-Agent Simulation in AgentScope [112.98986800070581]
We develop new features and components for AgentScope, a user-friendly multi-agent platform. We propose an actor-based distributed mechanism towards great scalability and high efficiency. We also provide a web-based interface for conveniently monitoring and managing a large number of agents.
arXiv Detail & Related papers (2024-07-25T05:50:46Z)
Plain-Det: A Plain Multi-Dataset Object Detector [22.848784430833835]
Plain-Det offers flexibility to accommodate new datasets, robustness in performance across diverse datasets, and training efficiency. We conduct extensive experiments on 13 downstream datasets and Plain-Det demonstrates strong generalization capability.
arXiv Detail & Related papers (2024-07-14T05:18:06Z)
CoSense3D: an Agent-based Efficient Learning Framework for Collective Perception [0.552480439325792]
We propose an agent-based training framework that handles the deep learning modules and agent data separately to have a cleaner data flow structure. This framework not only provides an API for prototyping the data processing pipeline and defining the gradient calculation for each agent, but also provides the user interface for interactive training, testing and data visualization.
arXiv Detail & Related papers (2024-04-29T11:40:27Z)
AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning [98.26836657967162]
AgentOhana aggregates agent trajectories from distinct environments, spanning a wide array of scenarios. xLAM-v0.1, a large action model tailored for AI agents, demonstrates exceptional performance across various benchmarks.
arXiv Detail & Related papers (2024-02-23T18:56:26Z)
ModelScope-Agent: Building Your Customizable Agent System with Open-source Large Language Models [74.64651681052628]
We introduce ModelScope-Agent, a customizable agent framework for real-world applications based on open-source LLMs as controllers. It provides a user-friendly system library, with customizable engine design to support model training on multiple open-source LLMs. A comprehensive framework has been proposed spanning over tool-use data collection, tool retrieval, tool registration, memory control, customized model training, and evaluation.
arXiv Detail & Related papers (2023-09-02T16:50:30Z)
High Performance Simulation for Scalable Multi-Agent Reinforcement Learning [1.675857332621569]
We demonstrate the use of Vogue, a high performance agent based model (ABM) framework. Vogue serves as a multi-agent training environment, supporting thousands to tens of thousands of interacting agents. We show that these environments can train shared RL policies on time-scales of minutes and hours.
arXiv Detail & Related papers (2022-07-08T14:54:06Z)
SINGA-Easy: An Easy-to-Use Framework for MultiModal Analysis [18.084628500554462]
We introduce SINGA-Easy, a new deep learning framework that provides distributed hyperparameter tuning at the training stage, dynamic computational cost control at the inference stage, and intuitive user interactions with multimedia contents facilitated by model explanation. Our experiments on the training and deployment of multi-modality data analysis applications show that the framework is both usable and adaptable to dynamic inference loads.
arXiv Detail & Related papers (2021-08-03T08:39:54Z)

This list is automatically generated from the titles and abstracts of the papers on this site.