AWorld: Orchestrating the Training Recipe for Agentic AI
- URL: http://arxiv.org/abs/2508.20404v2
- Date: Mon, 01 Sep 2025 03:56:31 GMT
- Title: AWorld: Orchestrating the Training Recipe for Agentic AI
- Authors: Chengyue Yu, Siyuan Lu, Chenyi Zhuang, Dong Wang, Qintong Wu, Zongyue Li, Runsheng Gan, Chunfeng Wang, Siqi Hou, Gaochi Huang, Wenlong Yan, Lifeng Hong, Aohui Xue, Yanfeng Wang, Jinjie Gu, David Tsai, Tao Lin
- Abstract summary: We introduce AWorld, an open-source system engineered for large-scale agent-environment interaction. By distributing tasks across a cluster, AWorld accelerates experience collection by 14.6x compared to standard single-node, sequential execution. We trained a Qwen3-32B-based agent that achieves pass@1 accuracy of 32.23% on the GAIA test set.
- Score: 35.94278765364194
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The learning from practice paradigm is crucial for developing capable Agentic AI systems, yet it is severely hampered by inefficient experience generation, a bottleneck especially pronounced in complex benchmarks like GAIA. To address this, we introduce AWorld, an open-source system engineered for large-scale agent-environment interaction. By distributing tasks across a cluster, AWorld accelerates experience collection by 14.6x compared to standard single-node, sequential execution. This critical speedup makes extensive reinforcement learning practical and scalable. Leveraging this capability, we trained a Qwen3-32B-based agent that achieves pass@1 accuracy of 32.23% on the GAIA test set, which surpasses GPT-4o (27.91%) and rivals DeepSeek-V3 (31.89%). Our open-source system and the resulting agent provide a practical blueprint for a complete agentic AI training pipeline, from efficient interaction to demonstrable model improvement.
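The speedup described in the abstract comes from running many agent-environment rollouts at once instead of one after another. The following is a minimal sketch of that idea on a single machine using Python's standard library; it is not AWorld's API. The names `run_episode`, `collect_sequential`, and `collect_parallel` are hypothetical, and the sleep call merely stands in for the time an agent spends waiting on tools and environments.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_episode(task_id: int) -> dict:
    """Hypothetical stand-in for one agent-environment rollout."""
    time.sleep(0.01)  # simulates waiting on tools or the environment
    return {"task_id": task_id, "reward": float(task_id % 2)}


def collect_sequential(task_ids):
    """Baseline: run every rollout one after another."""
    return [run_episode(t) for t in task_ids]


def collect_parallel(task_ids, workers: int = 8):
    """Run rollouts concurrently; pool.map preserves input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_episode, task_ids))


if __name__ == "__main__":
    tasks = list(range(32))

    start = time.perf_counter()
    seq = collect_sequential(tasks)
    t_seq = time.perf_counter() - start

    start = time.perf_counter()
    par = collect_parallel(tasks)
    t_par = time.perf_counter() - start

    # Both strategies gather the same experience; only wall-clock time differs.
    assert seq == par
    print(f"sequential: {t_seq:.2f}s, parallel: {t_par:.2f}s, "
          f"speedup: {t_seq / t_par:.1f}x")
```

The measured ratio depends on the worker count and on how much of each rollout is spent waiting, so this toy should not be expected to reproduce the 14.6x figure, which the paper reports for distribution across a cluster.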
Related papers
- EnterpriseBench Corecraft: Training Generalizable Agents on High-Fidelity RL Environments [0.10934862523101825]
We show that training AI agents on high-fidelity reinforcement learning environments produces capabilities that generalize beyond the training distribution. We introduce CoreCraft, the first environment in EnterpriseBench, Surge AI's suite of agentic RL environments.
arXiv Detail & Related papers (2026-02-18T04:35:46Z)
- SkyRL-Agent: Efficient RL Training for Multi-turn LLM Agent [63.15417992240217]
We introduce SkyRL-Agent, a framework for efficient, multi-turn, long-horizon agent training and evaluation. It provides efficient asynchronous dispatching, lightweight tool integration, and flexible backend interoperability. We train SA-SWE-32B, a software engineering agent trained from Qwen3-32B (24.4% Pass@1) purely with reinforcement learning.
arXiv Detail & Related papers (2025-11-20T07:05:19Z)
- UltraCUA: A Foundation Model for Computer Use Agents with Hybrid Action [77.63125913907771]
We present UltraCUA, a foundation model that bridges the gap between GUI primitives and high-level programmatic tool calls. Experiments with our 7B and 32B models demonstrate substantial improvements over state-of-the-art agents.
arXiv Detail & Related papers (2025-10-20T17:48:26Z)
- Scaling Agents via Continual Pre-training [80.97989245493326]
We propose incorporating Agentic Continual Pre-training (Agentic CPT) into the deep research agent training pipeline to build powerful agentic foundational models. We evaluate our AgentFounder-30B on 10 benchmarks and achieve state-of-the-art performance while retaining strong tool-use ability.
arXiv Detail & Related papers (2025-09-16T17:57:19Z)
- SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience [71.82719117238307]
We propose SEAgent, an agentic self-evolving framework enabling computer-use agents to evolve through interactions with unfamiliar software. We validate the effectiveness of SEAgent across five novel software environments within OS-World. Our approach achieves a significant improvement of 23.2% in success rate, from 11.3% to 34.5%, over a competitive open-source CUA.
arXiv Detail & Related papers (2025-08-06T17:58:46Z)
- Training Long-Context, Multi-Turn Software Engineering Agents with Reinforcement Learning [31.540626068273014]
We train an agent based on Qwen2.5-72B-Instruct to solve real-world software engineering tasks. Our approach increases the agent's success rate on the SWE-bench Verified benchmark from a 20% fine-tuned baseline to 39%.
arXiv Detail & Related papers (2025-08-05T14:30:47Z)
- NatureGAIA: Pushing the Frontiers of GUI Agents with a Challenging Benchmark and High-Quality Trajectory Dataset [16.676904484703]
We introduce NaturalGAIA, a novel benchmark engineered on the principle of Causal Pathways. This paradigm structures complex tasks into a series of verifiable atomic steps, ensuring a rigorous, fully automated, and reproducible standard for assessment. We then utilize this dataset to perform Reinforcement FineTuning (RFT) on the Q2.5-VL-7B model.
arXiv Detail & Related papers (2025-08-02T11:53:41Z)
- Multi-Agent Reinforcement Learning for Sample-Efficient Deep Neural Network Mapping [54.65536245955678]
We present a decentralized multi-agent reinforcement learning (MARL) framework designed to overcome the challenge of sample inefficiency. We introduce an agent clustering algorithm that assigns similar mapping parameters to the same agents based on correlation analysis. Experimental results show our MARL approach improves sample efficiency by 30-300x over standard single-agent RL.
arXiv Detail & Related papers (2025-07-22T05:51:07Z)
- Intelligent Mobile AI-Generated Content Services via Interactive Prompt Engineering and Dynamic Service Provisioning [55.641299901038316]
AI-generated content can organize collaborative Mobile AIGC Service Providers (MASPs) at network edges to provide ubiquitous and customized content for resource-constrained users. Such a paradigm faces two significant challenges: 1) raw prompts often lead to poor generation quality due to users' lack of experience with specific AIGC models, and 2) static service provisioning fails to efficiently utilize computational and communication resources. We develop an interactive prompt engineering mechanism that leverages a Large Language Model (LLM) to generate customized prompt corpora and employs Inverse Reinforcement Learning (IRL) for policy imitation.
arXiv Detail & Related papers (2025-02-17T03:05:20Z)
- MENTOR: Mixture-of-Experts Network with Task-Oriented Perturbation for Visual Reinforcement Learning [17.437573206368494]
Visual deep reinforcement learning (RL) enables robots to acquire skills from visual input for unstructured tasks. We present MENTOR, a method that improves both the architecture and optimization of RL agents. MENTOR outperforms state-of-the-art methods across three simulation benchmarks and achieves an average success rate of 83% on three challenging real-world robotic manipulation tasks.
arXiv Detail & Related papers (2024-10-19T04:31:54Z)
- Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents [44.34340798542]
Large Language Models (LLMs) have shown remarkable capabilities in natural language tasks requiring complex reasoning.
Traditional supervised pre-training on static datasets falls short in enabling autonomous agent capabilities.
We propose a framework that combines guided Monte Carlo Tree Search (MCTS) with a self-critique mechanism and iterative fine-tuning on agent interactions.
arXiv Detail & Related papers (2024-08-13T20:52:13Z)
- Investigate-Consolidate-Exploit: A General Strategy for Inter-Task Agent Self-Evolution [92.84441068115517]
Investigate-Consolidate-Exploit (ICE) is a novel strategy for enhancing the adaptability and flexibility of AI agents.
ICE promotes the transfer of knowledge between tasks for genuine self-evolution.
Our experiments on the XAgent framework demonstrate ICE's effectiveness, reducing API calls by as much as 80%.
arXiv Detail & Related papers (2024-01-25T07:47:49Z)
- Efficiently Training On-Policy Actor-Critic Networks in Robotic Deep Reinforcement Learning with Demonstration-like Sampled Exploration [7.930709072852582]
We propose a generic framework for Learning from Demonstration (LfD) based on actor-critic algorithms.
We conduct experiments on 4 standard benchmark environments in Mujoco and 2 self-designed robotic environments.
arXiv Detail & Related papers (2021-09-27T12:42:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.