Agent World Model: Infinity Synthetic Environments for Agentic Reinforcement Learning
- URL: http://arxiv.org/abs/2602.10090v2
- Date: Wed, 11 Feb 2026 18:20:25 GMT
- Title: Agent World Model: Infinity Synthetic Environments for Agentic Reinforcement Learning
- Authors: Zhaoyang Wang, Canwen Xu, Boyi Liu, Yite Wang, Siwei Han, Zhewei Yao, Huaxiu Yao, Yuxiong He
- Abstract summary: Large language models (LLMs) have empowered autonomous agents to perform complex tasks that require multi-turn interactions with tools and environments. We propose Agent World Model (AWM), a fully synthetic environment generation pipeline. We scale to 1,000 environments covering everyday scenarios, in which agents can interact with rich toolsets.
- Score: 62.499592503950026
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in large language models (LLMs) have empowered autonomous agents to perform complex tasks that require multi-turn interactions with tools and environments. However, scaling such agent training is limited by the lack of diverse and reliable environments. In this paper, we propose Agent World Model (AWM), a fully synthetic environment generation pipeline. Using this pipeline, we scale to 1,000 environments covering everyday scenarios, in which agents can interact with rich toolsets (35 tools per environment on average) and obtain high-quality observations. Notably, these environments are code-driven and backed by databases, providing more reliable and consistent state transitions than environments simulated by LLMs. Moreover, they enable more efficient agent interaction compared with collecting trajectories from realistic environments. To demonstrate the effectiveness of this resource, we perform large-scale reinforcement learning for multi-turn tool-use agents. Thanks to the fully executable environments and accessible database states, we can also design reliable reward functions. Experiments on three benchmarks show that training exclusively in synthetic environments, rather than benchmark-specific ones, yields strong out-of-distribution generalization. The code is available at https://github.com/Snowflake-Labs/agent-world-model.
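The abstract's central design, code-driven environments whose state lives in a database and whose rewards are computed by checking that state, can be sketched as follows. This is a hypothetical minimal illustration under assumed names (`ShopEnv`, `add_to_cart`, the table schemas), not the actual AWM pipeline or its API:

```python
import sqlite3

class ShopEnv:
    """Minimal sketch of a code-driven, database-backed tool-use environment
    (hypothetical; not the AWM implementation)."""

    def __init__(self):
        # An in-memory database holds the entire environment state.
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE cart (item TEXT, qty INTEGER)")
        self.db.execute("CREATE TABLE stock (item TEXT, qty INTEGER)")
        self.db.execute("INSERT INTO stock VALUES ('apple', 5)")
        self.db.commit()

    def add_to_cart(self, item: str, qty: int) -> str:
        # Tools are plain functions whose effects are real SQL transactions,
        # so state transitions are deterministic rather than LLM-simulated.
        row = self.db.execute(
            "SELECT qty FROM stock WHERE item = ?", (item,)
        ).fetchone()
        if row is None or row[0] < qty:
            return f"error: insufficient stock for {item}"
        self.db.execute(
            "UPDATE stock SET qty = qty - ? WHERE item = ?", (qty, item)
        )
        self.db.execute("INSERT INTO cart VALUES (?, ?)", (item, qty))
        self.db.commit()
        return f"added {qty} x {item}"

    def reward(self, goal: dict) -> float:
        # Because the final database state is accessible, the reward is a
        # direct check against a goal state, not a judgment by another model.
        cart = dict(
            self.db.execute("SELECT item, SUM(qty) FROM cart GROUP BY item")
        )
        return 1.0 if cart == goal else 0.0


env = ShopEnv()
print(env.add_to_cart("apple", 2))  # tool call mutates the DB state
print(env.reward({"apple": 2}))     # reward computed from the DB state
```

Because the observation (the tool's return string) and the reward both derive from executed code, the same environment can be replayed consistently across RL rollouts, which is the property the abstract contrasts with LLM-simulated environments.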
Related papers
- ScaleEnv: Scaling Environment Synthesis from Scratch for Generalist Interactive Tool-Use Agent Training [34.682505898865884]
We introduce ScaleEnv, a framework that constructs fully interactive environments and verifiable tasks entirely from scratch. By enabling agents to learn through exploration within ScaleEnv, we demonstrate significant performance improvements on unseen, multi-turn tool-use benchmarks.
arXiv Detail & Related papers (2026-02-06T16:05:55Z) - SWE-Universe: Scale Real-World Verifiable Environments to Millions [84.63665266236963]
SWE-Universe is a framework for automatically constructing real-world software engineering (SWE) verifiable environments from GitHub pull requests (PRs). We propose a building agent powered by an efficient custom-trained model to overcome the prevalent challenges of automatic building. We demonstrate the profound value of our environments through large-scale agentic mid-training and reinforcement learning.
arXiv Detail & Related papers (2026-02-02T17:20:30Z) - ASTRA: Automated Synthesis of agentic Trajectories and Reinforcement Arenas [13.919124676472022]
ASTRA is an end-to-end framework for training tool-augmented language model agents. ASTRA integrates scalable data synthesis and verifiable reinforcement learning. Experiments on multiple agentic tool-use benchmarks demonstrate that ASTRA-trained models achieve state-of-the-art performance.
arXiv Detail & Related papers (2026-01-29T11:22:23Z) - EnvScaler: Scaling Tool-Interactive Environments for LLM Agent via Programmatic Synthesis [101.67583081810136]
Large language models (LLMs) are expected to be trained to act as agents in various real-world environments. This process relies on rich and varied tool-interaction sandboxes. We propose EnvScaler, an automated framework for scalable tool-interaction environments.
arXiv Detail & Related papers (2026-01-09T14:32:06Z) - CuES: A Curiosity-driven and Environment-grounded Synthesis Framework for Agentic RL [35.086788669916594]
Large language model-based agents are increasingly deployed in complex, tool-augmented environments. Existing approaches typically assume predefined task collections, an assumption that fails in novel environments. We propose CuES, a Curiosity-driven and Environment-grounded Synthesis framework that autonomously generates diverse, executable, and meaningful tasks.
arXiv Detail & Related papers (2025-12-01T06:11:37Z) - VitaBench: Benchmarking LLM Agents with Versatile Interactive Tasks in Real-world Applications [20.065087936770215]
We introduce VitaBench, a benchmark that evaluates agents on versatile interactive tasks grounded in real-world settings. VitaBench presents agents with the most complex life-serving simulation environment to date, comprising 66 tools. Our comprehensive evaluation reveals that even the most advanced models achieve only a 30% success rate on cross-scenario tasks.
arXiv Detail & Related papers (2025-09-30T16:33:49Z) - Generalizable End-to-End Tool-Use RL with Synthetic CodeGym [52.31172214690965]
We introduce CodeGym, a framework that synthesizes diverse, verifiable, and controllable multi-turn tool-use environments for agent RL. CodeGym rewrites static coding problems into interactive environments by extracting atomic functions or logic into callable tools. Models of varying sizes and chain-of-thought configurations, trained in CodeGym, exhibit consistent out-of-distribution generalizability.
arXiv Detail & Related papers (2025-09-22T03:03:56Z) - Towards General Agentic Intelligence via Environment Scaling [78.66355092082253]
Advanced agentic intelligence is a prerequisite for deploying Large Language Models in real-world applications. We design a scalable framework that automatically constructs heterogeneous, fully simulated environments. Experiments on the agentic benchmarks tau-bench, tau2-Bench, and ACEBench demonstrate that our trained model, AgentScaler, significantly enhances the function-calling capability of models.
arXiv Detail & Related papers (2025-09-16T17:57:20Z) - Very Large-Scale Multi-Agent Simulation in AgentScope [112.98986800070581]
We develop new features and components for AgentScope, a user-friendly multi-agent platform.
We propose an actor-based distributed mechanism towards great scalability and high efficiency.
We also provide a web-based interface for conveniently monitoring and managing a large number of agents.
arXiv Detail & Related papers (2024-07-25T05:50:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.