ScaleEnv: Scaling Environment Synthesis from Scratch for Generalist Interactive Tool-Use Agent Training
- URL: http://arxiv.org/abs/2602.06820v1
- Date: Fri, 06 Feb 2026 16:05:55 GMT
- Title: ScaleEnv: Scaling Environment Synthesis from Scratch for Generalist Interactive Tool-Use Agent Training
- Authors: Dunwei Tu, Hongyan Hao, Hansi Yang, Yihao Chen, Yi-Kai Zhang, Zhikang Xia, Yu Yang, Yueqing Sun, Xingchen Liu, Furao Shen, Qi Gu, Hui Su, Xunliang Cai
- Abstract summary: We introduce ScaleEnv, a framework that constructs fully interactive environments and verifiable tasks entirely from scratch. By enabling agents to learn through exploration within ScaleEnv, we demonstrate significant performance improvements on unseen, multi-turn tool-use benchmarks.
- Score: 34.682505898865884
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Training generalist agents capable of adapting to diverse scenarios requires interactive environments for self-exploration. However, interactive environments remain critically scarce, and existing synthesis methods suffer from significant limitations regarding environmental diversity and scalability. To address these challenges, we introduce ScaleEnv, a framework that constructs fully interactive environments and verifiable tasks entirely from scratch. Specifically, ScaleEnv ensures environment reliability through procedural testing, and guarantees task completeness and solvability via tool dependency graph expansion and executable action verification. By enabling agents to learn through exploration within ScaleEnv, we demonstrate significant performance improvements on unseen, multi-turn tool-use benchmarks such as $τ^2$-Bench and VitaBench, highlighting strong generalization capabilities. Furthermore, we investigate the relationship between an increasing number of domains and model generalization performance, providing empirical evidence that scaling environmental diversity is critical for robust agent learning.
Related papers
- AgentNoiseBench: Benchmarking Robustness of Tool-Using LLM Agents Under Noisy Condition [72.24180896265192]
We introduce AgentNoiseBench, a framework for evaluating robustness of agentic models under noisy environments. We first conduct an in-depth analysis of biases and uncertainties in real-world scenarios. We then categorize environmental noise into two primary types: user-noise and tool-noise. Building on this analysis, we develop an automated pipeline that injects controllable noise into existing agent-centric benchmarks.
arXiv Detail & Related papers (2026-02-11T20:33:10Z)
- Agent World Model: Infinity Synthetic Environments for Agentic Reinforcement Learning [62.499592503950026]
Large language models (LLMs) have empowered autonomous agents to perform complex tasks that require multi-turn interactions with tools and environments. We propose Agent World Model (AWM), a fully synthetic environment generation pipeline. We scale to 1,000 environments covering everyday scenarios, in which agents can interact with rich toolsets.
arXiv Detail & Related papers (2026-02-10T18:55:41Z)
- EnvScaler: Scaling Tool-Interactive Environments for LLM Agent via Programmatic Synthesis [101.67583081810136]
Large language models (LLMs) are expected to be trained to act as agents in various real-world environments. This process relies on rich and varied tool-interaction sandboxes. We propose EnvScaler, an automated framework for scalable tool-interaction environments.
arXiv Detail & Related papers (2026-01-09T14:32:06Z)
- AutoForge: Automated Environment Synthesis for Agentic Reinforcement Learning [71.4322853508083]
Conducting reinforcement learning in simulated environments offers a cost-effective and highly scalable way to enhance language-based agents. Previous work has been limited to semi-automated environment synthesis or tasks lacking sufficient difficulty, offering little breadth or depth. We propose a unified pipeline for automated and scalable synthesis of simulated environments associated with high-difficulty but easily verifiable tasks.
arXiv Detail & Related papers (2025-12-28T09:43:11Z)
- CuES: A Curiosity-driven and Environment-grounded Synthesis Framework for Agentic RL [35.086788669916594]
Large language model based agents are increasingly deployed in complex, tool augmented environments. Existing approaches typically assume predefined task collections, an assumption that fails in novel environments. We propose CuES, a Curiosity driven and Environment grounded Synthesis framework that autonomously generates diverse, executable, and meaningful tasks.
arXiv Detail & Related papers (2025-12-01T06:11:37Z)
- Scaling Environments for LLM Agents in the Era of Learning from Interaction: A Survey [30.673419015614233]
A growing consensus is that agents should interact directly with environments and learn from experience through reinforcement learning. We formalize this iterative process as the Generation-Execution-Feedback (GEF) loop, where environments generate tasks to challenge agents, return observations in response to agents' actions during task execution, and provide evaluative feedback on rollouts for subsequent learning. Under this paradigm, environments function as indispensable producers of experiential data, highlighting the need to scale them toward greater complexity, realism, and interactivity.
arXiv Detail & Related papers (2025-11-12T12:56:25Z)
- Towards General Agentic Intelligence via Environment Scaling [78.66355092082253]
Advanced agentic intelligence is a prerequisite for deploying Large Language Models in real-world applications. We design a scalable framework that automatically constructs heterogeneous environments that are fully simulated. Experiments on agentic benchmarks, tau-bench, tau2-Bench, and ACEBench, demonstrate that our trained model, AgentScaler, significantly enhances the function-calling capability of models.
arXiv Detail & Related papers (2025-09-16T17:57:20Z)
- Stochastic Encodings for Active Feature Acquisition [100.47043816019888]
Active Feature Acquisition is an instance-wise, sequential decision making problem. The aim is to dynamically select which feature to measure based on current observations, independently for each test instance. Common approaches either use Reinforcement Learning, which experiences training difficulties, or greedily maximize the conditional mutual information of the label and unobserved features, which makes myopic acquisitions. We introduce a latent variable model, trained in a supervised manner. Acquisitions are made by reasoning about the features across many possible unobserved realizations in a latent space.
arXiv Detail & Related papers (2025-08-03T23:48:46Z)
- Caution for the Environment: Multimodal LLM Agents are Susceptible to Environmental Distractions [50.5976989558411]
This paper investigates the faithfulness of multimodal large language model (MLLM) agents in a graphical user interface (GUI) environment. A general scenario is proposed where both the user and the agent are benign, and the environment, while not malicious, contains unrelated content. Experimental results reveal that even the most powerful models, whether generalist agents or specialist GUI agents, are susceptible to distractions.
arXiv Detail & Related papers (2024-08-05T15:16:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.