Related papers: AutoForge: Automated Environment Synthesis for Agentic Reinforcement Learning

AutoForge: Automated Environment Synthesis for Agentic Reinforcement Learning

URL: http://arxiv.org/abs/2512.22857v1
Date: Sun, 28 Dec 2025 09:43:11 GMT
Title: AutoForge: Automated Environment Synthesis for Agentic Reinforcement Learning
Authors: Shihao Cai, Runnan Fang, Jialong Wu, Baixuan Li, Xinyu Wang, Yong Jiang, Liangcai Su, Liwen Zhang, Wenbiao Yin, Zhen Zhang, Fuli Feng, Pengjun Xie, Xiaobin Wang,
Abstract summary: Conducting reinforcement learning in simulated environments offers a cost-effective and highly scalable way to enhance language-based agents.<n>Previous work has been limited to semi-automated environment synthesis or tasks lacking sufficient difficulty, offering little breadth or depth.<n>We propose a unified pipeline for automated and scalable synthesis of simulated environments associated with high-difficulty but easily verifiable tasks.
Score: 71.4322853508083
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Conducting reinforcement learning (RL) in simulated environments offers a cost-effective and highly scalable way to enhance language-based agents. However, previous work has been limited to semi-automated environment synthesis or tasks lacking sufficient difficulty, offering little breadth or depth. In addition, the instability of simulated users integrated into these environments, along with the heterogeneity across simulated environments, poses further challenges for agentic RL. In this work, we propose: (1) a unified pipeline for automated and scalable synthesis of simulated environments associated with high-difficulty but easily verifiable tasks; and (2) an environment level RL algorithm that not only effectively mitigates user instability but also performs advantage estimation at the environment level, thereby improving training efficiency and stability. Comprehensive evaluations on agentic benchmarks, including tau-bench, tau2-Bench, and VitaBench, validate the effectiveness of our proposed method. Further in-depth analyses underscore its out-of-domain generalization.

Related papers

AgentNoiseBench: Benchmarking Robustness of Tool-Using LLM Agents Under Noisy Condition [72.24180896265192]
We introduce AgentNoiseBench, a framework for evaluating robustness of agentic models under noisy environments.<n>We first conduct an in-depth analysis of biases and uncertainties in real-world scenarios.<n>We then categorize environmental noise into two primary types: user-noise and tool-noise.<n>Building on this analysis, we develop an automated pipeline that injects controllable noise into existing agent-centric benchmarks.
arXiv Detail & Related papers (2026-02-11T20:33:10Z)
Autonomous Continual Learning of Computer-Use Agents for Environment Adaptation [57.65688895630163]
We introduce ACuRL, an Autonomous Curriculum Reinforcement Learning framework that continually adapts agents to specific environments with zero human data.<n>Our method effectively enables both intra-environment and cross-environment continual learning, yielding 4-22% performance gains without forgetting existing environments.
arXiv Detail & Related papers (2026-02-10T23:06:02Z)
Agent World Model: Infinity Synthetic Environments for Agentic Reinforcement Learning [62.499592503950026]
Large language model (LLM) have empowered autonomous agents to perform complex tasks that require multi-turn interactions with tools and environments.<n>We propose Agent World Model (AWM), a fully synthetic environment generation pipeline.<n>We scale to 1,000 environments covering everyday scenarios, in which agents can interact with rich toolsets.
arXiv Detail & Related papers (2026-02-10T18:55:41Z)
ScaleEnv: Scaling Environment Synthesis from Scratch for Generalist Interactive Tool-Use Agent Training [34.682505898865884]
We introduce ScaleEnv, a framework that constructs fully interactive environments and verifiable tasks entirely from scratch.<n>By enabling agents to learn through exploration within ScaleEnv, we demonstrate significant performance improvements on unseen, multi-turn tool-use benchmarks.
arXiv Detail & Related papers (2026-02-06T16:05:55Z)
Grounded Test-Time Adaptation for LLM Agents [75.62784644919803]
Large language model (LLM)-based agents struggle to generalize to novel and complex environments.<n>We propose two strategies for adapting LLM agents by leveraging environment-specific information available during deployment.
arXiv Detail & Related papers (2025-11-06T22:24:35Z)
Simulating Environments with Reasoning Models for Agent Training [55.98861707136674]
Building bespoke environments for training is heavy, brittle, and limits progress.<n>We propose two frameworks: Simia-SFT and Simia-RL.<n>Simia-SFT and Simia-RL enable scalable agent training without environment engineering.
arXiv Detail & Related papers (2025-11-03T18:29:57Z)
Feedback-Driven Tool-Use Improvements in Large Language Models via Automated Build Environments [70.42705564227548]
We propose an automated environment construction pipeline for large language models (LLMs)<n>This enables the creation of high-quality training environments that provide detailed and measurable feedback without relying on external tools.<n>We also introduce a verifiable reward mechanism that evaluates both the precision of tool use and the completeness of task execution.
arXiv Detail & Related papers (2025-08-12T09:45:19Z)
Mind the Gap: Towards Generalizable Autonomous Penetration Testing via Domain Randomization and Meta-Reinforcement Learning [15.619925926862235]
GAP is a generalizable autonomous pentesting framework.<n>It aims to realizes efficient policy training in realistic environments.<n>It also trains agents capable of drawing inferences about other cases from one instance.
arXiv Detail & Related papers (2024-12-05T11:24:27Z)
A Comparative Study of Machine Learning Algorithms for Anomaly Detection in Industrial Environments: Performance and Environmental Impact [62.997667081978825]
This study seeks to address the demands of high-performance machine learning models with environmental sustainability. Traditional machine learning algorithms, such as Decision Trees and Random Forests, demonstrate robust efficiency and performance. However, superior outcomes were obtained with optimised configurations, albeit with a commensurate increase in resource consumption.
arXiv Detail & Related papers (2023-07-01T15:18:00Z)
Max-Min Off-Policy Actor-Critic Method Focusing on Worst-Case Robustness to Model Misspecification [22.241676350331968]
This study focuses on scenarios involving a simulation environment with uncertainty parameters and the set of their possible values. The aim is to optimize the worst-case performance on the uncertainty parameter set to guarantee the performance in the corresponding real-world environment. Experiments in multi-joint dynamics with contact (MuJoCo) environments show that the proposed method exhibited a worst-case performance superior to several baseline approaches.
arXiv Detail & Related papers (2022-11-07T10:18:31Z)
Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design [121.73425076217471]
We propose Unsupervised Environment Design (UED), where developers provide environments with unknown parameters, and these parameters are used to automatically produce a distribution over valid, solvable environments. We call our technique Protagonist Antagonist Induced Regret Environment Design (PAIRED) Our experiments demonstrate that PAIRED produces a natural curriculum of increasingly complex environments, and PAIRED agents achieve higher zero-shot transfer performance when tested in highly novel environments.
arXiv Detail & Related papers (2020-12-03T17:37:01Z)

This list is automatically generated from the titles and abstracts of the papers in this site.