Related papers: AutoEnv: Automated Environments for Measuring Cross-Environment Agent Learning

AutoEnv: Automated Environments for Measuring Cross-Environment Agent Learning

URL: http://arxiv.org/abs/2511.19304v2
Date: Wed, 03 Dec 2025 07:47:10 GMT
Title: AutoEnv: Automated Environments for Measuring Cross-Environment Agent Learning
Authors: Jiayi Zhang, Yiran Peng, Fanqi Kong, Cheng Yang, Yifan Wu, Zhaoyang Yu, Jinyu Xiang, Jianhao Ruan, Jinlin Wang, Maojia Song, HongZhang Liu, Xiangru Tang, Bang Liu, Chenglin Wu, Yuyu Luo,
Abstract summary: Cross-environment learning has remained largely unmeasured.<n>We propose AutoEnv, an automated framework that treats environments as factorizable distributions over transitions, observations, and rewards.<n>Using AutoEnv, we construct AutoEnv-36, a dataset of 36 environments with 358 validated levels, on which seven language models achieve 12-49% normalized reward.
Score: 43.356179862690745
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Humans naturally adapt to diverse environments by learning underlying rules across worlds with different dynamics, observations, and reward structures. In contrast, existing agents typically demonstrate improvements via self-evolving within a single domain, implicitly assuming a fixed environment distribution. Cross-environment learning has remained largely unmeasured: there is no standard collection of controllable, heterogeneous environments, nor a unified way to represent how agents learn. We address these gaps in two steps. First, we propose AutoEnv, an automated framework that treats environments as factorizable distributions over transitions, observations, and rewards, enabling low-cost (4.12 USD on average) generation of heterogeneous worlds. Using AutoEnv, we construct AutoEnv-36, a dataset of 36 environments with 358 validated levels, on which seven language models achieve 12-49% normalized reward, demonstrating the challenge of AutoEnv-36. Second, we formalize agent learning as a component-centric process driven by three stages of Selection, Optimization, and Evaluation applied to an improvable agent component. Using this formulation, we design eight learning methods and evaluate them on AutoEnv-36. Empirically, the gain of any single learning method quickly decrease as the number of environments increases, revealing that fixed learning methods do not scale across heterogeneous environments. Environment-adaptive selection of learning methods substantially improves performance but exhibits diminishing returns as the method space expands. These results highlight both the necessity and the current limitations of agent learning for scalable cross-environment generalization, and position AutoEnv and AutoEnv-36 as a testbed for studying cross-environment agent learning. The code is avaiable at https://github.com/FoundationAgents/AutoEnv.

Related papers

Autonomous Continual Learning of Computer-Use Agents for Environment Adaptation [57.65688895630163]
We introduce ACuRL, an Autonomous Curriculum Reinforcement Learning framework that continually adapts agents to specific environments with zero human data.<n>Our method effectively enables both intra-environment and cross-environment continual learning, yielding 4-22% performance gains without forgetting existing environments.
arXiv Detail & Related papers (2026-02-10T23:06:02Z)
Agent World Model: Infinity Synthetic Environments for Agentic Reinforcement Learning [62.499592503950026]
Large language model (LLM) have empowered autonomous agents to perform complex tasks that require multi-turn interactions with tools and environments.<n>We propose Agent World Model (AWM), a fully synthetic environment generation pipeline.<n>We scale to 1,000 environments covering everyday scenarios, in which agents can interact with rich toolsets.
arXiv Detail & Related papers (2026-02-10T18:55:41Z)
AutoForge: Automated Environment Synthesis for Agentic Reinforcement Learning [71.4322853508083]
Conducting reinforcement learning in simulated environments offers a cost-effective and highly scalable way to enhance language-based agents.<n>Previous work has been limited to semi-automated environment synthesis or tasks lacking sufficient difficulty, offering little breadth or depth.<n>We propose a unified pipeline for automated and scalable synthesis of simulated environments associated with high-difficulty but easily verifiable tasks.
arXiv Detail & Related papers (2025-12-28T09:43:11Z)
Grounded Test-Time Adaptation for LLM Agents [75.62784644919803]
Large language model (LLM)-based agents struggle to generalize to novel and complex environments.<n>We propose two strategies for adapting LLM agents by leveraging environment-specific information available during deployment.
arXiv Detail & Related papers (2025-11-06T22:24:35Z)
RALAD: Bridging the Real-to-Sim Domain Gap in Autonomous Driving with Retrieval-Augmented Learning [25.438771583229727]
We propose Retrieval-Augmented Learning for Autonomous Driving (RALAD) to bridge the real-to-sim gap at a low cost.<n>RALAD features three primary designs, including (1) domain adaptation via an enhanced Optimal Transport (OT) method, (2) a simple and unified framework, and (3) efficient fine-tuning techniques.<n> Experimental results demonstrate that RALAD compensates for the performance degradation in simulated environments while maintaining accuracy in real-world scenarios.
arXiv Detail & Related papers (2025-01-21T17:03:06Z)
No Regrets: Investigating and Improving Regret Approximations for Curriculum Discovery [53.08822154199948]
Unsupervised Environment Design (UED) methods have gained recent attention as their adaptive curricula promise to enable agents to be robust to in- and out-of-distribution tasks. This work investigates how existing UED methods select training environments, focusing on task prioritisation metrics. We develop a method that directly trains on scenarios with high learnability.
arXiv Detail & Related papers (2024-08-27T14:31:54Z)
Learning Environment Models with Continuous Stochastic Dynamics [0.0]
We aim to provide insights into the decisions faced by the agent by learning an automaton model of environmental behavior under the control of an agent. In this work, we raise the capabilities of automata learning such that it is possible to learn models for environments that have complex and continuous dynamics. We apply our automata learning framework on popular RL benchmarking environments in the OpenAI Gym, including LunarLander, CartPole, Mountain Car, and Acrobot.
arXiv Detail & Related papers (2023-06-29T12:47:28Z)
Generalization through Diversity: Improving Unsupervised Environment Design [8.961693126230452]
We propose a principled approach to adaptively identify diverse environments based on a novel distance measure relevant to environment design. We empirically demonstrate the versatility and effectiveness of our method in comparison to multiple leading approaches for unsupervised environment design.
arXiv Detail & Related papers (2023-01-19T11:55:47Z)
Guided Uncertainty-Aware Policy Optimization: Combining Learning and Model-Based Strategies for Sample-Efficient Policy Learning [75.56839075060819]
Traditional robotic approaches rely on an accurate model of the environment, a detailed description of how to perform the task, and a robust perception system to keep track of the current state. reinforcement learning approaches can operate directly from raw sensory inputs with only a reward signal to describe the task, but are extremely sample-inefficient and brittle. In this work, we combine the strengths of model-based methods with the flexibility of learning-based methods to obtain a general method that is able to overcome inaccuracies in the robotics perception/actuation pipeline.
arXiv Detail & Related papers (2020-05-21T19:47:05Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.