Evolving Curricula with Regret-Based Environment Design
- URL: http://arxiv.org/abs/2203.01302v3
- Date: Sat, 30 Sep 2023 18:36:42 GMT
- Title: Evolving Curricula with Regret-Based Environment Design
- Authors: Jack Parker-Holder, Minqi Jiang, Michael Dennis, Mikayel Samvelyan,
Jakob Foerster, Edward Grefenstette, Tim Rocktäschel
- Abstract summary: We propose to harness the power of evolution in a principled, regret-based curriculum.
Our approach seeks to constantly produce levels at the frontier of an agent's capabilities, resulting in curricula that start simple but become increasingly complex.
- Score: 37.70275057075986
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: It remains a significant challenge to train generally capable agents with
reinforcement learning (RL). A promising avenue for improving the robustness of
RL agents is through the use of curricula. One such class of methods frames
environment design as a game between a student and a teacher, using
regret-based objectives to produce environment instantiations (or levels) at
the frontier of the student agent's capabilities. These methods benefit from
their generality, with theoretical guarantees at equilibrium, yet they often
struggle to find effective levels in challenging design spaces. By contrast,
evolutionary approaches seek to incrementally alter environment complexity,
resulting in potentially open-ended learning, but often rely on domain-specific
heuristics and vast amounts of computational resources. In this paper we
propose to harness the power of evolution in a principled, regret-based
curriculum. Our approach, which we call Adversarially Compounding Complexity by
Editing Levels (ACCEL), seeks to constantly produce levels at the frontier of
an agent's capabilities, resulting in curricula that start simple but become
increasingly complex. ACCEL maintains the theoretical benefits of prior
regret-based methods, while providing significant empirical gains in a diverse
set of environments. An interactive version of the paper is available at
accelagent.github.io.
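As a rough illustration of the method described in the abstract, the following is a minimal sketch of an ACCEL-style loop, not the authors' implementation. The helper names (sample_random_level, edit_level, estimate_regret) are assumptions; the paper approximates regret with the positive value loss, as in Prioritized Level Replay.

```python
import random

# Illustrative sketch of an ACCEL-style loop (not the authors' code).
# `agent`, `sample_random_level`, `edit_level`, and `estimate_regret` are
# assumed callables supplied by the surrounding training setup.

def accel_loop(agent, sample_random_level, edit_level, estimate_regret,
               num_iterations=1000, buffer_size=128, replay_prob=0.8):
    buffer = []  # (level, estimated_regret) pairs, kept sorted descending

    for _ in range(num_iterations):
        if buffer and random.random() < replay_prob:
            # Replay a high-regret level: train on it, then mutate it to
            # produce a candidate at the frontier of the agent's abilities.
            level, _ = buffer[random.randrange(min(8, len(buffer)))]
            agent.train_on(level)
            candidate = edit_level(level)
        else:
            # Otherwise evaluate a fresh random level without training on it,
            # so training stays restricted to curated high-regret levels.
            candidate = sample_random_level()

        # Score the candidate and keep only the highest-regret levels.
        buffer.append((candidate, estimate_regret(agent, candidate)))
        buffer.sort(key=lambda pair: pair[1], reverse=True)
        del buffer[buffer_size:]
```

The key design choice mirrored here is that gradient updates happen only on curated, replayed levels, while edits compound complexity from levels the curator already rates as useful.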
Related papers
- Can Learned Optimization Make Reinforcement Learning Less Difficult? [70.5036361852812]
We consider whether learned optimization can help overcome reinforcement learning difficulties.
Our method, Learned Optimization for Plasticity, Exploration and Non-stationarity (OPEN), meta-learns an update rule whose input features and output structure are informed by previously proposed solutions to these difficulties.
arXiv Detail & Related papers (2024-07-09T17:55:23Z)
- Learning Curricula in Open-Ended Worlds [17.138779075998084]
This thesis develops a class of methods called Unsupervised Environment Design (UED)
Given an environment design space, UED automatically generates an infinite sequence or curriculum of training environments.
The findings in this thesis show that UED autocurricula can produce RL agents exhibiting significantly improved robustness.
arXiv Detail & Related papers (2023-12-03T16:44:00Z)
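A schematic of the UED interface described in the entry above: a teacher proposes environment parameters from a design space, yielding an unbounded curriculum for the student. All names here are assumptions for illustration, not the thesis's API.

```python
# Hypothetical sketch of a UED curriculum generator; `design_space`,
# `teacher`, and `student` are assumed objects, not code from the thesis.

def ued_curriculum(design_space, teacher, student):
    """Yield an endless sequence of training environments for the student."""
    while True:
        # The teacher picks parameters, e.g. to maximize the student's regret.
        params = teacher.propose(design_space, student)
        yield design_space.make_env(params)
```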
- Enhancing the Hierarchical Environment Design via Generative Trajectory Modeling [8.256433006393243]
We introduce a hierarchical MDP framework for environment design under resource constraints.
It consists of an upper-level RL teacher agent that generates suitable training environments for a lower-level student agent.
Our proposed method significantly reduces the resource-intensive interactions between agents and environments.
arXiv Detail & Related papers (2023-09-30T08:21:32Z)
- On the Power of Pre-training for Generalization in RL: Provable Benefits and Hardness [47.09873295916592]
Generalization in Reinforcement Learning (RL) aims to learn, during training, an agent that generalizes to the target environment.
This paper studies RL generalization from a theoretical aspect: how much can we expect pre-training over training environments to be helpful?
When interaction with the target environment is not allowed, we certify that the best we can obtain is a near-optimal policy in an average sense, and we design an algorithm that achieves this goal.
arXiv Detail & Related papers (2022-10-19T10:58:24Z)
- Autonomous Reinforcement Learning: Formalism and Benchmarking [106.25788536376007]
Real-world embodied learning, such as that performed by humans and animals, is situated in a continual, non-episodic world.
Common benchmark tasks in RL are episodic, with the environment resetting between trials to provide the agent with multiple attempts.
This discrepancy presents a major challenge when attempting to take RL algorithms developed for episodic simulated environments and run them on real-world platforms.
arXiv Detail & Related papers (2021-12-17T16:28:06Z)
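The episodic-versus-continual mismatch described above can be made concrete with a small wrapper. This sketch assumes the Gymnasium API and is purely illustrative; it is not the paper's benchmark.

```python
import gymnasium as gym

# Illustrative wrapper (an assumption, not the paper's code): episode
# boundaries are hidden from the agent, approximating a non-episodic world.

class ContinuingWrapper(gym.Wrapper):
    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        if terminated or truncated:
            # The simulator still resets under the hood, but the agent never
            # observes a boundary and must cope with wherever it ends up.
            obs, info = self.env.reset()
        return obs, reward, False, False, info
```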
- CARL: A Benchmark for Contextual and Adaptive Reinforcement Learning [45.52724876199729]
We present CARL, a collection of well-known RL environments extended to contextual RL problems.
We provide first evidence that disentangling representation learning of the states from the policy learning with the context facilitates better generalization.
arXiv Detail & Related papers (2021-10-05T15:04:01Z)
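A conceptual sketch of contextual RL as described in the CARL entry above. This is not the CARL library's API; every name below is an assumption. The idea is that a context vector parameterizes the dynamics, and generalization means coping with contexts unseen during training.

```python
import random

# Hypothetical contextual environment: context features alter the physics
# the agent must adapt to. Names and ranges are illustrative assumptions.

class ContextualCartPole:
    def __init__(self, context):
        self.gravity = context["gravity"]
        self.pole_length = context["pole_length"]

def sample_context(rng=random):
    return {"gravity": rng.uniform(8.0, 12.0),
            "pole_length": rng.uniform(0.3, 0.7)}

# Train on one distribution of contexts; evaluate on a shifted one.
train_envs = [ContextualCartPole(sample_context()) for _ in range(16)]
```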
- Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design [121.73425076217471]
We propose Unsupervised Environment Design (UED), where developers provide environments with unknown parameters, and these parameters are used to automatically produce a distribution over valid, solvable environments.
We call our technique Protagonist Antagonist Induced Regret Environment Design (PAIRED)
Our experiments demonstrate that PAIRED produces a natural curriculum of increasingly complex environments, and PAIRED agents achieve higher zero-shot transfer performance when tested in highly novel environments.
arXiv Detail & Related papers (2020-12-03T17:37:01Z)
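A minimal sketch of the PAIRED regret objective summarized above; the function names are assumptions. The adversary proposing environments is rewarded with the gap between an antagonist's return and the protagonist's return, so it favors levels that are solvable yet still defeat the protagonist.

```python
# Sketch of the PAIRED regret signal (names are illustrative assumptions).

def paired_regret(env_params, protagonist, antagonist, rollout_return):
    """Approximate the protagonist's regret on one proposed environment."""
    protagonist_return = rollout_return(protagonist, env_params)
    antagonist_return = rollout_return(antagonist, env_params)
    # High when the antagonist succeeds but the protagonist fails, i.e.
    # the level sits at the frontier of the protagonist's abilities.
    return antagonist_return - protagonist_return
```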
- Importance Weighted Policy Learning and Adaptation [89.46467771037054]
We study a complementary approach which is conceptually simple, general, modular and built on top of recent improvements in off-policy learning.
The framework is inspired by ideas from the probabilistic inference literature and combines robust off-policy learning with a behavior prior.
Our approach achieves competitive adaptation performance on hold-out tasks compared to meta reinforcement learning baselines and can scale to complex sparse-reward scenarios.
arXiv Detail & Related papers (2020-09-10T14:16:58Z)
- Ecological Reinforcement Learning [76.9893572776141]
We study the kinds of environment properties that can make learning under such conditions easier.
Understanding how properties of the environment impact the performance of reinforcement learning agents can help us structure our tasks in ways that make learning tractable.
arXiv Detail & Related papers (2020-06-22T17:55:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.