Evolving Curricula with Regret-Based Environment Design
- URL: http://arxiv.org/abs/2203.01302v3
- Date: Sat, 30 Sep 2023 18:36:42 GMT
- Title: Evolving Curricula with Regret-Based Environment Design
- Authors: Jack Parker-Holder, Minqi Jiang, Michael Dennis, Mikayel Samvelyan,
Jakob Foerster, Edward Grefenstette, Tim Rocktäschel
- Abstract summary: We propose to harness the power of evolution in a principled, regret-based curriculum.
Our approach seeks to constantly produce levels at the frontier of an agent's capabilities, resulting in curricula that start simple but become increasingly complex.
- Score: 37.70275057075986
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: It remains a significant challenge to train generally capable agents with
reinforcement learning (RL). A promising avenue for improving the robustness of
RL agents is through the use of curricula. One such class of methods frames
environment design as a game between a student and a teacher, using
regret-based objectives to produce environment instantiations (or levels) at
the frontier of the student agent's capabilities. These methods benefit from
their generality, with theoretical guarantees at equilibrium, yet they often
struggle to find effective levels in challenging design spaces. By contrast,
evolutionary approaches seek to incrementally alter environment complexity,
resulting in potentially open-ended learning, but often rely on domain-specific
heuristics and vast amounts of computational resources. In this paper we
propose to harness the power of evolution in a principled, regret-based
curriculum. Our approach, which we call Adversarially Compounding Complexity by
Editing Levels (ACCEL), seeks to constantly produce levels at the frontier of
an agent's capabilities, resulting in curricula that start simple but become
increasingly complex. ACCEL maintains the theoretical benefits of prior
regret-based methods, while providing significant empirical gains in a diverse
set of environments. An interactive version of the paper is available at
accelagent.github.io.
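To make the compounding, regret-based loop described in the abstract concrete, here is a minimal, self-contained Python sketch of how such a curriculum could be organised: a level buffer is prioritised by an estimated regret score, high-regret levels are replayed for training and then lightly edited, and the edited offspring re-enter the buffer if they remain challenging. The names (LevelEntry, edit_level, estimate_regret) and the specific mutation and scoring rules are illustrative assumptions rather than the authors' implementation; in particular, the regret estimator below is a random stand-in for a rollout-based score such as positive value loss.
```python
import random
from dataclasses import dataclass

@dataclass
class LevelEntry:
    level: list           # hypothetical level encoding, e.g. obstacle positions
    regret: float = 0.0   # estimated regret used to prioritise replay

def generate_simple_level():
    # Assumption: the curriculum starts from trivially simple levels (an empty grid).
    return []

def edit_level(level):
    # Assumption: a small random mutation, e.g. add or remove one obstacle.
    child = list(level)
    if child and random.random() < 0.5:
        child.pop(random.randrange(len(child)))
    else:
        child.append((random.randrange(13), random.randrange(13)))
    return child

def estimate_regret(level):
    # Placeholder for a rollout-based regret estimate (e.g. positive value loss);
    # a random score keeps this sketch self-contained and runnable.
    return random.random()

def train_agent_on(level):
    # Placeholder for an RL update (e.g. PPO) on rollouts collected in `level`.
    pass

def accel_loop(steps=1000, buffer_size=64, replay_prob=0.8):
    buffer = []  # list of LevelEntry, kept sorted by estimated regret
    for _ in range(steps):
        if buffer and random.random() < replay_prob:
            # Replay a high-regret level, train on it, then "edit" it so that
            # complexity compounds around levels the agent has not mastered.
            entry = max(buffer, key=lambda e: e.regret)
            train_agent_on(entry.level)
            entry.regret = estimate_regret(entry.level)
            child = edit_level(entry.level)
            buffer.append(LevelEntry(child, estimate_regret(child)))
        else:
            # Otherwise score a fresh simple level and let curation decide its fate.
            level = generate_simple_level()
            buffer.append(LevelEntry(level, estimate_regret(level)))
        buffer.sort(key=lambda e: e.regret, reverse=True)
        del buffer[buffer_size:]  # keep only the highest-regret levels
    return buffer

if __name__ == "__main__":
    top_levels = accel_loop(steps=200)
    print("buffer size:", len(top_levels), "max regret:", top_levels[0].regret)
```
The intent mirrors the abstract: levels start trivially simple and only grow more complex through small edits to levels the agent has not yet mastered.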
Related papers
- Learning Curricula in Open-Ended Worlds [17.138779075998084]
This thesis develops a class of methods called Unsupervised Environment Design (UED).
Given an environment design space, UED automatically generates an infinite sequence or curriculum of training environments.
The findings in this thesis show that UED autocurricula can produce RL agents exhibiting significantly improved robustness.
arXiv Detail & Related papers (2023-12-03T16:44:00Z)
- Enhancing the Hierarchical Environment Design via Generative Trajectory Modeling [8.256433006393243]
We introduce a hierarchical MDP framework for environment design under resource constraints.
It consists of an upper-level RL teacher agent that generates suitable training environments for a lower-level student agent.
Our proposed method significantly reduces the resource-intensive interactions between agents and environments.
arXiv Detail & Related papers (2023-09-30T08:21:32Z)
- On the Power of Pre-training for Generalization in RL: Provable Benefits and Hardness [47.09873295916592]
Generalization in Reinforcement Learning (RL) aims to learn, during training, an agent that generalizes to the target environment.
This paper studies RL generalization from a theoretical perspective: how much can we expect pre-training on the training environments to help?
When interaction with the target environment is not allowed, we certify that the best we can obtain is a near-optimal policy in an average sense, and we design an algorithm that achieves this goal.
arXiv Detail & Related papers (2022-10-19T10:58:24Z)
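As a hedged illustration of the "near-optimal policy in an average sense" claim in the entry above, one plausible formal reading (an assumption about notation, not the paper's exact theorem) is that the learned policy \hat{\pi} competes with the best policy for the value averaged over the training environment distribution \mathcal{D}:
```latex
\[
  \mathbb{E}_{M \sim \mathcal{D}}\big[ V_M^{\hat{\pi}} \big]
  \;\ge\;
  \max_{\pi}\; \mathbb{E}_{M \sim \mathcal{D}}\big[ V_M^{\pi} \big] \;-\; \epsilon
\]
```
Here $V_M^{\pi}$ is the value of policy $\pi$ in environment $M$; the precise dependence of $\epsilon$ on the amount of pre-training is the paper's subject and is not reproduced here.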
- Autonomous Reinforcement Learning: Formalism and Benchmarking [106.25788536376007]
Real-world embodied learning, such as that performed by humans and animals, is situated in a continual, non-episodic world.
Common benchmark tasks in RL are episodic, with the environment resetting between trials to provide the agent with multiple attempts.
This discrepancy presents a major challenge when attempting to take RL algorithms developed for episodic simulated environments and run them on real-world platforms.
arXiv Detail & Related papers (2021-12-17T16:28:06Z)
- CARL: A Benchmark for Contextual and Adaptive Reinforcement Learning [45.52724876199729]
We present CARL, a collection of well-known RL environments extended to contextual RL problems.
We provide first evidence that disentangling state representation learning from policy learning conditioned on the context facilitates better generalization.
arXiv Detail & Related papers (2021-10-05T15:04:01Z)
- Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design [121.73425076217471]
We propose Unsupervised Environment Design (UED), where developers provide environments with unknown parameters, and these parameters are used to automatically produce a distribution over valid, solvable environments.
We call our technique Protagonist Antagonist Induced Regret Environment Design (PAIRED).
Our experiments demonstrate that PAIRED produces a natural curriculum of increasingly complex environments, and PAIRED agents achieve higher zero-shot transfer performance when tested in highly novel environments.
arXiv Detail & Related papers (2020-12-03T17:37:01Z)
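As a rough illustration of the regret objective named in the PAIRED entry above, the snippet below shows one common way that signal is described: the antagonist's best return minus the protagonist's average return on the same adversary-proposed level. The function and parameter names (estimate_paired_regret, rollout_return) are hypothetical helpers, not the paper's API; only the antagonist-minus-protagonist structure follows the published description.
```python
# Hypothetical sketch of the regret signal behind PAIRED: the environment
# adversary proposes a level, and regret is estimated as the gap between the
# antagonist's best return and the protagonist's average return on that level.
def estimate_paired_regret(level, protagonist, antagonist, rollout_return, episodes=4):
    # rollout_return(agent, level) -> episodic return; an assumed helper.
    protagonist_mean = sum(rollout_return(protagonist, level) for _ in range(episodes)) / episodes
    antagonist_best = max(rollout_return(antagonist, level) for _ in range(episodes))
    # High regret means the level is solvable (the antagonist succeeds) but the
    # protagonist has not mastered it; the adversary is trained to maximise this
    # quantity, while the protagonist reduces it by improving on proposed levels.
    return antagonist_best - protagonist_mean
```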
- Importance Weighted Policy Learning and Adaptation [89.46467771037054]
We study a complementary approach which is conceptually simple, general, modular and built on top of recent improvements in off-policy learning.
The framework is inspired by ideas from the probabilistic inference literature and combines robust off-policy learning with a behavior prior.
Our approach achieves competitive adaptation performance on hold-out tasks compared to meta reinforcement learning baselines and can scale to complex sparse-reward scenarios.
arXiv Detail & Related papers (2020-09-10T14:16:58Z)
- Ecological Reinforcement Learning [76.9893572776141]
We study the kinds of environment properties that can make learning under such conditions easier.
Understanding how properties of the environment impact the performance of reinforcement learning agents can help us structure our tasks in ways that make learning tractable.
arXiv Detail & Related papers (2020-06-22T17:55:03Z)
- Deep Reinforcement Learning amidst Lifelong Non-Stationarity [67.24635298387624]
We show that an off-policy RL algorithm can reason about and tackle lifelong non-stationarity.
Our method leverages latent variable models to learn a representation of the environment from current and past experiences.
We also introduce several simulation environments that exhibit lifelong non-stationarity, and empirically find that our approach substantially outperforms approaches that do not reason about environment shift.
arXiv Detail & Related papers (2020-06-18T17:34:50Z)