Stabilizing Unsupervised Environment Design with a Learned Adversary
- URL: http://arxiv.org/abs/2308.10797v2
- Date: Tue, 22 Aug 2023 14:38:43 GMT
- Title: Stabilizing Unsupervised Environment Design with a Learned Adversary
- Authors: Ishita Mediratta, Minqi Jiang, Jack Parker-Holder, Michael Dennis,
Eugene Vinitsky, Tim Rocktäschel
- Abstract summary: A key challenge in training generally-capable agents is the design of training tasks that facilitate broad generalization and robustness to environment variations.
A pioneering approach for Unsupervised Environment Design (UED) is PAIRED, which uses reinforcement learning to train a teacher policy to design tasks from scratch.
Despite its strong theoretical backing, PAIRED suffers from a variety of challenges that hinder its practical performance.
We make it possible for PAIRED to match or exceed state-of-the-art methods, producing robust agents in several established challenging procedurally-generated environments.
- Score: 28.426666219969555
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A key challenge in training generally-capable agents is the design of
training tasks that facilitate broad generalization and robustness to
environment variations. This challenge motivates the problem setting of
Unsupervised Environment Design (UED), whereby a student agent trains on an
adaptive distribution of tasks proposed by a teacher agent. A pioneering
approach for UED is PAIRED, which uses reinforcement learning (RL) to train a
teacher policy to design tasks from scratch, making it possible to directly
generate tasks that are adapted to the agent's current capabilities. Despite
its strong theoretical backing, PAIRED suffers from a variety of challenges
that hinder its practical performance. Thus, state-of-the-art methods currently
rely on curation and mutation rather than generation of new tasks. In this
work, we investigate several key shortcomings of PAIRED and propose solutions
for each shortcoming. As a result, we make it possible for PAIRED to match or
exceed state-of-the-art methods, producing robust agents in several established
challenging procedurally-generated environments, including a partially-observed
maze navigation task and a continuous-control car racing environment. We
believe this work motivates a renewed emphasis on UED methods based on learned
models that directly generate challenging environments, potentially unlocking
more open-ended RL training and, as a result, more general agents.
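As a rough illustration of the teacher-student setup described in the abstract, the sketch below computes the kind of regret signal that PAIRED-style teachers are commonly trained to maximize: the gap between an antagonist's return and the protagonist's (student's) return on the same teacher-generated task. The function name `estimate_regret`, the max-minus-mean estimator, and the example returns are assumptions for illustration, not details taken from this paper.

```python
from statistics import mean


def estimate_regret(antagonist_returns, protagonist_returns):
    """Approximate regret on one teacher-proposed task.

    Assumption: regret is estimated as the antagonist's best episode
    return minus the protagonist's average episode return on that task.
    """
    if not antagonist_returns or not protagonist_returns:
        raise ValueError("need at least one episode return per agent")
    return max(antagonist_returns) - mean(protagonist_returns)


# Hypothetical episode returns on a single generated task.
regret = estimate_regret([0.9, 0.7], [0.2, 0.4])
print(regret)
```

Under these assumptions the teacher would receive this value as its reward. A regret near zero suggests the task is either already solved by both agents or too hard for both, so it carries little training signal.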
Related papers
- Adversarial Environment Design via Regret-Guided Diffusion Models [13.651184780336623]
Training agents that are robust to environmental changes remains a significant challenge in deep reinforcement learning.
Unsupervised environment design (UED) has recently emerged to address this issue by generating a set of training environments tailored to the agent's capabilities.
We propose a novel UED algorithm, adversarial environment design via regret-guided diffusion models (ADD).
arXiv Detail & Related papers (2024-10-25T17:35:03Z)
- HAZARD Challenge: Embodied Decision Making in Dynamically Changing Environments [93.94020724735199]
HAZARD consists of three unexpected disaster scenarios, including fire, flood, and wind.
This benchmark enables us to evaluate autonomous agents' decision-making capabilities across various pipelines.
arXiv Detail & Related papers (2024-01-23T18:59:43Z)
- Enhancing the Hierarchical Environment Design via Generative Trajectory Modeling [8.256433006393243]
We introduce a hierarchical MDP framework for environment design under resource constraints.
It consists of an upper-level RL teacher agent that generates suitable training environments for a lower-level student agent.
Our proposed method significantly reduces the resource-intensive interactions between agents and environments.
arXiv Detail & Related papers (2023-09-30T08:21:32Z)
- CLUTR: Curriculum Learning via Unsupervised Task Representation Learning [130.79246770546413]
CLUTR is a novel curriculum learning algorithm that decouples task representation and curriculum learning into a two-stage optimization.
We show CLUTR outperforms PAIRED, a principled and popular UED method, in terms of generalization and sample efficiency in the challenging CarRacing and navigation environments.
arXiv Detail & Related papers (2022-10-19T01:45:29Z)
- Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design [121.73425076217471]
We propose Unsupervised Environment Design (UED), where developers provide environments with unknown parameters, and these parameters are used to automatically produce a distribution over valid, solvable environments.
We call our technique Protagonist Antagonist Induced Regret Environment Design (PAIRED).
Our experiments demonstrate that PAIRED produces a natural curriculum of increasingly complex environments, and PAIRED agents achieve higher zero-shot transfer performance when tested in highly novel environments.
arXiv Detail & Related papers (2020-12-03T17:37:01Z)
- One Solution is Not All You Need: Few-Shot Extrapolation via Structured MaxEnt RL [142.36621929739707]
We show that learning diverse behaviors for accomplishing a task can lead to behavior that generalizes to varying environments.
By identifying multiple solutions for the task in a single environment during training, our approach can generalize to new situations.
arXiv Detail & Related papers (2020-10-27T17:41:57Z)
- Adaptive Procedural Task Generation for Hard-Exploration Problems [78.20918366839399]
We introduce Adaptive Procedural Task Generation (APT-Gen) to facilitate reinforcement learning in hard-exploration problems.
At the heart of our approach is a task generator that learns to create tasks from a parameterized task space via a black-box procedural generation module.
To enable curriculum learning in the absence of a direct indicator of learning progress, we propose to train the task generator by balancing the agent's performance in the generated tasks and the similarity to the target tasks.
arXiv Detail & Related papers (2020-07-01T09:38:51Z)
- Learning with AMIGo: Adversarially Motivated Intrinsic Goals [63.680207855344875]
AMIGo is a goal-generating teacher that proposes Adversarially Motivated Intrinsic Goals.
We show that our method generates a natural curriculum of self-proposed goals which ultimately allows the agent to solve challenging procedurally-generated tasks.
arXiv Detail & Related papers (2020-06-22T10:22:08Z)