Emergent Complexity and Zero-shot Transfer via Unsupervised Environment
Design
- URL: http://arxiv.org/abs/2012.02096v2
- Date: Thu, 4 Feb 2021 03:01:31 GMT
- Title: Emergent Complexity and Zero-shot Transfer via Unsupervised Environment
Design
- Authors: Michael Dennis, Natasha Jaques, Eugene Vinitsky, Alexandre Bayen,
Stuart Russell, Andrew Critch, Sergey Levine
- Abstract summary: We propose Unsupervised Environment Design (UED), where developers provide environments with unknown parameters, and these parameters are used to automatically produce a distribution over valid, solvable environments.
We call our technique Protagonist Antagonist Induced Regret Environment Design (PAIRED).
Our experiments demonstrate that PAIRED produces a natural curriculum of increasingly complex environments, and PAIRED agents achieve higher zero-shot transfer performance when tested in highly novel environments.
- Score: 121.73425076217471
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A wide range of reinforcement learning (RL) problems - including robustness,
transfer learning, unsupervised RL, and emergent complexity - require
specifying a distribution of tasks or environments in which a policy will be
trained. However, creating a useful distribution of environments is error
prone, and takes a significant amount of developer time and effort. We propose
Unsupervised Environment Design (UED) as an alternative paradigm, where
developers provide environments with unknown parameters, and these parameters
are used to automatically produce a distribution over valid, solvable
environments. Existing approaches to automatically generating environments
suffer from common failure modes: domain randomization cannot generate
structure or adapt the difficulty of the environment to the agent's learning
progress, and minimax adversarial training leads to worst-case environments
that are often unsolvable. To generate structured, solvable environments for
our protagonist agent, we introduce a second, antagonist agent that is allied
with the environment-generating adversary. The adversary is motivated to
generate environments which maximize regret, defined as the difference between
the protagonist and antagonist agent's return. We call our technique
Protagonist Antagonist Induced Regret Environment Design (PAIRED). Our
experiments demonstrate that PAIRED produces a natural curriculum of
increasingly complex environments, and PAIRED agents achieve higher zero-shot
transfer performance when tested in highly novel environments.
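To make the regret objective concrete, here is a minimal Python sketch of how an adversary's reward could be computed from the two agents' episodic returns. It assumes batches of returns are available for both agents; the function names and the simple mean-based estimate are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def paired_regret(protagonist_returns, antagonist_returns):
    """Regret of a candidate environment: antagonist return minus protagonist return.

    Rewarding the adversary with this quantity favours environments that the
    allied antagonist can solve but the protagonist cannot, which discourages
    unsolvable environments while still pushing difficulty upward.
    """
    return float(np.mean(antagonist_returns) - np.mean(protagonist_returns))

# Illustrative use for one batch of episodes in an adversary-generated environment.
regret = paired_regret(protagonist_returns=[0.1, 0.2],
                       antagonist_returns=[0.8, 0.9])
adversary_reward = regret  # adversary and antagonist are trained to maximise regret
# The protagonist simply maximises its own return, which minimises regret.
```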
Related papers
- Adversarial Environment Design via Regret-Guided Diffusion Models [13.651184780336623]
Training agents that are robust to environmental changes remains a significant challenge in deep reinforcement learning.
Unsupervised environment design (UED) has recently emerged to address this issue by generating a set of training environments tailored to the agent's capabilities.
We propose a novel UED algorithm, adversarial environment design via regret-guided diffusion models (ADD).
arXiv Detail & Related papers (2024-10-25T17:35:03Z)
- No Regrets: Investigating and Improving Regret Approximations for Curriculum Discovery [53.08822154199948]
Unsupervised Environment Design (UED) methods have gained recent attention as their adaptive curricula promise to enable agents to be robust to in- and out-of-distribution tasks.
This work investigates how existing UED methods select training environments, focusing on task prioritisation metrics.
We develop a method that directly trains on scenarios with high learnability.
arXiv Detail & Related papers (2024-08-27T14:31:54Z)
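As a rough illustration of the entry above, the sketch below scores levels by a success-rate-based learnability measure, assumed here to be p * (1 - p), which peaks for levels the agent solves about half the time, and trains on the top-scoring ones. The names and the exact scoring rule are assumptions for illustration, not the paper's code.

```python
import numpy as np

def learnability(success_rates):
    """Assumed learnability score: p * (1 - p) per level.

    Zero for levels the agent always or never solves; highest for levels
    solved roughly half the time, which are the most informative to train on.
    """
    p = np.asarray(success_rates, dtype=float)
    return p * (1.0 - p)

def select_training_levels(level_ids, success_rates, k=8):
    """Keep the k levels with the highest learnability score."""
    order = np.argsort(learnability(success_rates))[::-1]
    return [level_ids[i] for i in order[:k]]

# Example: pick 8 of 100 candidate levels given estimated success rates.
chosen = select_training_levels(level_ids=list(range(100)),
                                success_rates=np.random.rand(100), k=8)
```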
- Caution for the Environment: Multimodal Agents are Susceptible to Environmental Distractions [68.92637077909693]
This paper investigates the faithfulness of multimodal large language model (MLLM) agents in the graphical user interface (GUI) environment.
A general setting is proposed where both the user and the agent are benign, and the environment, while not malicious, contains unrelated content.
Experimental results reveal that even the most powerful models, whether generalist agents or specialist GUI agents, are susceptible to distractions.
arXiv Detail & Related papers (2024-08-05T15:16:22Z)
- HAZARD Challenge: Embodied Decision Making in Dynamically Changing Environments [93.94020724735199]
HAZARD consists of three unexpected disaster scenarios, including fire, flood, and wind.
This benchmark enables us to evaluate autonomous agents' decision-making capabilities across various pipelines.
arXiv Detail & Related papers (2024-01-23T18:59:43Z)
- Enhancing the Hierarchical Environment Design via Generative Trajectory Modeling [8.256433006393243]
We introduce a hierarchical MDP framework for environment design under resource constraints.
It consists of an upper-level RL teacher agent that generates suitable training environments for a lower-level student agent.
Our proposed method significantly reduces the resource-intensive interactions between agents and environments.
arXiv Detail & Related papers (2023-09-30T08:21:32Z)
- Reward-Free Curricula for Training Robust World Models [37.13175950264479]
Learning world models from reward-free exploration is a promising approach, and enables policies to be trained using imagined experience for new tasks.
We address the novel problem of generating curricula in the reward-free setting to train robust world models.
We show that minimax regret can be connected to minimising the maximum error in the world model across environment instances.
This result informs our algorithm, WAKER: Weighted Acquisition of Knowledge across Environments for Robustness.
arXiv Detail & Related papers (2023-06-15T15:40:04Z)
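Following the reward-free curricula idea above, a minimal sketch of error-weighted environment sampling is given below: environments where the world model is currently estimated to be least accurate are visited more often. The softmax weighting and the error estimates are illustrative assumptions rather than the WAKER algorithm itself.

```python
import numpy as np

def sample_environment(env_ids, error_estimates, temperature=1.0, rng=None):
    """Sample an environment with probability increasing in estimated world-model error.

    A hedged sketch of error-weighted acquisition; the exact weighting used by
    the published method may differ.
    """
    rng = rng or np.random.default_rng()
    errs = np.asarray(error_estimates, dtype=float)
    weights = np.exp(errs / temperature)
    return rng.choice(env_ids, p=weights / weights.sum())

# Example: five candidate environments with estimated model errors.
env = sample_environment(env_ids=np.arange(5),
                         error_estimates=[0.1, 0.5, 0.2, 0.9, 0.3])
```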
- Improving adaptability to new environments and removing catastrophic forgetting in Reinforcement Learning by using an eco-system of agents [3.5786621294068373]
Adapting a Reinforcement Learning (RL) agent to an unseen environment is a difficult task due to typical over-fitting on the training environment.
There is a risk of catastrophic forgetting, where the performance on previously seen environments is seriously hampered.
This paper proposes a novel approach that exploits an ecosystem of agents to address both concerns.
arXiv Detail & Related papers (2022-04-13T17:52:54Z)
- Heterogeneous Multi-Agent Reinforcement Learning for Unknown Environment Mapping [0.0]
We present an actor-critic algorithm that allows a team of heterogeneous agents to learn decentralized control policies for covering an unknown environment.
This task is of interest to national security and emergency response organizations that would like to enhance situational awareness in hazardous areas by deploying teams of unmanned aerial vehicles.
arXiv Detail & Related papers (2020-10-06T12:23:05Z)
- Environment Shaping in Reinforcement Learning using State Abstraction [63.444831173608605]
We propose a novel framework of environment shaping using state abstraction.
Our key idea is to compress the environment's large state space with noisy signals to an abstracted space.
We show that the agent's policy learnt in the shaped environment preserves near-optimal behavior in the original environment.
arXiv Detail & Related papers (2020-06-23T17:00:22Z)
- Ecological Reinforcement Learning [76.9893572776141]
We study the kinds of environment properties that can make learning under such conditions easier.
Understanding how properties of the environment impact the performance of reinforcement learning agents can help us to structure our tasks in ways that make learning tractable.
arXiv Detail & Related papers (2020-06-22T17:55:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.