Diversity Induced Environment Design via Self-Play
- URL: http://arxiv.org/abs/2302.02119v4
- Date: Tue, 25 Jul 2023 08:00:40 GMT
- Title: Diversity Induced Environment Design via Self-Play
- Authors: Dexun Li, Wenjun Li, Pradeep Varakantham
- Abstract summary: We propose a task-agnostic method to identify observed/hidden states that are representative of a given level.
These representative states are then used to characterize the diversity between two levels, which, as we show, can be crucial to effective performance.
In addition, to improve sampling efficiency, we incorporate a self-play technique that lets the environment generator automatically produce environments of the greatest benefit to the training agent.
- Score: 9.172096093540357
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent work on designing an appropriate distribution of environments has
shown promise for training effective, generally capable agents. Its success is
partly due to a form of adaptive curriculum learning that generates
environment instances (or levels) at the frontier of the agent's capabilities.
However, such an environment design framework often struggles to find effective
levels in challenging design spaces and requires costly interactions with the
environment. In this paper, we aim to introduce diversity in the Unsupervised
Environment Design (UED) framework. Specifically, we propose a task-agnostic
method to identify observed/hidden states that are representative of a given
level. The outcome of this method is then utilized to characterize the
diversity between two levels, which, as we show, can be crucial to effective
performance. In addition, to improve sampling efficiency, we incorporate the
self-play technique that allows the environment generator to automatically
generate environments that are of great benefit to the training agent.
Quantitatively, our approach, Diversity-induced Environment Design via
Self-Play (DivSP), shows compelling performance over existing methods.
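To make the idea concrete, here is a minimal sketch of how a diversity measure over representative states could drive level selection. The mean-embedding distance and both helper functions are illustrative assumptions; the abstract does not specify the paper's exact measure.

```python
import numpy as np

def level_diversity(states_a, states_b):
    # Each level is summarized by an array of representative state
    # embeddings with shape (n_states, dim); a simple mean-embedding
    # distance stands in for the paper's task-agnostic measure.
    return np.linalg.norm(states_a.mean(axis=0) - states_b.mean(axis=0))

def select_diverse_levels(candidates, buffer, k=4):
    # Greedily keep the k candidate levels that are, on average,
    # farthest from the levels already in the training buffer.
    scores = [np.mean([level_diversity(c, b) for b in buffer])
              for c in candidates]
    order = np.argsort(scores)[::-1]  # most diverse first
    return [candidates[i] for i in order[:k]]
```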
Related papers
- Enhancing the Hierarchical Environment Design via Generative Trajectory Modeling [8.256433006393243]
We introduce a hierarchical MDP framework for environment design under resource constraints.
It consists of an upper-level RL teacher agent that generates suitable training environments for a lower-level student agent.
Our proposed method significantly reduces the resource-intensive interactions between agents and environments.
arXiv Detail & Related papers (2023-09-30T08:21:32Z)
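As a rough sketch of the two-level loop this describes, assuming a hypothetical teacher/student interface (propose, train, evaluate, and update are illustrative names, not the paper's API), with the teacher rewarded by the student's learning progress:

```python
def teacher_student_loop(teacher, student, n_iters, steps_per_env):
    # Upper level: the RL teacher proposes environment parameters.
    # Lower level: the student trains on them for a fixed budget.
    # The teacher is rewarded by the student's measured progress,
    # which concentrates costly interactions on useful environments.
    for _ in range(n_iters):
        env_params = teacher.propose()
        before = student.evaluate(env_params)
        student.train(env_params, steps=steps_per_env)
        after = student.evaluate(env_params)
        teacher.update(env_params, reward=after - before)
```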
- Free Lunch for Domain Adversarial Training: Environment Label Smoothing [82.85757548355566]
We propose Environment Label Smoothing (ELS) to improve training stability, local convergence, and robustness to noisy environment labels.
We yield state-of-the-art results on a wide range of domain generalization/adaptation tasks, particularly when the environment labels are highly noisy.
arXiv Detail & Related papers (2023-02-01T02:55:26Z)
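The mechanism can be sketched as ordinary label smoothing applied to the environment (domain) labels seen by the adversarial discriminator; the function below is an assumed illustration, with eps as a stand-in hyperparameter name:

```python
import numpy as np

def smooth_environment_labels(env_labels, n_envs, eps=0.1):
    # Replace each one-hot environment label with (1 - eps) mass on the
    # true environment and eps spread uniformly over all environments,
    # softening the targets the domain discriminator trains against.
    onehot = np.eye(n_envs)[env_labels]        # shape: (batch, n_envs)
    return (1.0 - eps) * onehot + eps / n_envs
```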
- Generalization through Diversity: Improving Unsupervised Environment Design [8.961693126230452]
We propose a principled approach to adaptively identify diverse environments based on a novel distance measure relevant to environment design.
We empirically demonstrate the versatility and effectiveness of our method in comparison to multiple leading approaches for unsupervised environment design.
arXiv Detail & Related papers (2023-01-19T11:55:47Z)
- Environment Design for Inverse Reinforcement Learning [3.085995273374333]
Current inverse reinforcement learning methods that focus on learning from a single environment can fail to handle slight changes in the environment dynamics.
In our framework, the learner repeatedly interacts with the expert, with the former selecting environments to identify the reward function.
This results in improvements in both sample-efficiency and robustness, as we show experimentally, for both exact and approximate inference.
arXiv Detail & Related papers (2022-10-26T18:31:17Z)
- Stateful active facilitator: Coordination and Environmental Heterogeneity in Cooperative Multi-Agent Reinforcement Learning [71.53769213321202]
We formalize the notions of coordination level and heterogeneity level of an environment.
We present HECOGrid, a suite of multi-agent environments that facilitates empirical evaluation of different MARL approaches.
We propose a Centralized Training Decentralized Execution learning approach that enables agents to work efficiently in high-coordination and high-heterogeneity environments.
arXiv Detail & Related papers (2022-10-04T18:17:01Z)
- Environment Optimization for Multi-Agent Navigation [11.473177123332281]
The goal of this paper is to consider the environment as a decision variable in a system-level optimization problem.
We show, through formal proofs, under which conditions the environment can change while guaranteeing completeness.
In order to accommodate a broad range of implementation scenarios, we include both online and offline optimization, and both discrete and continuous environment representations.
arXiv Detail & Related papers (2022-09-22T19:22:16Z)
- Demonstration-efficient Inverse Reinforcement Learning in Procedurally Generated Environments [137.86426963572214]
Inverse Reinforcement Learning can extrapolate reward functions from expert demonstrations.
We show that our approach, DE-AIRL, is demonstration-efficient and still able to extrapolate reward functions which generalize to the fully procedural domain.
arXiv Detail & Related papers (2020-12-04T11:18:02Z)
- Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design [121.73425076217471]
We propose Unsupervised Environment Design (UED), where developers provide environments with unknown parameters, and these parameters are used to automatically produce a distribution over valid, solvable environments.
We call our technique Protagonist Antagonist Induced Regret Environment Design (PAIRED).
Our experiments demonstrate that PAIRED produces a natural curriculum of increasingly complex environments, and PAIRED agents achieve higher zero-shot transfer performance when tested in highly novel environments.
arXiv Detail & Related papers (2020-12-03T17:37:01Z)
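The regret objective can be sketched as follows; rollout is a hypothetical helper returning an episode return, and this single-rollout form simplifies the paper's use of multiple trajectories:

```python
def paired_regret(env_params, protagonist, antagonist, rollout):
    # The level-generating adversary is rewarded by the gap between the
    # antagonist's and the protagonist's returns on the same level, so
    # it favors levels that are solvable (the antagonist succeeds) yet
    # sit just beyond the protagonist's current ability.
    return rollout(env_params, antagonist) - rollout(env_params, protagonist)
```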
- One Solution is Not All You Need: Few-Shot Extrapolation via Structured MaxEnt RL [142.36621929739707]
We show that learning diverse behaviors for accomplishing a task can lead to behavior that generalizes to varying environments.
By identifying multiple solutions for the task in a single environment during training, our approach can generalize to new situations.
arXiv Detail & Related papers (2020-10-27T17:41:57Z)
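One common way to keep those multiple solutions distinct is a latent-conditioned policy trained with a DIAYN-style diversity bonus; the sketch below shows that generic bonus as an assumed illustration, not code from the paper:

```python
import numpy as np

def diversity_bonus(discriminator_logits, skill):
    # Intrinsic reward log q(skill | state) - log p(skill): it is high
    # when the visited state makes the active latent skill easy to
    # identify, pushing different skills toward different solutions.
    log_q = discriminator_logits[skill] - np.log(np.exp(discriminator_logits).sum())
    log_p = -np.log(len(discriminator_logits))  # uniform prior over skills
    return log_q - log_p
```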
- Ecological Reinforcement Learning [76.9893572776141]
We study the kinds of environment properties that can make learning under such conditions easier.
Understanding how properties of the environment impact the performance of reinforcement learning agents can help us structure our tasks in ways that make learning tractable.
arXiv Detail & Related papers (2020-06-22T17:55:03Z)