Learning Curricula in Open-Ended Worlds
- URL: http://arxiv.org/abs/2312.03126v2
- Date: Fri, 8 Dec 2023 01:49:54 GMT
- Title: Learning Curricula in Open-Ended Worlds
- Authors: Minqi Jiang
- Abstract summary: This thesis develops a class of methods called Unsupervised Environment Design (UED).
Given an environment design space, UED automatically generates an infinite sequence or curriculum of training environments.
The findings in this thesis show that UED autocurricula can produce RL agents exhibiting significantly improved robustness.
- Score: 17.138779075998084
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep reinforcement learning (RL) provides powerful methods for training
optimal sequential decision-making agents. As collecting real-world
interactions can entail additional costs and safety risks, the common paradigm
of sim2real conducts training in a simulator, followed by real-world
deployment. Unfortunately, RL agents easily overfit to the choice of simulated
training environments, and worse still, learning ends when the agent masters
the specific set of simulated environments. In contrast, the real world is
highly open-ended, featuring endlessly evolving environments and challenges,
making such RL approaches unsuitable. Simply randomizing over simulated
environments is insufficient, as it requires making arbitrary distributional
assumptions and can be combinatorially less likely to sample specific
environment instances that are useful for learning. An ideal learning process
should automatically adapt the training environment to maximize the learning
potential of the agent over an open-ended task space that matches or surpasses
the complexity of the real world. This thesis develops a class of methods
called Unsupervised Environment Design (UED), which aim to produce such
open-ended processes. Given an environment design space, UED automatically
generates an infinite sequence or curriculum of training environments at the
frontier of the learning agent's capabilities. Through extensive empirical
studies and theoretical arguments founded on minimax-regret decision theory and
game theory, the findings in this thesis show that UED autocurricula can
produce RL agents exhibiting significantly improved robustness and
generalization to previously unseen environment instances. Such autocurricula
are promising paths toward open-ended learning systems that achieve more
general intelligence by continually generating and mastering additional
challenges of their own design.
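To make the UED loop concrete, here is a minimal sketch of a regret-based autocurriculum in the replay style: sample environment parameters from the design space, score each resulting level with an approximate regret, and preferentially revisit the highest-scoring levels. The toy environment, policy, and regret proxy are illustrative assumptions, not the exact algorithms developed in the thesis.
```python
# Minimal sketch of a regret-based UED autocurriculum (illustrative only;
# the toy environment, policy, and regret proxy are assumptions, not the
# thesis's exact algorithms).
import random

def make_env(params):
    """Toy parameterized 'environment': guess a hidden target value.
    Stands in for any simulator with a configurable design space."""
    target = params["target"]
    return lambda action: 1.0 - abs(action - target)  # reward: higher is closer

def regret_proxy(returns):
    # True regret is unobservable; approximate it as the gap between the
    # best and average observed returns on this level, as in replay-based UED.
    return max(returns) - sum(returns) / len(returns)

def ued_autocurriculum(steps=200, buffer_size=10):
    level_buffer = []  # (regret score, params) pairs: the learning frontier
    estimate = 0.0     # stand-in for the agent's learned parameters

    for _ in range(steps):
        if level_buffer and random.random() < 0.5:
            # Replay the highest-regret level found so far.
            _, params = max(level_buffer, key=lambda x: x[0])
        else:
            # Sample a fresh level from the environment design space.
            params = {"target": random.uniform(0, 1)}

        env = make_env(params)
        returns = [env(estimate + random.gauss(0, 0.1)) for _ in range(8)]
        # Crude 'training' update: move the agent toward this level's optimum.
        estimate += 0.1 * (params["target"] - estimate)

        level_buffer.append((regret_proxy(returns), params))
        level_buffer.sort(key=lambda x: x[0], reverse=True)
        del level_buffer[buffer_size:]  # keep only the frontier
    return estimate

if __name__ == "__main__":
    print("final agent estimate:", ued_autocurriculum())
```
The pivotal design choice is the regret proxy: because true regret requires knowing the optimal policy for every level, practical UED methods substitute learnable estimates such as the best-minus-average return used above.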
Related papers
- Enabling Adaptive Agent Training in Open-Ended Simulators by Targeting Diversity [10.402855891273346]
DIVA is an evolutionary approach for generating diverse training tasks in complex, open-ended simulators.
Our empirical results showcase DIVA's unique ability to overcome complex parameterizations and successfully train adaptive agent behavior.
arXiv Detail & Related papers (2024-11-07T06:27:12Z)
- Aquatic Navigation: A Challenging Benchmark for Deep Reinforcement Learning [53.3760591018817]
We propose a new benchmarking environment for aquatic navigation using recent advances in the integration between game engines and Deep Reinforcement Learning.
Specifically, we focus on PPO, one of the most widely accepted algorithms, and we propose advanced training techniques.
Our empirical evaluation shows that a well-designed combination of these ingredients can achieve promising results.
arXiv Detail & Related papers (2024-05-30T23:20:23Z)
- Staged Reinforcement Learning for Complex Tasks through Decomposed Environments [4.883558259729863]
We discuss two methods for approximating real-world problems as RL problems.
In the context of traffic junction simulations, we demonstrate that, if we can decompose a complex task into multiple sub-tasks, solving these tasks first can be advantageous.
From a multi-agent perspective, we introduce a training-structuring mechanism that leverages experience learned under the popular Centralised Training Decentralised Execution (CTDE) paradigm.
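Since the entry leans on CTDE, a hedged sketch of that general pattern follows, assuming a centralized critic that scores joint observations and actions during training while each actor conditions only on its local observation at execution time; this illustrates the paradigm itself, not the paper's specific training-structuring mechanism.
```python
# Hedged sketch of Centralised Training Decentralised Execution (CTDE).
# The critic sees the joint observation during training; each actor acts
# on its own local observation and is all that remains at deployment.
from typing import List

class Actor:
    def act(self, local_obs: float) -> int:
        # Decentralized execution: depends only on this agent's observation.
        return 1 if local_obs > 0.0 else 0

class CentralCritic:
    def value(self, joint_obs: List[float], joint_actions: List[int]) -> float:
        # Centralized training: scores the *joint* state-action pair.
        return sum(o * (2 * a - 1) for o, a in zip(joint_obs, joint_actions))

def training_step(actors, critic, joint_obs):
    joint_actions = [a.act(o) for a, o in zip(actors, joint_obs)]
    # In a full implementation, this joint value would drive each actor's update.
    return critic.value(joint_obs, joint_actions)

def execution_step(actors, joint_obs):
    # At deployment the critic is discarded; actors act independently.
    return [a.act(o) for a, o in zip(actors, joint_obs)]

actors = [Actor(), Actor()]
print(training_step(actors, CentralCritic(), [0.5, -0.3]))
print(execution_step(actors, [0.5, -0.3]))
```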
arXiv Detail & Related papers (2023-11-05T19:43:23Z)
- Enhancing the Hierarchical Environment Design via Generative Trajectory Modeling [8.256433006393243]
We introduce a hierarchical MDP framework for environment design under resource constraints.
It consists of an upper-level RL teacher agent that generates suitable training environments for a lower-level student agent.
Our proposed method significantly reduces the resource-intensive interactions between agents and environments.
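A hedged sketch of the teacher-student idea: the teacher proposes an environment difficulty and is rewarded by the student's learning progress on it. The scalar difficulty, the synthetic progress signal, and the bandit-style teacher update are toy stand-ins for the paper's upper-level RL teacher.
```python
# Toy teacher-student environment design loop (illustrative assumptions
# throughout; not the paper's method).
import random

def student_train(difficulty, skill):
    # Toy student: progresses most on tasks slightly above its skill.
    progress = max(0.0, 0.2 - abs(difficulty - (skill + 0.1)))
    return skill + progress, progress

def teacher_student_loop(rounds=50):
    skill = 0.0
    teacher_value = {}  # teacher's estimated value of each difficulty bin
    for _ in range(rounds):
        if teacher_value and random.random() > 0.2:
            difficulty = max(teacher_value, key=teacher_value.get)  # exploit
        else:
            difficulty = round(random.uniform(0, 1), 1)             # explore
        skill, progress = student_train(difficulty, skill)
        # Teacher reward = the student's learning progress on the proposed env.
        old = teacher_value.get(difficulty, 0.0)
        teacher_value[difficulty] = old + 0.5 * (progress - old)
    return skill

print("final student skill:", teacher_student_loop())
```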
arXiv Detail & Related papers (2023-09-30T08:21:32Z)
- Evolving Curricula with Regret-Based Environment Design [37.70275057075986]
We propose to harness the power of evolution in a principled, regret-based curriculum.
Our approach seeks to constantly produce levels at the frontier of an agent's capabilities, resulting in curricula that start simple but become increasingly complex.
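A minimal sketch of that evolutionary loop: keep a small population of levels, train on the one with the highest estimated regret, and mutate it to spawn slightly harder variants, so complexity compounds at the agent's frontier. The level encoding, mutation operator, and regret proxy are toy assumptions.
```python
# Toy evolutionary, regret-based curriculum (illustrative only).
import random

def mutate(level):
    # Small edit to an existing level: nudge one parameter.
    child = dict(level)
    child["difficulty"] = min(1.0, max(0.0, child["difficulty"] + random.gauss(0, 0.05)))
    return child

def estimated_regret(level, skill):
    # Toy proxy: zero on mastered levels, highest on levels moderately
    # beyond the agent's current skill.
    gap = level["difficulty"] - skill
    return max(0.0, gap) * max(0.0, 1.0 - gap)

def evolve_curriculum(generations=300):
    skill = 0.0
    population = [{"difficulty": 0.05}]  # the curriculum starts simple
    for _ in range(generations):
        level = max(population, key=lambda l: estimated_regret(l, skill))
        # 'Training' on a frontier level nudges the agent's skill upward.
        skill += 0.02 * max(0.0, level["difficulty"] - skill + 0.05)
        population.append(mutate(level))
        # Retain only the highest-regret levels: the evolving frontier.
        population.sort(key=lambda l: estimated_regret(l, skill), reverse=True)
        del population[20:]
    return skill, max(l["difficulty"] for l in population)

skill, hardest = evolve_curriculum()
print(f"final skill={skill:.2f}, hardest retained level={hardest:.2f}")
```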
arXiv Detail & Related papers (2022-03-02T18:40:00Z)
- Autonomous Reinforcement Learning: Formalism and Benchmarking [106.25788536376007]
Real-world embodied learning, such as that performed by humans and animals, is situated in a continual, non-episodic world.
Common benchmark tasks in RL are episodic, with the environment resetting between trials to provide the agent with multiple attempts.
This discrepancy presents a major challenge when attempting to take RL algorithms developed for episodic simulated environments and run them on real-world platforms.
arXiv Detail & Related papers (2021-12-17T16:28:06Z)
- Scenic4RL: Programmatic Modeling and Generation of Reinforcement Learning Environments [89.04823188871906]
Generation of diverse realistic scenarios is challenging for real-time strategy (RTS) environments.
Most of the existing simulators rely on randomly generating the environments.
We demonstrate the benefits of adopting an existing formal scenario specification language, SCENIC, to assist researchers.
arXiv Detail & Related papers (2021-06-18T21:49:46Z)
- Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design [121.73425076217471]
We propose Unsupervised Environment Design (UED), where developers provide environments with unknown parameters, and these parameters are used to automatically produce a distribution over valid, solvable environments.
We call our technique Protagonist Antagonist Induced Regret Environment Design (PAIRED).
Our experiments demonstrate that PAIRED produces a natural curriculum of increasingly complex environments, and PAIRED agents achieve higher zero-shot transfer performance when tested in highly novel environments.
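The regret signal at the core of PAIRED can be written down compactly: on each adversary-proposed environment, regret is approximated by the return gap between a second "antagonist" agent and the learning protagonist. The toy rollout model below is an illustrative assumption.
```python
# Sketch of PAIRED's regret approximation (toy rollout model assumed).
import random

def rollout(policy_skill, difficulty, episodes=8):
    # Toy model: success is likelier when skill covers the difficulty.
    wins = sum(
        random.random() < max(0.05, policy_skill - difficulty + 0.5)
        for _ in range(episodes)
    )
    return wins / episodes

def paired_regret(env_difficulty, protagonist_skill, antagonist_skill):
    protagonist_return = rollout(protagonist_skill, env_difficulty)
    antagonist_return = rollout(antagonist_skill, env_difficulty)
    # The adversary maximizes this gap; unsolvable levels give *both*
    # agents low returns, so high-regret levels remain solvable.
    return antagonist_return - protagonist_return

for difficulty in (0.1, 0.5, 0.9):
    r = paired_regret(difficulty, protagonist_skill=0.3, antagonist_skill=0.7)
    print(f"difficulty={difficulty:.1f} -> estimated regret={r:.2f}")
```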
arXiv Detail & Related papers (2020-12-03T17:37:01Z)
- Environment Shaping in Reinforcement Learning using State Abstraction [63.444831173608605]
We propose a novel framework of environment shaping using state abstraction.
Our key idea is to compress the environment's large state space with noisy signals into a smaller abstract space.
We show that the agent's policy learnt in the shaped environment preserves near-optimal behavior in the original environment.
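A minimal sketch of acting through a state abstraction, assuming a hand-written abstraction map and a tabular abstract policy; it illustrates only the interface (abstract, then act), not the paper's actual construction of the shaped environment.
```python
# Toy state abstraction: compress noisy raw states into a few abstract
# bins, act over the bins, and deploy in the original environment by
# abstracting each observed state first (illustrative assumptions).

def phi(raw_state: float) -> int:
    # Abstraction map: collapse a continuous, noisy state into 4 bins.
    return min(3, max(0, int(raw_state * 4)))

# Policy learned in the shaped (abstract) environment: one action per bin.
abstract_policy = {0: "explore", 1: "explore", 2: "approach", 3: "grasp"}

def act_in_original_env(raw_state: float) -> str:
    # Near-optimal behavior is preserved by abstracting before acting.
    return abstract_policy[phi(raw_state)]

for s in (0.1, 0.45, 0.72, 0.98):
    print(f"raw {s:.2f} -> abstract bin {phi(s)} -> action {act_in_original_env(s)}")
```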
arXiv Detail & Related papers (2020-06-23T17:00:22Z)
- RL-CycleGAN: Reinforcement Learning Aware Simulation-To-Real [74.45688231140689]
We introduce the RL-scene consistency loss for image translation, which ensures that the translation operation is invariant with respect to the Q-values associated with the image.
We obtain RL-CycleGAN, a new approach for simulation-to-real-world transfer for reinforcement learning.
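The consistency idea can be sketched as a penalty on any change the sim-to-real generator makes to a scene's Q-values; the stub Q-function and generator below are illustrative assumptions, not RL-CycleGAN's actual networks.
```python
# Hedged sketch of an RL-scene consistency loss: the translation must
# leave the Q-values of the image invariant, so task-relevant content
# survives sim-to-real transfer (stub networks assumed).
from typing import Callable, List

def rl_scene_consistency_loss(
    q_values: Callable[[List[float]], List[float]],
    generator: Callable[[List[float]], List[float]],
    sim_image: List[float],
) -> float:
    q_sim = q_values(sim_image)                    # Q-values on the sim scene
    q_translated = q_values(generator(sim_image))  # Q-values after translation
    # Mean squared difference between the two sets of Q-values.
    return sum((a - b) ** 2 for a, b in zip(q_sim, q_translated)) / len(q_sim)

def q_stub(img: List[float]) -> List[float]:
    # Stand-in for a learned Q-network's per-action outputs.
    return [sum(img), max(img)]

def gen_stub(img: List[float]) -> List[float]:
    # Stand-in for the sim-to-real generator.
    return [0.9 * x + 0.01 for x in img]

print(rl_scene_consistency_loss(q_stub, gen_stub, [0.2, 0.5, 0.3]))
```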
arXiv Detail & Related papers (2020-06-16T08:58:07Z)