Reward-Free Curricula for Training Robust World Models
- URL: http://arxiv.org/abs/2306.09205v2
- Date: Wed, 24 Jan 2024 18:32:49 GMT
- Title: Reward-Free Curricula for Training Robust World Models
- Authors: Marc Rigter, Minqi Jiang, Ingmar Posner
- Abstract summary: Learning world models from reward-free exploration is a promising approach that enables policies to be trained using imagined experience for new tasks.
We address the novel problem of generating curricula in the reward-free setting to train robust world models.
We show that minimax regret can be connected to minimising the maximum error in the world model across environment instances.
This result informs our algorithm, WAKER: Weighted Acquisition of Knowledge across Environments for Robustness.
- Score: 37.13175950264479
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There has been a recent surge of interest in developing generally-capable
agents that can adapt to new tasks without additional training in the
environment. Learning world models from reward-free exploration is a promising
approach that enables policies to be trained using imagined experience for new
tasks. However, achieving a general agent requires robustness across different
environments. In this work, we address the novel problem of generating
curricula in the reward-free setting to train robust world models. We consider
robustness in terms of minimax regret over all environment instantiations and
show that the minimax regret can be connected to minimising the maximum error
in the world model across environment instances. This result informs our
algorithm, WAKER: Weighted Acquisition of Knowledge across Environments for
Robustness. WAKER selects environments for data collection based on the
estimated error of the world model for each environment. Our experiments
demonstrate that WAKER outperforms several baselines, resulting in improved
robustness, efficiency, and generalisation.
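The abstract only sketches WAKER's selection rule, so a rough illustration may help. The Python sketch below samples environments for data collection in proportion to an estimated world-model error per environment; the moving-average error estimator, the uniform exploration mixture `eps`, and all names are assumptions made for this example, not the authors' implementation.

```python
import numpy as np

# Hedged sketch of error-weighted environment selection in the spirit of
# WAKER. The error estimator and exploration mixture are illustrative
# assumptions, not the paper's exact procedure.

rng = np.random.default_rng(0)

class ErrorWeightedCurriculum:
    def __init__(self, env_ids, eps=0.1):
        self.env_ids = list(env_ids)
        self.error = {e: 1.0 for e in self.env_ids}  # optimistic init
        self.eps = eps  # keeps every environment sampled occasionally

    def select_env(self):
        errs = np.array([self.error[e] for e in self.env_ids])
        probs = errs / errs.sum()  # bias sampling toward high model error
        probs = (1 - self.eps) * probs + self.eps / len(probs)
        return rng.choice(self.env_ids, p=probs)

    def update(self, env_id, model_error, decay=0.9):
        # Exponential moving average of the estimated world-model error
        # (e.g. ensemble disagreement) for this environment instance.
        self.error[env_id] = decay * self.error[env_id] + (1 - decay) * model_error
```

Sampling this way concentrates data collection on the environment instances where the model is least accurate, which is the property the paper connects to minimax regret over environment instantiations.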
Related papers
- No Regrets: Investigating and Improving Regret Approximations for Curriculum Discovery [53.08822154199948]
Unsupervised Environment Design (UED) methods have gained recent attention as their adaptive curricula promise to enable agents to be robust to in- and out-of-distribution tasks.
This work investigates how existing UED methods select training environments, focusing on task prioritisation metrics.
We develop a method that directly trains on scenarios with high learnability.
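For context, a common formalisation of "learnability" in this line of work scores a level by p(1-p), where p is the agent's current success rate on that level; the short sketch below is our illustration of that idea, with hypothetical names, rather than the paper's code.

```python
def learnability(p: float) -> float:
    """Score p * (1 - p): maximal for levels the agent solves about half
    the time, zero for levels it always or never solves."""
    return p * (1.0 - p)

def top_k_levels(success_rates: dict[str, float], k: int) -> list[str]:
    # Rank candidate levels by learnability and keep the k most learnable.
    return sorted(success_rates,
                  key=lambda level: learnability(success_rates[level]),
                  reverse=True)[:k]
```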
arXiv Detail & Related papers (2024-08-27T14:31:54Z)
- Efficient Imitation Learning with Conservative World Models [54.52140201148341]
We tackle the problem of policy learning from expert demonstrations without a reward function.
We re-frame imitation learning as a fine-tuning problem, rather than a pure reinforcement learning one.
arXiv Detail & Related papers (2024-05-21T20:53:18Z)
- Learning Curricula in Open-Ended Worlds [17.138779075998084]
This thesis develops a class of methods called Unsupervised Environment Design (UED).
Given an environment design space, UED automatically generates an infinite sequence or curriculum of training environments.
The findings in this thesis show that UED autocurricula can produce RL agents exhibiting significantly improved robustness.
arXiv Detail & Related papers (2023-12-03T16:44:00Z)
- Enhancing the Hierarchical Environment Design via Generative Trajectory Modeling [8.256433006393243]
We introduce a hierarchical MDP framework for environment design under resource constraints.
It consists of an upper-level RL teacher agent that generates suitable training environments for a lower-level student agent.
Our proposed method significantly reduces the resource-intensive interactions between agents and environments.
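A minimal sketch of the upper-level/lower-level loop described here follows; every interface in it is an assumption made for illustration, not the paper's API.

```python
import random

# Hedged sketch of a teacher-student loop for environment design under a
# fixed interaction budget. All names and interfaces are assumptions.

class Teacher:
    """Upper-level agent: proposes environment parameters and records how
    useful each proposal was for the student."""
    def __init__(self):
        self.history = []

    def propose(self):
        # Toy design space: a single difficulty knob. In the paper the
        # teacher is itself an RL agent rather than a random sampler.
        return {"difficulty": random.random()}

    def update(self, params, score):
        self.history.append((params, score))

class Student:
    """Lower-level agent: trains briefly in the proposed environment and
    reports the interaction steps consumed plus a performance score."""
    def train_on(self, params, steps=1_000):
        score = 1.0 - abs(params["difficulty"] - 0.5)  # toy outcome
        return steps, score

def run(budget=10_000):
    teacher, student = Teacher(), Student()
    used = 0
    while used < budget:  # resource constraint on total interaction
        params = teacher.propose()
        steps, score = student.train_on(params)
        used += steps
        teacher.update(params, score)
    return teacher.history
```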
arXiv Detail & Related papers (2023-09-30T08:21:32Z)
- Leveraging World Model Disentanglement in Value-Based Multi-Agent Reinforcement Learning [18.651307543537655]
We propose a novel model-based multi-agent reinforcement learning approach named Value Decomposition Framework with Disentangled World Model.
We present experimental results in Easy, Hard, and Super-Hard StarCraft II micro-management challenges to demonstrate that our method achieves high sample efficiency and exhibits superior performance in defeating the enemy armies compared to other baselines.
arXiv Detail & Related papers (2023-09-08T22:12:43Z)
- OPEn: An Open-ended Physics Environment for Learning Without a Task [132.6062618135179]
We study whether world models learned in an open-ended physics environment, without any specific task, can be reused for downstream physics reasoning tasks.
We build a benchmark Open-ended Physics ENvironment (OPEn) and also design several tasks to test learning representations in this environment explicitly.
We find that an agent using unsupervised contrastive learning for representation learning and impact-driven learning for exploration achieves the best results.
arXiv Detail & Related papers (2021-10-13T17:48:23Z)
- Online reinforcement learning with sparse rewards through an active inference capsule [62.997667081978825]
This paper introduces an active inference agent that minimizes a novel objective, the free energy of the expected future.
Our model is capable of solving sparse-reward problems with a very high sample efficiency.
We also introduce a novel method for approximating the prior model from the reward function, which simplifies the expression of complex objectives.
arXiv Detail & Related papers (2021-06-04T10:03:36Z)
- Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design [121.73425076217471]
We propose Unsupervised Environment Design (UED), where developers provide environments with unknown parameters, and these parameters are used to automatically produce a distribution over valid, solvable environments.
We call our technique Protagonist Antagonist Induced Regret Environment Design (PAIRED).
Our experiments demonstrate that PAIRED produces a natural curriculum of increasingly complex environments, and PAIRED agents achieve higher zero-shot transfer performance when tested in highly novel environments.
arXiv Detail & Related papers (2020-12-03T17:37:01Z)
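Since PAIRED's objective can be stated in a few lines, a hedged Python sketch of its regret signal follows; the callable policies and the toy rollout average are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

# Hedged sketch of PAIRED's regret estimate. Policies are modelled as
# callables returning an episode return for a given environment
# parameterisation; in the paper all agents are trained with RL and the
# environment adversary is rewarded by this regret.

rng = np.random.default_rng(0)

def estimated_return(policy, env_params, episodes=8):
    # Placeholder for averaging Monte Carlo returns of `policy` on the
    # environment defined by `env_params`.
    return float(np.mean([policy(env_params) + rng.normal(0, 0.1)
                          for _ in range(episodes)]))

def paired_regret(protagonist, antagonist, env_params):
    # Regret ~ antagonist return minus protagonist return. Maximising this
    # drives the adversary toward environments that are solvable (the
    # antagonist succeeds) but not yet mastered by the protagonist, which
    # yields the emergent curriculum described above.
    return (estimated_return(antagonist, env_params)
            - estimated_return(protagonist, env_params))

# Example: on this toy setup the estimated regret is roughly 0.6.
print(paired_regret(lambda p: 0.3, lambda p: 0.9, {"maze_size": 5}))
```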