Learning General World Models in a Handful of Reward-Free Deployments
- URL: http://arxiv.org/abs/2210.12719v1
- Date: Sun, 23 Oct 2022 12:38:03 GMT
- Title: Learning General World Models in a Handful of Reward-Free Deployments
- Authors: Yingchen Xu, Jack Parker-Holder, Aldo Pacchiano, Philip J. Ball, Oleh
Rybkin, Stephen J. Roberts, Tim Rocktäschel, Edward Grefenstette
- Abstract summary: Building generally capable agents is a grand challenge for deep reinforcement learning (RL)
We present CASCADE, a novel approach for self-supervised exploration in this new setting.
We show that CASCADE collects diverse task-agnostic datasets and learns agents that generalize zero-shot to novel, unseen downstream tasks.
- Score: 53.06205037827802
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Building generally capable agents is a grand challenge for deep reinforcement
learning (RL). To approach this challenge practically, we outline two key
desiderata: 1) to facilitate generalization, exploration should be task
agnostic; 2) to facilitate scalability, exploration policies should collect
large quantities of data without costly centralized retraining. Combining these
two properties, we introduce the reward-free deployment efficiency setting, a
new paradigm for RL research. We then present CASCADE, a novel approach for
self-supervised exploration in this new setting. CASCADE seeks to learn a world
model by collecting data with a population of agents, using an information
theoretic objective inspired by Bayesian Active Learning. CASCADE achieves this
by specifically maximizing the diversity of trajectories sampled by the
population through a novel cascading objective. We provide theoretical
intuition for CASCADE which we show in a tabular setting improves upon naïve
approaches that do not account for population diversity. We then demonstrate
that CASCADE collects diverse task-agnostic datasets and learns agents that
generalize zero-shot to novel, unseen downstream tasks on Atari, MiniGrid,
Crafter and the DM Control Suite. Code and videos are available at
https://ycxuyingchen.github.io/cascade/
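The cascading objective described in the abstract can be illustrated with a minimal sketch (not the authors' implementation): ensemble disagreement stands in for the information-theoretic gain, and each explorer in the population is scored conditioned on the trajectories already chosen by earlier explorers. All function names, the distance-based diversity term, and the `beta` weight are illustrative assumptions.

```python
import numpy as np

def ensemble_disagreement(traj, models):
    # Epistemic-uncertainty proxy: variance of an ensemble's predictions
    # on the trajectory, averaged over time steps and state dimensions.
    preds = np.stack([m(traj) for m in models])
    return preds.var(axis=0).mean()

def diversity_bonus(traj, selected):
    # Distance to the nearest already-selected trajectory: the cascading
    # term that discourages explorers from duplicating one another.
    if not selected:
        return 0.0
    return min(np.linalg.norm(traj - s) for s in selected)

def cascade_select(candidates, models, k, beta=1.0):
    # Greedy cascade: explorer i is scored given the trajectories
    # chosen by explorers 1..i-1, then the best candidate is committed.
    selected = []
    remaining = list(candidates)
    for _ in range(k):
        scores = [
            ensemble_disagreement(t, models) + beta * diversity_bonus(t, selected)
            for t in remaining
        ]
        selected.append(remaining.pop(int(np.argmax(scores))))
    return selected
```

Under this sketch, the first explorer is picked purely for model disagreement, and each subsequent one trades off disagreement against novelty relative to the data the population has already committed to collecting.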
Related papers
- A Survey on Deep Active Learning: Recent Advances and New Frontiers [27.07154361976248]
Survey papers on active learning, especially deep learning-based active learning (DAL), remain scarce despite the technique's broad applicability and growing popularity.
This work aims to serve as a useful and quick guide for researchers in overcoming difficulties in DAL.
arXiv Detail & Related papers (2024-05-01T05:54:33Z)
- Improving Generalization of Alignment with Human Preferences through Group Invariant Learning [56.19242260613749]
Reinforcement Learning from Human Feedback (RLHF) enables the generation of responses more aligned with human preferences.
Previous work shows that Reinforcement Learning (RL) often exploits shortcuts to attain high rewards and overlooks challenging samples.
We propose a novel approach that can learn a consistent policy via RL across various data groups or domains.
arXiv Detail & Related papers (2023-10-18T13:54:15Z)
- Knowledge Transfer-Driven Few-Shot Class-Incremental Learning [23.163459923345556]
Few-shot class-incremental learning (FSCIL) aims to continually learn new classes using a few samples while not forgetting the old classes.
Despite the advances of existing FSCIL methods, their knowledge transfer schemes remain sub-optimal due to insufficient optimization of the model's plasticity.
We propose a Random Episode Sampling and Augmentation (RESA) strategy that relies on diverse pseudo-incremental tasks as agents to achieve knowledge transfer.
arXiv Detail & Related papers (2023-06-19T14:02:45Z)
- Exploration via Planning for Information about the Optimal Trajectory [67.33886176127578]
We develop a method that allows us to plan for exploration while taking the task and the current knowledge into account.
We demonstrate that our method learns strong policies with 2x fewer samples than strong exploration baselines.
arXiv Detail & Related papers (2022-10-06T20:28:55Z)
- Model-Free Generative Replay for Lifelong Reinforcement Learning: Application to Starcraft-2 [5.239932780277599]
Generative replay (GR) is a biologically-inspired replay mechanism that augments learning experiences with self-labelled examples.
We present a version of GR for LRL that satisfies two desiderata: (a) introspective density modelling of the latent representations of policies learned using deep RL, and (b) model-free end-to-end learning.
arXiv Detail & Related papers (2022-08-09T22:00:28Z) - The Challenges of Exploration for Offline Reinforcement Learning [8.484491887821473]
We study the two processes of reinforcement learning: collecting informative experience and inferring optimal behaviour.
The task-agnostic setting for data collection, where the task is not known a priori, is of particular interest.
We use this decoupled framework to strengthen intuitions about exploration and the data prerequisites for effective offline RL.
arXiv Detail & Related papers (2022-01-27T23:59:56Z) - Multitask Adaptation by Retrospective Exploration with Learned World
Models [77.34726150561087]
We propose a meta-learned addressing model called RAMa that provides training samples for the MBRL agent taken from task-agnostic storage.
The model is trained to maximize the expected agent's performance by selecting promising trajectories solving prior tasks from the storage.
arXiv Detail & Related papers (2021-10-25T20:02:57Z) - Batch Exploration with Examples for Scalable Robotic Reinforcement
Learning [63.552788688544254]
Batch Exploration with Examples (BEE) explores relevant regions of the state-space, guided by a modest number of human-provided images of important states.
BEE is able to tackle challenging vision-based manipulation tasks both in simulation and on a real Franka robot.
arXiv Detail & Related papers (2020-10-22T17:49:25Z) - Planning to Explore via Self-Supervised World Models [120.31359262226758]
Plan2Explore is a self-supervised reinforcement learning agent.
We present a new approach to self-supervised exploration and fast adaptation to new tasks.
Without any training supervision or task-specific interaction, Plan2Explore outperforms prior self-supervised exploration methods.
arXiv Detail & Related papers (2020-05-12T17:59:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.