Model-Free Generative Replay for Lifelong Reinforcement Learning:
Application to Starcraft-2
- URL: http://arxiv.org/abs/2208.05056v1
- Date: Tue, 9 Aug 2022 22:00:28 GMT
- Title: Model-Free Generative Replay for Lifelong Reinforcement Learning:
Application to Starcraft-2
- Authors: Zachary Daniels, Aswin Raghavan, Jesse Hostetler, Abrar Rahman,
Indranil Sur, Michael Piacentino, Ajay Divakaran
- Abstract summary: Generative replay (GR) is a biologically-inspired replay mechanism that augments learning experiences with self-labelled examples.
We present a version of GR for LRL that satisfies two desiderata: (a) Introspective density modelling of the latent representations of policies learned using deep RL, and (b) Model-free end-to-end learning.
- Score: 5.239932780277599
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: One approach to meet the challenges of deep lifelong reinforcement learning
(LRL) is careful management of the agent's learning experiences, in order to
learn (without forgetting) and build internal meta-models (of the tasks,
environments, agents, and world). Generative replay (GR) is a
biologically-inspired replay mechanism that augments learning experiences with
self-labelled examples drawn from an internal generative model that is updated
over time. In this paper, we present a version of GR for LRL that satisfies two
desiderata: (a) Introspective density modelling of the latent representations
of policies learned using deep RL, and (b) Model-free end-to-end learning. In
this work, we study three deep learning architectures for model-free GR. We
evaluate our proposed algorithms on three different scenarios comprising tasks
from the StarCraft2 and Minigrid domains. We report several key findings
showing the impact of the design choices on quantitative metrics that include
transfer learning, generalization to unseen tasks, fast adaptation after task
change, performance comparable to a task expert, and minimizing catastrophic
forgetting. We observe that our GR prevents drift in the features-to-action
mapping from the latent vector space of a deep actor-critic agent. We also show
improvements in established lifelong learning metrics. We find that
introducing a small random replay buffer, used in conjunction with the
generative replay buffer, is needed to significantly increase the stability of
training. Overall, we find that "hidden replay"
(a well-known architecture for class-incremental classification) is the most
promising approach that pushes the state-of-the-art in GR for LRL.
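To make the mechanism concrete, below is a minimal, hypothetical sketch (not the authors' code) of a hidden-replay-style update in PyTorch: a small VAE acts as the introspective density model over the actor-critic's latent features, the policy head is rehearsed on self-labelled latents sampled from that VAE, and a small random buffer of real latents is mixed in for stability, as the abstract reports. All module and variable names (LatentVAE, feature_net, policy_head, buffer_z, buffer_a) are illustrative assumptions.

```python
# Sketch of generative ("hidden") replay in latent space, under assumed names.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LatentVAE(nn.Module):
    """Density model over the agent's latent feature vectors."""

    def __init__(self, latent_dim: int = 64, code_dim: int = 16):
        super().__init__()
        self.enc = nn.Linear(latent_dim, 2 * code_dim)
        self.dec = nn.Linear(code_dim, latent_dim)
        self.code_dim = code_dim

    def forward(self, z):
        mu, logvar = self.enc(z).chunk(2, dim=-1)
        code = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        return self.dec(code), mu, logvar

    def sample(self, n: int):
        return self.dec(torch.randn(n, self.code_dim))


def hidden_replay_loss(feature_net, policy_head, vae, obs, buffer_z, buffer_a):
    """Consolidation loss for one batch of current-task observations."""
    z_new = feature_net(obs)  # latent features for the new task

    # Generative replay: sample latents and label them with the current policy
    # (self-labelled pseudo-examples; assumes a discrete-action policy head).
    z_gen = vae.sample(obs.shape[0]).detach()
    with torch.no_grad():
        a_gen = policy_head(z_gen).argmax(dim=-1)

    # Rehearse on generated latents plus a small random buffer of real latents,
    # which helps prevent drift in the features-to-action mapping.
    z_replay = torch.cat([z_gen, buffer_z])
    a_replay = torch.cat([a_gen, buffer_a])
    distill = F.cross_entropy(policy_head(z_replay), a_replay)

    # Keep the introspective density model current on the new task's latents.
    recon, mu, logvar = vae(z_new.detach())
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    vae_loss = F.mse_loss(recon, z_new.detach()) + kl

    return distill + vae_loss
```

In this sketch the generated latents let the agent rehearse earlier tasks without storing raw observations, while the small buffer of real latents anchors both the policy head and the density model against drift.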
Related papers
- PCGRL+: Scaling, Control and Generalization in Reinforcement Learning Level Generators [2.334978724544296]
Procedural Content Generation via Reinforcement Learning (PCGRL) has been introduced as a means by which controllable designer agents can be trained.
PCGRL offers a unique set of affordances for game designers, but it is constrained by the compute-intensive process of training RL agents.
We implement several PCGRL environments in Jax so that all aspects of learning and simulation happen in parallel on the GPU.
arXiv Detail & Related papers (2024-08-22T16:30:24Z)
- RL-VLM-F: Reinforcement Learning from Vision Language Foundation Model Feedback [24.759613248409167]
Reward engineering has long been a challenge in Reinforcement Learning research.
We propose RL-VLM-F, a method that automatically generates reward functions for agents to learn new tasks.
We demonstrate that RL-VLM-F successfully produces effective rewards and policies across various domains.
arXiv Detail & Related papers (2024-02-06T04:06:06Z)
- Augmenting Replay in World Models for Continual Reinforcement Learning [0.0]
Continual RL requires an agent to learn new tasks without forgetting previous ones, while improving on both past and future tasks.
The most common approaches use model-free algorithms and replay buffers to mitigate catastrophic forgetting.
We introduce WMAR (World Models with Augmented Replay), a model-based RL algorithm with a memory-efficient replay buffer.
arXiv Detail & Related papers (2024-01-30T00:48:26Z)
- OER: Offline Experience Replay for Continual Offline Reinforcement Learning [25.985985377992034]
It is desirable for an agent to continually learn new skills from a sequence of pre-collected offline datasets.
In this paper, we formulate a new setting, continual offline reinforcement learning (CORL), where an agent learns a sequence of offline reinforcement learning tasks.
We propose a new model-based experience selection scheme to build the replay buffer, where a transition model is learned to approximate the state distribution.
arXiv Detail & Related papers (2023-05-23T08:16:44Z)
- Prompt Conditioned VAE: Enhancing Generative Replay for Lifelong Learning in Task-Oriented Dialogue [80.05509768165135]
Generative replay methods are widely employed to consolidate past knowledge with generated pseudo samples.
Most existing generative replay methods use only a single task-specific token to control their models.
We propose a novel method, prompt conditioned VAE for lifelong learning, to enhance generative replay by incorporating tasks' statistics.
arXiv Detail & Related papers (2022-10-14T13:12:14Z)
- Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels [112.63440666617494]
Reinforcement learning algorithms can succeed, but they require large amounts of interaction between the agent and the environment.
We propose a new method to solve it, using unsupervised model-based RL, for pre-training the agent.
We show robust performance on the Real-World RL benchmark, hinting at resiliency to environment perturbations during adaptation.
arXiv Detail & Related papers (2022-09-24T14:22:29Z)
- Multitask Adaptation by Retrospective Exploration with Learned World Models [77.34726150561087]
We propose a meta-learned addressing model called RAMa that provides training samples for the MBRL agent taken from task-agnostic storage.
The model is trained to maximize the expected agent's performance by selecting promising trajectories solving prior tasks from the storage.
arXiv Detail & Related papers (2021-10-25T20:02:57Z)
- Forgetful Experience Replay in Hierarchical Reinforcement Learning from Demonstrations [55.41644538483948]
In this paper, we propose a combination of approaches that allow the agent to use low-quality demonstrations in complex vision-based environments.
Our proposed goal-oriented structuring of replay buffer allows the agent to automatically highlight sub-goals for solving complex hierarchical tasks in demonstrations.
The solution based on our algorithm beats all the solutions for the famous MineRL competition and allows the agent to mine a diamond in the Minecraft environment.
arXiv Detail & Related papers (2020-06-17T15:38:40Z)
- Nested-Wasserstein Self-Imitation Learning for Sequence Generation [158.19606942252284]
We propose the concept of nested-Wasserstein distance for distributional semantic matching.
A novel nested-Wasserstein self-imitation learning framework is developed, encouraging the model to exploit historical high-rewarded sequences.
arXiv Detail & Related papers (2020-01-20T02:19:13Z)
- Meta Reinforcement Learning with Autonomous Inference of Subtask Dependencies [57.27944046925876]
We propose and address a novel few-shot RL problem, where a task is characterized by a subtask graph.
Instead of directly learning a meta-policy, we develop a Meta-learner with Subtask Graph Inference.
Our experimental results on two grid-world domains and StarCraft II environments show that the proposed method is able to accurately infer the latent task parameter.
arXiv Detail & Related papers (2020-01-01T17:34:00Z)