Retrieval-Augmented Reinforcement Learning
- URL: http://arxiv.org/abs/2202.08417v1
- Date: Thu, 17 Feb 2022 02:44:05 GMT
- Title: Retrieval-Augmented Reinforcement Learning
- Authors: Anirudh Goyal, Abram L. Friesen, Andrea Banino, Theophane Weber, Nan
Rosemary Ke, Adria Puigdomenech Badia, Arthur Guez, Mehdi Mirza, Ksenia
Konyushkova, Michal Valko, Simon Osindero, Timothy Lillicrap, Nicolas Heess,
Charles Blundell
- Abstract summary: We train a network to map a dataset of past experiences to optimal behavior.
The retrieval process is trained to retrieve information from the dataset that may be useful in the current context.
We show that retrieval-augmented R2D2 learns significantly faster than the baseline R2D2 agent and achieves higher scores.
- Score: 63.32076191982944
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most deep reinforcement learning (RL) algorithms distill experience into
parametric behavior policies or value functions via gradient updates. While
effective, this approach has several disadvantages: (1) it is computationally
expensive, (2) it can take many updates to integrate experiences into the
parametric model, (3) experiences that are not fully integrated do not
appropriately influence the agent's behavior, and (4) behavior is limited by
the capacity of the model. In this paper we explore an alternative paradigm in
which we train a network to map a dataset of past experiences to optimal
behavior. Specifically, we augment an RL agent with a retrieval process
(parameterized as a neural network) that has direct access to a dataset of
experiences. This dataset can come from the agent's past experiences, expert
demonstrations, or any other relevant source. The retrieval process is trained
to retrieve information from the dataset that may be useful in the current
context, to help the agent achieve its goal faster and more efficiently. We
integrate our method into two different RL agents: an offline DQN agent and an
online R2D2 agent. In offline multi-task problems, we show that the
retrieval-augmented DQN agent avoids task interference and learns faster than
the baseline DQN agent. On Atari, we show that retrieval-augmented R2D2 learns
significantly faster than the baseline R2D2 agent and achieves higher scores.
We run extensive ablations to measure the contributions of the components of
our proposed method.
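- Illustrative sketch: The abstract describes an agent whose value network is conditioned on information retrieved from a dataset of experiences. The minimal Python sketch below shows one way such an integration could look for a DQN-style agent. It is an assumption-laden illustration, not the authors' implementation: it substitutes a fixed nearest-neighbour lookup for the paper's learned retrieval process, and the names `ExperienceDataset` and `RetrievalAugmentedQNetwork` are hypothetical.

```python
import numpy as np
import torch
import torch.nn as nn


class ExperienceDataset:
    """Stores (key, value) pairs: keys are embeddings of past experiences,
    values are the information returned when an experience is retrieved."""

    def __init__(self, keys: np.ndarray, values: np.ndarray):
        self.keys = keys      # shape (N, d_key)
        self.values = values  # shape (N, d_val)

    def nearest(self, query: np.ndarray, k: int) -> np.ndarray:
        # Brute-force L2 nearest neighbours. A large-scale system would use
        # an approximate index, and the paper trains the retrieval process
        # end-to-end rather than fixing it like this.
        dists = np.linalg.norm(self.keys - query[None, :], axis=1)
        return self.values[np.argsort(dists)[:k]]  # shape (k, d_val)


class RetrievalAugmentedQNetwork(nn.Module):
    """DQN-style Q-network whose state encoding is concatenated with the
    k retrieved values before the action-value head."""

    def __init__(self, obs_dim: int, val_dim: int, n_actions: int, k: int):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU())
        self.head = nn.Sequential(
            nn.Linear(64 + k * val_dim, 64),
            nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, obs: torch.Tensor, retrieved: torch.Tensor) -> torch.Tensor:
        h = self.encoder(obs)               # (B, 64)
        r = retrieved.flatten(start_dim=1)  # (B, k * val_dim)
        return self.head(torch.cat([h, r], dim=1))  # (B, n_actions)


# Usage: retrieve context for the current observation, then compute Q-values.
obs_dim, val_dim, n_actions, k = 8, 4, 3, 2
dataset = ExperienceDataset(
    keys=np.random.randn(1000, obs_dim).astype(np.float32),
    values=np.random.randn(1000, val_dim).astype(np.float32),
)
qnet = RetrievalAugmentedQNetwork(obs_dim, val_dim, n_actions, k)
obs = np.random.randn(obs_dim).astype(np.float32)
retrieved = dataset.nearest(obs, k)  # here the raw observation is the query
q_values = qnet(torch.from_numpy(obs)[None], torch.from_numpy(retrieved)[None])
```

Conditioning the value head on retrieved context is only one possible integration point; per the abstract, the paper parameterizes the retrieval process itself as a neural network and trains it to surface information useful in the current context.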
Related papers
- Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration [54.8229698058649]
We study how unlabeled prior trajectory data can be leveraged to learn efficient exploration strategies.
Our method SUPE (Skills from Unlabeled Prior data for Exploration) demonstrates that carefully combining skills pretrained on unlabeled prior data with online exploration compounds their benefits.
We empirically show that SUPE reliably outperforms prior strategies, successfully solving a suite of long-horizon, sparse-reward tasks.
arXiv Detail & Related papers (2024-10-23T17:58:45Z)
- Semifactual Explanations for Reinforcement Learning [1.5320737596132754]
Reinforcement Learning (RL) is a learning paradigm in which the agent learns from its environment through trial and error.
Deep reinforcement learning (DRL) algorithms represent the agent's policies using neural networks, making their decisions difficult to interpret.
Explaining the behaviour of DRL agents is necessary to advance user trust, increase engagement, and facilitate integration with real-life tasks.
arXiv Detail & Related papers (2024-09-09T08:37:47Z)
- TrajDeleter: Enabling Trajectory Forgetting in Offline Reinforcement Learning Agents [10.798271657186492]
This paper proposes TrajDeleter, the first practical approach to trajectory unlearning for offline RL agents.
The key idea of TrajDeleter is to guide the agent to exhibit deteriorating performance when it encounters states associated with the trajectories to be unlearned.
Extensive experiments on six offline RL algorithms and three tasks demonstrate that TrajDeleter requires only about 1.5% of the time needed for retraining from scratch.
arXiv Detail & Related papers (2024-04-18T22:23:24Z)
- Causal Decision Transformer for Recommender Systems via Offline Reinforcement Learning [23.638418776700522]
We propose a new model named the Causal Decision Transformer for Recommender Systems (CDT4Rec).
CDT4Rec is an offline reinforcement learning system that can learn from a dataset rather than from online interaction.
To demonstrate the feasibility and superiority of our model, we have conducted experiments on six real-world offline datasets and one online simulator.
arXiv Detail & Related papers (2023-04-17T00:05:52Z)
- Efficient Online Reinforcement Learning with Offline Data [78.92501185886569]
We show that we can simply apply existing off-policy methods to leverage offline data when learning online.
We extensively ablate these design choices, demonstrating the key factors that most affect performance.
We see that correct application of these simple recommendations can provide a $\mathbf{2.5\times}$ improvement over existing approaches.
arXiv Detail & Related papers (2023-02-06T17:30:22Z)
- PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
arXiv Detail & Related papers (2021-06-09T14:10:50Z)
- Believe What You See: Implicit Constraint Approach for Offline Multi-Agent Reinforcement Learning [16.707045765042505]
Current offline RL algorithms are ineffective in multi-agent systems due to the accumulated extrapolation error.
We propose a novel offline RL algorithm, named Implicit Constraint Q-learning (ICQ), which effectively alleviates the extrapolation error.
Experimental results demonstrate that the extrapolation error is reduced to almost zero and insensitive to the number of agents.
arXiv Detail & Related papers (2021-06-07T08:02:31Z)
- Parrot: Data-Driven Behavioral Priors for Reinforcement Learning [79.32403825036792]
We propose a method for pre-training behavioral priors that can capture complex input-output relationships observed in successful trials.
We show how this learned prior can be used for rapidly learning new tasks without impeding the RL agent's ability to try out novel behaviors.
arXiv Detail & Related papers (2020-11-19T18:47:40Z)
- AWAC: Accelerating Online Reinforcement Learning with Offline Datasets [84.94748183816547]
We show that our method, advantage weighted actor critic (AWAC), enables rapid learning of skills with a combination of prior demonstration data and online experience (its core actor update is sketched after this list).
Our results show that incorporating prior data can reduce the time required to learn a range of robotic skills to practical time-scales.
arXiv Detail & Related papers (2020-06-16T17:54:41Z)
- Balancing Reinforcement Learning Training Experiences in Interactive Information Retrieval [19.723551683930776]
Interactive Information Retrieval (IIR) and Reinforcement Learning (RL) share many commonalities, including an agent that learns while interacting.
To successfully apply RL methods to IIR, one challenge is to obtain sufficient relevance labels to train the RL agents.
Our paper addresses this issue by using domain randomization to synthesize more relevant documents for training.
arXiv Detail & Related papers (2020-06-05T00:38:39Z)
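A note on the AWAC entry above: its summary admits a compact formulation. The update below is a standard statement of the advantage-weighted actor step, written from general knowledge of AWAC rather than extracted from this page; the critic $Q^{\pi_k}$ is trained by off-policy TD learning on the combined offline and online data, and $\lambda$ is a temperature hyperparameter:

$$\theta_{k+1} = \arg\max_{\theta} \; \mathbb{E}_{(s,a)\sim\mathcal{D}}\Big[\log \pi_{\theta}(a \mid s)\,\exp\Big(\tfrac{1}{\lambda} A^{\pi_k}(s,a)\Big)\Big],
\qquad
A^{\pi_k}(s,a) = Q^{\pi_k}(s,a) - \mathbb{E}_{a'\sim\pi_k}\big[Q^{\pi_k}(s,a')\big]$$

Weighting the log-likelihood by the exponentiated advantage keeps the policy close to actions supported by the prior data while still improving on them, which is what lets AWAC fine-tune online without the instability of naive off-policy bootstrapping from offline datasets.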