Retrieval-Augmented Reinforcement Learning
- URL: http://arxiv.org/abs/2202.08417v1
- Date: Thu, 17 Feb 2022 02:44:05 GMT
- Title: Retrieval-Augmented Reinforcement Learning
- Authors: Anirudh Goyal, Abram L. Friesen, Andrea Banino, Theophane Weber, Nan
Rosemary Ke, Adria Puigdomenech Badia, Arthur Guez, Mehdi Mirza, Ksenia
Konyushkova, Michal Valko, Simon Osindero, Timothy Lillicrap, Nicolas Heess,
Charles Blundell
- Abstract summary: We train a network to map a dataset of past experiences to optimal behavior.
The retrieval process is trained to retrieve information from the dataset that may be useful in the current context.
We show that retrieval-augmented R2D2 learns significantly faster than the baseline R2D2 agent and achieves higher scores.
- Score: 63.32076191982944
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most deep reinforcement learning (RL) algorithms distill experience into
parametric behavior policies or value functions via gradient updates. While
effective, this approach has several disadvantages: (1) it is computationally
expensive, (2) it can take many updates to integrate experiences into the
parametric model, (3) experiences that are not fully integrated do not
appropriately influence the agent's behavior, and (4) behavior is limited by
the capacity of the model. In this paper we explore an alternative paradigm in
which we train a network to map a dataset of past experiences to optimal
behavior. Specifically, we augment an RL agent with a retrieval process
(parameterized as a neural network) that has direct access to a dataset of
experiences. This dataset can come from the agent's past experiences, expert
demonstrations, or any other relevant source. The retrieval process is trained
to retrieve information from the dataset that may be useful in the current
context, to help the agent achieve its goal faster and more efficiently. We
integrate our method into two different RL agents: an offline DQN agent and an
online R2D2 agent. In offline multi-task problems, we show that the
retrieval-augmented DQN agent avoids task interference and learns faster than
the baseline DQN agent. On Atari, we show that retrieval-augmented R2D2 learns
significantly faster than the baseline R2D2 agent and achieves higher scores.
We run extensive ablations to measure the contributions of the components of
our proposed method.
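- Illustrative sketch: The abstract describes an agent whose value network is conditioned on information retrieved from a dataset of experiences. The minimal Python sketch below shows one way such an integration could look for a DQN-style agent. It is an assumption-laden illustration, not the authors' implementation: it substitutes a fixed nearest-neighbour lookup for the paper's learned retrieval process, and the names `ExperienceDataset` and `RetrievalAugmentedQNetwork` are hypothetical.

```python
import numpy as np
import torch
import torch.nn as nn


class ExperienceDataset:
    """Stores (key, value) pairs: keys are embeddings of past experiences,
    values are the information returned when an experience is retrieved."""

    def __init__(self, keys: np.ndarray, values: np.ndarray):
        self.keys = keys      # shape (N, d_key)
        self.values = values  # shape (N, d_val)

    def nearest(self, query: np.ndarray, k: int) -> np.ndarray:
        # Brute-force L2 nearest neighbours. A large-scale system would use
        # an approximate index, and the paper trains the retrieval process
        # end-to-end rather than fixing it like this.
        dists = np.linalg.norm(self.keys - query[None, :], axis=1)
        return self.values[np.argsort(dists)[:k]]  # shape (k, d_val)


class RetrievalAugmentedQNetwork(nn.Module):
    """DQN-style Q-network whose state encoding is concatenated with the
    k retrieved values before the action-value head."""

    def __init__(self, obs_dim: int, val_dim: int, n_actions: int, k: int):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU())
        self.head = nn.Sequential(
            nn.Linear(64 + k * val_dim, 64),
            nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, obs: torch.Tensor, retrieved: torch.Tensor) -> torch.Tensor:
        h = self.encoder(obs)               # (B, 64)
        r = retrieved.flatten(start_dim=1)  # (B, k * val_dim)
        return self.head(torch.cat([h, r], dim=1))  # (B, n_actions)


# Usage: retrieve context for the current observation, then compute Q-values.
obs_dim, val_dim, n_actions, k = 8, 4, 3, 2
dataset = ExperienceDataset(
    keys=np.random.randn(1000, obs_dim).astype(np.float32),
    values=np.random.randn(1000, val_dim).astype(np.float32),
)
qnet = RetrievalAugmentedQNetwork(obs_dim, val_dim, n_actions, k)
obs = np.random.randn(obs_dim).astype(np.float32)
retrieved = dataset.nearest(obs, k)  # here the raw observation is the query
q_values = qnet(torch.from_numpy(obs)[None], torch.from_numpy(retrieved)[None])
```

Conditioning the value head on retrieved context is only one possible integration point; per the abstract, the paper parameterizes the retrieval process itself as a neural network and trains it to surface information useful in the current context.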
Related papers
- Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration [54.8229698058649]
We study how unlabeled prior trajectory data can be leveraged to learn efficient exploration strategies.
Our method SUPE (Skills from Unlabeled Prior data for Exploration) demonstrates that carefully combining skills pretrained on unlabeled prior data with online exploration compounds their benefits.
We empirically show that SUPE reliably outperforms prior strategies, successfully solving a suite of long-horizon, sparse-reward tasks.
arXiv Detail & Related papers (2024-10-23T17:58:45Z)
- Semifactual Explanations for Reinforcement Learning [1.5320737596132754]
Reinforcement Learning (RL) is a learning paradigm in which the agent learns from its environment through trial and error.
Deep reinforcement learning (DRL) algorithms represent the agent's policies using neural networks, making their decisions difficult to interpret.
Explaining the behaviour of DRL agents is necessary to advance user trust, increase engagement, and facilitate integration with real-life tasks.
arXiv Detail & Related papers (2024-09-09T08:37:47Z)
- TrajDeleter: Enabling Trajectory Forgetting in Offline Reinforcement Learning Agents [10.798271657186492]
This paper proposes TrajDeleter, the first practical approach to trajectory unlearning for offline RL agents.
The key idea of TrajDeleter is to guide the agent to exhibit deteriorating performance when it encounters states associated with the trajectories to be unlearned.
Extensive experiments on six offline RL algorithms and three tasks demonstrate that TrajDeleter requires only about 1.5% of the time needed for retraining from scratch.
arXiv Detail & Related papers (2024-04-18T22:23:24Z)
- Causal Decision Transformer for Recommender Systems via Offline Reinforcement Learning [23.638418776700522]
We propose a new model named the Causal Decision Transformer for Recommender Systems (CDT4Rec).
CDT4Rec is an offline reinforcement learning system that can learn from a dataset rather than from online interaction.
To demonstrate the feasibility and superiority of our model, we have conducted experiments on six real-world offline datasets and one online simulator.
arXiv Detail & Related papers (2023-04-17T00:05:52Z)
- Efficient Online Reinforcement Learning with Offline Data [78.92501185886569]
We show that we can simply apply existing off-policy methods to leverage offline data when learning online.
We extensively ablate these design choices, demonstrating the key factors that most affect performance.
We see that correct application of these simple recommendations can provide a $\mathbf{2.5\times}$ improvement over existing approaches.
arXiv Detail & Related papers (2023-02-06T17:30:22Z)
- PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
arXiv Detail & Related papers (2021-06-09T14:10:50Z)
- Believe What You See: Implicit Constraint Approach for Offline Multi-Agent Reinforcement Learning [16.707045765042505]
Current offline RL algorithms are ineffective in multi-agent systems due to the accumulated extrapolation error.
We propose a novel offline RL algorithm, named Implicit Constraint Q-learning (ICQ), which effectively alleviates the extrapolation error.
Experimental results demonstrate that the extrapolation error is reduced to almost zero and insensitive to the number of agents.
arXiv Detail & Related papers (2021-06-07T08:02:31Z)
- Parrot: Data-Driven Behavioral Priors for Reinforcement Learning [79.32403825036792]
We propose a method for pre-training behavioral priors that can capture complex input-output relationships observed in successful trials.
We show how this learned prior can be used for rapidly learning new tasks without impeding the RL agent's ability to try out novel behaviors.
arXiv Detail & Related papers (2020-11-19T18:47:40Z)
- AWAC: Accelerating Online Reinforcement Learning with Offline Datasets [84.94748183816547]
We show that our method, advantage weighted actor critic (AWAC), enables rapid learning of skills with a combination of prior demonstration data and online experience (its core actor update is sketched after this list).
Our results show that incorporating prior data can reduce the time required to learn a range of robotic skills to practical time-scales.
arXiv Detail & Related papers (2020-06-16T17:54:41Z)
- Balancing Reinforcement Learning Training Experiences in Interactive Information Retrieval [19.723551683930776]
Interactive Information Retrieval (IIR) and Reinforcement Learning (RL) share many commonalities, including an agent that learns while interacting.
To successfully apply RL methods to IIR, one challenge is to obtain sufficient relevance labels to train the RL agents.
Our paper addresses this issue by using domain randomization to synthesize more relevant documents for training.
arXiv Detail & Related papers (2020-06-05T00:38:39Z)
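A note on the AWAC entry above: its summary admits a compact formulation. The update below is a standard statement of the advantage-weighted actor step, written from general knowledge of AWAC rather than extracted from this page; the critic $Q^{\pi_k}$ is trained by off-policy TD learning on the combined offline and online data, and $\lambda$ is a temperature hyperparameter:

$$\theta_{k+1} = \arg\max_{\theta} \; \mathbb{E}_{(s,a)\sim\mathcal{D}}\Big[\log \pi_{\theta}(a \mid s)\,\exp\Big(\tfrac{1}{\lambda} A^{\pi_k}(s,a)\Big)\Big],
\qquad
A^{\pi_k}(s,a) = Q^{\pi_k}(s,a) - \mathbb{E}_{a'\sim\pi_k}\big[Q^{\pi_k}(s,a')\big]$$

Weighting the log-likelihood by the exponentiated advantage keeps the policy close to actions supported by the prior data while still improving on them, which is what lets AWAC fine-tune online without the instability of naive off-policy bootstrapping from offline datasets.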