Large-Scale Retrieval for Reinforcement Learning
- URL: http://arxiv.org/abs/2206.05314v1
- Date: Fri, 10 Jun 2022 18:25:30 GMT
- Title: Large-Scale Retrieval for Reinforcement Learning
- Authors: Peter C. Humphreys, Arthur Guez, Olivier Tieleman, Laurent Sifre,
Théophane Weber, Timothy Lillicrap
- Abstract summary: In reinforcement learning, the dominant paradigm is for an agent to amortise information that helps decision-making into its network weights.
Here, we pursue an alternative approach in which agents can utilise large-scale context-sensitive database lookups to support their parametric computations.
- Score: 15.372742113152233
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Effective decision making involves flexibly relating past experiences and
relevant contextual information to a novel situation. In deep reinforcement
learning, the dominant paradigm is for an agent to amortise information that
helps decision-making into its network weights via gradient descent on training
losses. Here, we pursue an alternative approach in which agents can utilise
large-scale context-sensitive database lookups to support their parametric
computations. This allows agents to directly learn in an end-to-end manner to
utilise relevant information to inform their outputs. In addition, new
information can be attended to by the agent, without retraining, by simply
augmenting the retrieval dataset. We study this approach in Go, a challenging
game for which the vast combinatorial state space privileges generalisation
over direct matching to past experiences. We leverage fast, approximate nearest
neighbor techniques in order to retrieve relevant data from a set of tens of
millions of expert demonstration states. Attending to this information provides
a significant boost to prediction accuracy and game-play performance over
simply using these demonstrations as training trajectories, providing a
compelling demonstration of the value of large-scale retrieval in reinforcement
learning agents.
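The mechanism the abstract describes can be sketched in a few lines: retrieve the stored states nearest to the current one, then let the agent's computation attend over them. The sketch below is illustrative only, using brute-force L2 search in place of the paper's fast approximate nearest-neighbour index; all array sizes and function names are assumptions, not the authors' implementation.

```python
import numpy as np

def retrieve_neighbors(query, database, k=4):
    """Return the k stored states closest to the query (brute-force L2;
    the paper uses a fast approximate nearest-neighbour index instead)."""
    dists = np.linalg.norm(database - query, axis=1)
    return database[np.argsort(dists)[:k]]

def attend(query, neighbors):
    """Soft attention over retrieved neighbours: a weighted summary the
    agent's parametric computation can condition on."""
    scores = neighbors @ query
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ neighbors

rng = np.random.default_rng(0)
database = rng.normal(size=(1000, 16))   # stand-in for demonstration states
query = rng.normal(size=16)
neighbors = retrieve_neighbors(query, database, k=8)
context = attend(query, neighbors)
print(neighbors.shape, context.shape)    # (8, 16) (16,)
```

Because only the retrieval dataset changes, new demonstrations can be attended to without retraining, as the abstract notes.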
Related papers
- Granularity Matters in Long-Tail Learning [62.30734737735273]
We offer a novel perspective on long-tail learning, inspired by an observation: datasets with finer granularity tend to be less affected by data imbalance.
We introduce open-set auxiliary classes that are visually similar to existing ones, aiming to enhance representation learning for both head and tail classes.
To prevent the overwhelming presence of auxiliary classes from disrupting training, we introduce a neighbor-silencing loss.
arXiv Detail & Related papers (2024-10-21T13:06:21Z) - Adaptive Memory Replay for Continual Learning [29.333341368722653]
Updating foundation models as new data becomes available can lead to catastrophic forgetting.
We introduce a framework of adaptive memory replay for continual learning, where sampling of past data is phrased as a multi-armed bandit problem.
We demonstrate the effectiveness of our approach, which maintains high performance while reducing forgetting by up to 10% at no training efficiency cost.
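The replay-as-bandit formulation summarised above might be sketched as follows, with UCB1 standing in for whichever bandit algorithm the paper actually uses; the buckets, reward signal, and all names here are illustrative assumptions.

```python
import math
import random

class BanditReplay:
    """Each arm is a bucket of past data; buckets are chosen by UCB1 on
    an observed reward (e.g. how much replaying the bucket helped)."""

    def __init__(self, n_buckets):
        self.counts = [0] * n_buckets
        self.values = [0.0] * n_buckets
        self.total = 0

    def select(self):
        # Play every arm once before trusting the UCB1 score.
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm
        return max(
            range(len(self.counts)),
            key=lambda a: self.values[a]
            + math.sqrt(2 * math.log(self.total) / self.counts[a]),
        )

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.total += 1
        # Running mean of rewards observed for this bucket.
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

random.seed(0)
usefulness = [0.2, 0.8, 0.5]   # hypothetical per-bucket replay benefit
bandit = BanditReplay(3)
for _ in range(500):
    arm = bandit.select()
    bandit.update(arm, float(random.random() < usefulness[arm]))
print(bandit.counts)
```

Over time the sampler concentrates replay on the buckets whose observed benefit is highest.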
arXiv Detail & Related papers (2024-04-18T22:01:56Z) - ALP: Action-Aware Embodied Learning for Perception [60.64801970249279]
We introduce Action-Aware Embodied Learning for Perception (ALP)
ALP incorporates action information into representation learning through a combination of optimizing a reinforcement learning policy and an inverse dynamics prediction objective.
We show that ALP outperforms existing baselines in several downstream perception tasks.
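The combination of objectives described above can be illustrated with a toy joint loss: a stand-in policy loss plus a cross-entropy inverse-dynamics term computed from a shared encoder. This is a hedged sketch, not ALP's architecture; every shape and weight here is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative shapes: 32-dim observations, 8-dim embeddings, 4 actions.
W_enc = rng.normal(scale=0.1, size=(32, 8))   # shared encoder weights
W_inv = rng.normal(scale=0.1, size=(16, 4))   # inverse-dynamics head

def encode(obs):
    return np.tanh(obs @ W_enc)

def log_softmax(logits):
    m = logits.max()
    return logits - m - np.log(np.exp(logits - m).sum())

def inverse_dynamics_loss(obs_t, obs_t1, action):
    """Cross-entropy for predicting which action was taken between two
    consecutive observations, from their shared embeddings."""
    z = np.concatenate([encode(obs_t), encode(obs_t1)])
    return -log_softmax(z @ W_inv)[action]

obs_t = rng.normal(size=32)
obs_t1 = rng.normal(size=32)
policy_loss = 0.5                 # stand-in for the RL objective's value
total = policy_loss + 1.0 * inverse_dynamics_loss(obs_t, obs_t1, action=2)
print(float(total))
```

The key point is that both terms backpropagate through the same encoder, so action information shapes the learned representation.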
arXiv Detail & Related papers (2023-06-16T21:51:04Z) - Accelerating exploration and representation learning with offline pre-training [52.6912479800592]
We show that exploration and representation learning can be improved by separately learning two different models from a single offline dataset.
We show that learning a state representation using noise-contrastive estimation and a model of auxiliary reward can significantly improve the sample efficiency on the challenging NetHack benchmark.
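Noise-contrastive state representation learning of the kind mentioned above is commonly realised as an InfoNCE loss that scores the true successor state against noise states; the sketch below assumes that standard form, with an illustrative linear-tanh encoder rather than the paper's model.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(24, 8))   # illustrative state encoder

def encode(states):
    return np.tanh(states @ W)

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE: classify the true successor state against noise states.
    Minimising this pulls representations of consecutive states together."""
    candidates = np.vstack([positive[None], negatives])   # positive first
    logits = candidates @ anchor / temperature
    m = logits.max()
    return -(logits[0] - m - np.log(np.exp(logits - m).sum()))

s_t = rng.normal(size=24)
s_t1 = s_t + 0.05 * rng.normal(size=24)   # true successor: a nearby state
noise = rng.normal(size=(16, 24))         # negative samples
loss = info_nce(encode(s_t), encode(s_t1), encode(noise))
print(float(loss))
```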
arXiv Detail & Related papers (2023-03-31T18:03:30Z) - Membership Inference Attacks via Adversarial Examples [5.721380617450644]
Membership inference attacks are a recent line of research that aims to recover the training data used by a learning algorithm.
We develop a means to measure the leakage of training data, leveraging a quantity that serves as a proxy for the total variation of a trained model.
arXiv Detail & Related papers (2022-07-27T15:10:57Z) - Active Learning of Ordinal Embeddings: A User Study on Football Data [4.856635699699126]
Humans innately measure distance between instances in an unlabeled dataset using an unknown similarity function.
This work uses deep metric learning to learn these user-defined similarity functions from few annotations for a large football trajectory dataset.
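Learning a user-defined similarity function from few annotations is typically done with a triplet-style metric-learning loss; the sketch below shows that standard form, with toy 2-D embeddings standing in for real trajectory features.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Triplet loss, a standard deep-metric-learning objective: the
    positive should end up closer to the anchor than the negative,
    by at least `margin`."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

# Toy 2-D embeddings; in practice these would come from a trajectory encoder.
a = np.array([0.0, 0.0])
p = np.array([0.1, 0.0])   # annotated as similar to the anchor
n = np.array([2.0, 0.0])   # annotated as dissimilar
print(triplet_loss(a, p, n))   # 0.0: the margin is already satisfied
```

Each user annotation ("this pair is more alike than that pair") supplies one such triplet, which is why few labels can go a long way.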
arXiv Detail & Related papers (2022-07-26T07:55:23Z) - Adversarial Training Helps Transfer Learning via Better Representations [17.497590668804055]
Transfer learning aims to leverage models pre-trained on source data to adapt efficiently to a target setting.
Recent works empirically demonstrate that adversarial training in the source data can improve the ability of models to transfer to new domains.
We show that adversarial training in the source data generates provably better representations, so fine-tuning on top of this representation leads to a more accurate predictor of the target data.
arXiv Detail & Related papers (2021-06-18T15:41:07Z) - Parrot: Data-Driven Behavioral Priors for Reinforcement Learning [79.32403825036792]
We propose a method for pre-training behavioral priors that can capture complex input-output relationships observed in successful trials.
We show how this learned prior can be used for rapidly learning new tasks without impeding the RL agent's ability to try out novel behaviors.
arXiv Detail & Related papers (2020-11-19T18:47:40Z) - Self-Supervised Contrastive Learning for Efficient User Satisfaction Prediction in Conversational Agents [35.2098736872247]
We propose a self-supervised contrastive learning approach to learn user-agent interactions.
We show that the pre-trained models using the self-supervised objective are transferable to the user satisfaction prediction.
We also propose a novel few-shot transfer learning approach that ensures better transferability for very small sample sizes.
arXiv Detail & Related papers (2020-10-21T18:10:58Z) - Relation-Guided Representation Learning [53.60351496449232]
We propose a new representation learning method that explicitly models and leverages sample relations.
Our framework well preserves the relations between samples.
By embedding samples into a subspace, we show that our method can address the large-scale and out-of-sample problems.
arXiv Detail & Related papers (2020-07-11T10:57:45Z) - Privileged Information Dropout in Reinforcement Learning [56.82218103971113]
Using privileged information during training can improve the sample efficiency and performance of machine learning systems.
In this work, we investigate Privileged Information Dropout (pid) for achieving the latter; it can be applied equally to value-based and policy-based reinforcement learning algorithms.
arXiv Detail & Related papers (2020-05-19T05:32:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.