Learning to Sample with Local and Global Contexts in Experience Replay Buffer
- URL: http://arxiv.org/abs/2007.07358v2
- Date: Wed, 7 Apr 2021 15:10:58 GMT
- Title: Learning to Sample with Local and Global Contexts in Experience Replay Buffer
- Authors: Youngmin Oh, Kimin Lee, Jinwoo Shin, Eunho Yang, and Sung Ju Hwang
- Abstract summary: We propose a new learning-based sampling method that can compute the relative importance of transitions.
We show that our framework can significantly improve the performance of various off-policy reinforcement learning methods.
- Score: 135.94190624087355
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Experience replay, which enables agents to remember and reuse experience
from the past, has played a significant role in the success of off-policy
reinforcement learning (RL). To use the experience replay efficiently, existing
sampling methods select more meaningful experiences by imposing priorities on
them based on certain metrics (e.g., TD-error). However, they may sample highly
biased, redundant transitions, since they compute the sampling rate for each
transition independently, without considering its importance relative to other
transitions. In this paper, we address this issue by proposing a new
learning-based sampling method that computes the relative importance of
transitions. To this end, we design a novel permutation-equivariant neural
architecture that takes as input contexts from not only the features of each
transition (local) but also those of others (global). We validate our
framework, which we refer to as Neural Experience Replay Sampler (NERS), on
multiple benchmarks for both continuous and discrete control and show that it
can significantly improve the performance of various off-policy RL methods.
Further analysis confirms that the improvements in sample efficiency are indeed
due to NERS sampling diverse and meaningful transitions by considering both
local and global contexts.
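Below is a minimal, illustrative sketch of the core idea described in the abstract: score each transition from its own features (local context) together with a pooled summary of the candidate set (global context), then sample the replay buffer in proportion to the resulting priorities. This is not the authors' reference implementation; the feature layout, network sizes, and the class name LocalGlobalScorer are assumptions made for illustration.

```python
# Hedged sketch of a local+global priority scorer for replay sampling.
# Assumed per-transition features (e.g. [TD-error, reward, timestep]) are
# placeholders; NERS's actual inputs and architecture may differ.
import torch
import torch.nn as nn


class LocalGlobalScorer(nn.Module):
    def __init__(self, feat_dim: int, hidden: int = 64):
        super().__init__()
        self.local = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU())
        self.glob = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU())
        # The score head sees [local encoding, pooled global encoding];
        # mean-pooling keeps the scorer permutation-equivariant over the set.
        self.score = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (N, feat_dim) features of N candidate transitions.
        local = self.local(feats)                          # (N, hidden)
        glob = self.glob(feats).mean(dim=0, keepdim=True)  # (1, hidden)
        pair = torch.cat([local, glob.expand_as(local)], dim=-1)
        return self.score(pair).squeeze(-1)                # (N,) priorities


if __name__ == "__main__":
    scorer = LocalGlobalScorer(feat_dim=3)
    feats = torch.randn(128, 3)                  # fake replay-buffer features
    probs = torch.softmax(scorer(feats), dim=0)  # relative importance over the set
    batch_idx = torch.multinomial(probs, num_samples=32, replacement=False)
    print(batch_idx.shape)  # torch.Size([32])
```

Because the global branch is pooled over the whole set, permuting the candidate transitions permutes the output scores in the same way, which is the equivariance property the abstract refers to.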
Related papers
- Prioritized Generative Replay [121.83947140497655]
We propose a prioritized, parametric version of an agent's memory, using generative models to capture online experience.
This paradigm enables densification of past experience, with new generations that benefit from the generative model's generalization capacity.
We show this recipe can be instantiated using conditional diffusion models and simple relevance functions.
arXiv Detail & Related papers (2024-10-23T17:59:52Z)
- BoostAdapter: Improving Vision-Language Test-Time Adaptation via Regional Bootstrapping [64.8477128397529]
We propose a test-time adaptation framework that draws on both training-required and training-free approaches.
We maintain a light-weight key-value memory for feature retrieval from instance-agnostic historical samples and instance-aware boosting samples.
We theoretically justify the rationality behind our method and empirically verify its effectiveness on both the out-of-distribution and the cross-domain datasets.
arXiv Detail & Related papers (2024-10-20T15:58:43Z)
- CUER: Corrected Uniform Experience Replay for Off-Policy Continuous Deep Reinforcement Learning Algorithms [5.331052581441265]
We develop a novel algorithm, Corrected Uniform Experience Replay (CUER), which samples stored experiences while accounting for fairness among all other experiences.
CUER provides promising improvements for off-policy continuous control algorithms in terms of sample efficiency, final performance, and policy stability during training.
arXiv Detail & Related papers (2024-06-13T12:03:40Z)
- Diffusion Generative Flow Samplers: Improving learning signals through partial trajectory optimization [87.21285093582446]
Diffusion Generative Flow Samplers (DGFS) is a sampling-based framework where the learning process can be tractably broken down into short partial trajectory segments.
Our method takes inspiration from the theory developed for generative flow networks (GFlowNets).
arXiv Detail & Related papers (2023-10-04T09:39:05Z)
- Sampling Through the Lens of Sequential Decision Making [9.101505546901999]
We propose a reward-guided sampling strategy called Adaptive Sample with Reward (ASR).
Our approach adaptively adjusts the sampling process to achieve optimal performance.
Empirical results in information retrieval and clustering demonstrate ASR's superb performance across different datasets.
arXiv Detail & Related papers (2022-08-17T04:01:29Z)
- Adaptive Client Sampling in Federated Learning via Online Learning with Bandit Feedback [36.05851452151107]
Federated learning (FL) systems need to sample a subset of clients to participate in each round of training.
Despite its importance, there is limited work on how to sample clients effectively.
We show how our sampling method can improve the convergence speed of optimization algorithms.
arXiv Detail & Related papers (2021-12-28T23:50:52Z)
- Fractional Transfer Learning for Deep Model-Based Reinforcement Learning [0.966840768820136]
Reinforcement learning (RL) is well known for requiring large amounts of data in order for RL agents to learn to perform complex tasks.
Recent progress in model-based RL allows agents to be much more data-efficient.
We present a simple alternative approach: fractional transfer learning.
arXiv Detail & Related papers (2021-08-14T12:44:42Z)
- Exploring Complementary Strengths of Invariant and Equivariant Representations for Few-Shot Learning [96.75889543560497]
In many real-world problems, collecting a large number of labeled samples is infeasible.
Few-shot learning is the dominant approach to address this issue, where the objective is to quickly adapt to novel categories in presence of a limited number of samples.
We propose a novel training mechanism that simultaneously enforces equivariance and invariance to a general set of geometric transformations.
arXiv Detail & Related papers (2021-03-01T21:14:33Z)
- Optimal Importance Sampling for Federated Learning [57.14673504239551]
Federated learning involves a mixture of centralized and decentralized processing tasks.
The sampling of both agents and data is generally uniform; however, in this work we consider non-uniform sampling.
We derive optimal importance sampling strategies for both agent and data selection and show that non-uniform sampling without replacement improves the performance of the original FedAvg algorithm (an illustrative sketch follows this list).
arXiv Detail & Related papers (2020-10-26T14:15:33Z)
- How Transferable are the Representations Learned by Deep Q Agents? [13.740174266824532]
We consider the source of Deep Reinforcement Learning's sample complexity.
We compare the benefits of transfer learning to learning a policy from scratch.
We find that benefits due to transfer are highly variable in general and non-symmetric across pairs of tasks.
arXiv Detail & Related papers (2020-02-24T00:23:47Z)
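For the Optimal Importance Sampling for Federated Learning entry above, the sketch below illustrates non-uniform client sampling without replacement for a FedAvg-style round. It does not reproduce the paper's derived optimal strategies; the importance measure (local dataset size) and the function name sample_clients are assumptions for illustration only.

```python
# Illustrative sketch: non-uniform client sampling without replacement.
# The importance measure (dataset size) is an assumed stand-in, not the
# paper's optimal strategy.
import numpy as np

rng = np.random.default_rng(0)

def sample_clients(importance: np.ndarray, num_selected: int) -> np.ndarray:
    """Draw clients without replacement, with probability proportional to importance."""
    probs = importance / importance.sum()
    return rng.choice(len(importance), size=num_selected, replace=False, p=probs)

# Example round: 100 clients with heterogeneous dataset sizes, select 10.
dataset_sizes = rng.integers(low=50, high=5000, size=100).astype(float)
selected = sample_clients(dataset_sizes, num_selected=10)
print(sorted(selected.tolist()))
```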