Large Batch Experience Replay
- URL: http://arxiv.org/abs/2110.01528v1
- Date: Mon, 4 Oct 2021 15:53:13 GMT
- Title: Large Batch Experience Replay
- Authors: Thibault Lahire, Matthieu Geist, Emmanuel Rachelson
- Abstract summary: We introduce new theoretical foundations of Prioritized Experience Replay.
LaBER is an easy-to-code and efficient method for sampling the replay buffer.
- Score: 22.473676537463607
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Several algorithms have been proposed to sample the replay buffer of deep
Reinforcement Learning (RL) agents non-uniformly in order to speed up learning, but
very few theoretical foundations for these sampling schemes have been provided.
Among others, Prioritized Experience Replay appears as a hyperparameter-sensitive
heuristic, even though it can provide good performance. In this work, we cast the
replay buffer sampling problem as an importance sampling problem for estimating the
gradient. This allows us to derive the theoretically optimal sampling distribution,
which yields the best theoretical convergence speed.
Building on the knowledge of this ideal sampling scheme, we exhibit new
theoretical foundations for Prioritized Experience Replay. Since the optimal sampling
distribution is intractable, we make several approximations that provide good
results in practice and introduce, among others, LaBER (Large Batch Experience
Replay), an easy-to-code and efficient method for sampling the replay buffer.
LaBER, which can be combined with Deep Q-Networks, distributional RL agents or
actor-critic methods, yields improved performance over a diverse range of Atari
games and PyBullet environments, compared to the base agent it is implemented
on and to other prioritization schemes.
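Read as an importance-sampling problem, the variance-minimizing distribution samples each stored transition with probability proportional to the norm of its per-sample gradient, which is intractable to maintain over the whole buffer; the recipe described in the abstract instead draws a large batch uniformly, scores it with a cheap surrogate priority, and downsamples a mini-batch proportionally to those scores with importance weights. The sketch below illustrates that downsample-and-reweight step only; the function name and the use of |TD error| as the surrogate priority are assumptions for illustration, not the authors' exact implementation.
```python
# A minimal sketch of the large-batch-then-downsample idea described in the
# abstract. The function name and the use of |TD error| as a surrogate priority
# are assumptions for illustration, not the authors' exact implementation.
import numpy as np

def downsample_large_batch(priorities, mini_batch_size, rng):
    """Pick a mini-batch from a large, uniformly drawn batch.

    priorities: non-negative surrogate priorities (e.g. |TD errors|) for the
        large batch. Returns selected indices and importance weights that keep
        the reweighted gradient estimate unbiased w.r.t. uniform sampling.
    """
    p = priorities / priorities.sum()                     # proposal over the large batch
    idx = rng.choice(len(p), size=mini_batch_size, p=p)   # sample proportional to priority
    weights = (1.0 / len(p)) / p[idx]                     # uniform prob / proposal prob
    weights /= weights.mean()                             # optional normalization for stable steps
    return idx, weights

# Usage sketch with stand-in numbers: a large batch of 256 transitions is
# scored, then 32 are kept for the gradient step, each per-sample loss being
# scaled by its importance weight.
rng = np.random.default_rng(0)
surrogate = np.abs(rng.normal(size=256)) + 1e-6           # stand-in for |TD errors|
idx, weights = downsample_large_batch(surrogate, mini_batch_size=32, rng=rng)
```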
Related papers
- Frugal Actor-Critic: Sample Efficient Off-Policy Deep Reinforcement Learning Using Unique Experiences [8.983448736644382]
Efficient utilization of the replay buffer plays a significant role in off-policy actor-critic reinforcement learning (RL) algorithms.
We propose a method for achieving sample efficiency, which focuses on selecting unique samples and adding them to the replay buffer.
arXiv Detail & Related papers (2024-02-05T10:04:00Z) - An Empirical Study of the Effectiveness of Using a Replay Buffer on Mode
- An Empirical Study of the Effectiveness of Using a Replay Buffer on Mode Discovery in GFlowNets [47.82697599507171]
Reinforcement Learning (RL) algorithms aim to learn an optimal policy by iteratively sampling actions and learning to maximize the total expected return, $R(x)$.
GFlowNets are a special class of algorithms designed to generate diverse candidates, $x$, from a discrete set, by learning a policy that approximates the proportional sampling of $R(x)$.
arXiv Detail & Related papers (2023-07-15T01:17:14Z) - Provable Reward-Agnostic Preference-Based Reinforcement Learning [61.39541986848391]
Preference-based Reinforcement Learning (PbRL) is a paradigm in which an RL agent learns to optimize a task using pair-wise preference-based feedback over trajectories.
We propose a theoretical reward-agnostic PbRL framework that acquires exploratory trajectories enabling accurate learning of the hidden reward function.
arXiv Detail & Related papers (2023-05-29T15:00:09Z) - MAC-PO: Multi-Agent Experience Replay via Collective Priority
Optimization [12.473095790918347]
We propose MAC-PO, which formulates optimal prioritized experience replay for multi-agent problems.
By minimizing the resulting policy regret, we can narrow the gap between the current policy and a nominal optimal policy.
arXiv Detail & Related papers (2023-02-21T03:11:21Z) - Event Tables for Efficient Experience Replay [31.678826875509348]
Experience replay (ER) is a crucial component of many deep reinforcement learning (RL) systems.
Uniform sampling from an ER buffer can lead to slow convergence and unstable behaviors.
This paper introduces Stratified Sampling from Event Tables (SSET), which partitions an ER buffer into Event Tables.
arXiv Detail & Related papers (2022-11-01T16:38:23Z) - Sampling Through the Lens of Sequential Decision Making [9.101505546901999]
- Sampling Through the Lens of Sequential Decision Making [9.101505546901999]
We propose a reward-guided sampling strategy called Adaptive Sample with Reward (ASR).
Our approach adaptively adjusts the sampling process to achieve optimal performance.
Empirical results in information retrieval and clustering demonstrate ASR's superb performance across different datasets.
arXiv Detail & Related papers (2022-08-17T04:01:29Z) - Analysis of Stochastic Processes through Replay Buffers [50.52781475688759]
We analyze a setting in which a process X is pushed into a replay buffer and a process Y is then generated by sampling randomly from that buffer.
Our theoretical analysis sheds light on why a replay buffer may be a good de-correlator.
arXiv Detail & Related papers (2022-06-26T11:20:44Z) - Replay For Safety [51.11953997546418]
In experience replay, past transitions are stored in a memory buffer and re-used during learning.
We show that using an appropriately biased sampling scheme allows us to achieve a safe policy.
arXiv Detail & Related papers (2021-12-08T11:10:57Z) - Convergence Results For Q-Learning With Experience Replay [51.11953997546418]
We provide a convergence rate guarantee, and discuss how it compares to the convergence of Q-learning depending on important parameters such as the frequency and number of iterations of replay.
We also provide theoretical evidence showing when we might expect this to strictly improve performance, by introducing and analyzing a simple class of MDPs.
arXiv Detail & Related papers (2021-12-08T10:22:49Z) - Learning to Sample with Local and Global Contexts in Experience Replay
Buffer [135.94190624087355]
We propose a new learning-based sampling method that can compute the relative importance of each transition.
We show that our framework can significantly improve the performance of various off-policy reinforcement learning methods.
arXiv Detail & Related papers (2020-07-14T21:12:56Z)