Replay For Safety
- URL: http://arxiv.org/abs/2112.04229v1
- Date: Wed, 8 Dec 2021 11:10:57 GMT
- Title: Replay For Safety
- Authors: Liran Szlak, Ohad Shamir
- Abstract summary: In experience replay, past transitions are stored in a memory buffer and re-used during learning.
We show that using an appropriate biased sampling scheme can allow us to achieve a safe policy.
- Score: 51.11953997546418
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Experience replay \citep{lin1993reinforcement, mnih2015human} is a widely
used technique to achieve efficient use of data and improved performance in RL
algorithms. In experience replay, past transitions are stored in a memory
buffer and re-used during learning. Various sampling schemes for the replay
buffer have been proposed in previous works, attempting to choose those
experiences that contribute most to convergence to an optimal policy. Here, we
give some conditions on the replay
sampling scheme that will ensure convergence, focusing on the well-known
Q-learning algorithm in the tabular setting. After establishing sufficient
conditions for convergence, we turn to suggest a slightly different usage for
experience replay - replaying memories in a biased manner as a means to change
the properties of the resulting policy. We initiate a rigorous study of
experience replay as a tool to control and modify the properties of the
resulting policy. In particular, we show that using an appropriate biased
sampling scheme can allow us to achieve a \emph{safe} policy. We believe that
using experience replay as a biasing mechanism that allows controlling the
resulting policy in desirable ways is an idea with promising potential for many
applications.
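To make the idea concrete, the following is a minimal sketch of tabular Q-learning with a replay buffer whose sampling distribution is biased toward low-reward transitions, as a crude proxy for "risky" experience. The toy chain MDP, the priority function, and all hyperparameters are illustrative assumptions for this sketch and are not the paper's exact scheme.

```python
# Sketch: tabular Q-learning with biased replay sampling (illustrative, not the paper's algorithm).
import random
import numpy as np


def biased_probabilities(rewards, temperature=1.0):
    """Sampling distribution that over-weights low-reward transitions
    (softmax over negative rewards; an illustrative choice of bias)."""
    scores = np.exp(-np.asarray(rewards, dtype=float) / temperature)
    return scores / scores.sum()


def q_learning_with_biased_replay(step_fn, n_states, n_actions,
                                  episodes=200, horizon=50,
                                  alpha=0.1, gamma=0.95, epsilon=0.1,
                                  replay_updates=32):
    Q = np.zeros((n_states, n_actions))
    buffer = []  # stored transitions (s, a, r, s_next)

    for _ in range(episodes):
        s = 0  # assume state 0 is the initial state (illustrative)
        for _ in range(horizon):
            # Epsilon-greedy behavior policy.
            a = random.randrange(n_actions) if random.random() < epsilon else int(np.argmax(Q[s]))
            s_next, r, done = step_fn(s, a)
            buffer.append((s, a, r, s_next))
            s = s_next
            if done:
                break

        # Replay phase: sample transitions with probability biased toward low reward.
        probs = biased_probabilities([t[2] for t in buffer])
        idx = np.random.choice(len(buffer), size=min(replay_updates, len(buffer)), p=probs)
        for i in idx:
            s_i, a_i, r_i, s_n = buffer[i]
            td_target = r_i + gamma * np.max(Q[s_n])
            Q[s_i, a_i] += alpha * (td_target - Q[s_i, a_i])
    return Q


def toy_step(s, a, n_states=5):
    """Toy chain MDP: action 1 moves right but occasionally incurs a penalty; action 0 stays put."""
    if a == 1:
        s_next = min(s + 1, n_states - 1)
        r = -1.0 if random.random() < 0.2 else 1.0
    else:
        s_next, r = s, 0.0
    return s_next, r, s_next == n_states - 1


Q = q_learning_with_biased_replay(toy_step, n_states=5, n_actions=2)
print(np.argmax(Q, axis=1))  # greedy policy per state
```

Because the sampling distribution over-weights penalized transitions, the replayed updates push the Q-values of risky actions down more aggressively than uniform replay would, so the resulting greedy policy tends toward the safer action; this is only meant to illustrate how biasing the replay distribution can shape the learned policy.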
Related papers
- Prioritized Generative Replay [121.83947140497655]
We propose a prioritized, parametric version of an agent's memory, using generative models to capture online experience.
This paradigm enables densification of past experience, with new generations that benefit from the generative model's generalization capacity.
We show this recipe can be instantiated using conditional diffusion models and simple relevance functions.
arXiv Detail & Related papers (2024-10-23T17:59:52Z)
- Variance Reduction based Experience Replay for Policy Optimization [3.0657293044976894]
We propose a general variance reduction based experience replay (VRER) framework that can selectively reuse the most relevant samples to improve policy gradient estimation.
Our theoretical and empirical studies show that the proposed VRER can accelerate the learning of optimal policy and enhance the performance of state-of-the-art policy optimization approaches.
arXiv Detail & Related papers (2022-08-25T20:51:00Z)
- Neighborhood Mixup Experience Replay: Local Convex Interpolation for Improved Sample Efficiency in Continuous Control Tasks [60.88792564390274]
Neighborhood Mixup Experience Replay (NMER) is a geometrically-grounded replay buffer that interpolates transitions with their closest neighbors in state-action space.
We observe that NMER improves sample efficiency by an average 94% (TD3) and 29% (SAC) over baseline replay buffers.
arXiv Detail & Related papers (2022-05-18T02:44:08Z)
- Convergence Results For Q-Learning With Experience Replay [51.11953997546418]
We provide a convergence rate guarantee, and discuss how it compares to the convergence of Q-learning depending on important parameters such as the frequency and number of iterations of replay.
We also provide theoretical evidence showing when we might expect this to strictly improve performance, by introducing and analyzing a simple class of MDPs.
arXiv Detail & Related papers (2021-12-08T10:22:49Z)
- Variance Reduction based Experience Replay for Policy Optimization [3.0790370651488983]
Variance Reduction Experience Replay (VRER) is a framework for the selective reuse of relevant samples to improve policy gradient estimation.
VRER forms the foundation of our sample efficient off-policy learning algorithm known as Policy Gradient with VRER.
arXiv Detail & Related papers (2021-10-17T19:28:45Z)
- Large Batch Experience Replay [22.473676537463607]
We introduce new theoretical foundations of Prioritized Experience Replay.
LaBER (Large Batch Experience Replay) is an easy-to-code and efficient method for sampling the replay buffer.
arXiv Detail & Related papers (2021-10-04T15:53:13Z)
- Stratified Experience Replay: Correcting Multiplicity Bias in Off-Policy Reinforcement Learning [17.3794999533024]
We show that deep RL appears to struggle in the presence of extraneous data.
Recent works have shown that the performance of Deep Q-Network (DQN) degrades when its replay memory becomes too large.
We re-examine the motivation for sampling uniformly over a replay memory, and find that it may be flawed when using function approximation.
arXiv Detail & Related papers (2021-02-22T19:29:18Z)
- Revisiting Fundamentals of Experience Replay [91.24213515992595]
We present a systematic and extensive analysis of experience replay in Q-learning methods.
We focus on two fundamental properties: the replay capacity and the ratio of learning updates to experience collected.
arXiv Detail & Related papers (2020-07-13T21:22:17Z)
- Experience Replay with Likelihood-free Importance Weights [123.52005591531194]
We propose to reweight experiences based on their likelihood under the stationary distribution of the current policy.
We apply the proposed approach empirically to two competitive methods, Soft Actor Critic (SAC) and Twin Delayed Deep Deterministic Policy Gradient (TD3).
arXiv Detail & Related papers (2020-06-23T17:17:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.