Look Back When Surprised: Stabilizing Reverse Experience Replay for
Neural Approximation
- URL: http://arxiv.org/abs/2206.03171v1
- Date: Tue, 7 Jun 2022 10:42:02 GMT
- Title: Look Back When Surprised: Stabilizing Reverse Experience Replay for
Neural Approximation
- Authors: Ramnath Kumar, Dheeraj Nagaraj
- Abstract summary: We consider the recently developed and theoretically rigorous reverse experience replay (RER).
We show via experiments that it performs better than techniques like prioritized experience replay (PER) on various tasks.
- Score: 7.6146285961466
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Experience replay methods, which are an essential part of reinforcement
learning (RL) algorithms, are designed to mitigate spurious correlations and
biases while learning from temporally dependent data. Roughly speaking, these
methods allow us to draw batched data from a large buffer such that these
temporal correlations do not hinder the performance of descent algorithms. In
this experimental work, we consider the recently developed and theoretically
rigorous reverse experience replay (RER), which has been shown to remove such
spurious biases in simplified theoretical settings. We combine RER with
optimistic experience replay (OER) to obtain RER++, which is stable under
neural function approximation. We show via experiments that it performs better
than techniques like prioritized experience replay (PER) on various tasks,
with significantly lower computational complexity. It is well known
in the RL literature that choosing examples greedily with the largest TD error
(as in OER) or forming mini-batches with consecutive data points (as in RER)
leads to poor performance. However, our method, which combines these
techniques, works very well.
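
The abstract describes the combination only at a high level. The sketch below is a minimal, hypothetical reading of that combination, assuming a flat replay buffer stored in arrival order and precomputed TD-error magnitudes; the function name, batch sizes, and buffer layout are illustrative assumptions, not the authors' implementation. Anchors are chosen greedily by TD error (the OER component), and each anchor is expanded into a short run of consecutive transitions that is traversed in reverse time order (the RER component), i.e. the agent "looks back when surprised".

import numpy as np

def rer_plus_plus_batches(buffer, td_errors, num_anchors=8, window=4):
    """Illustrative RER++-style batch construction (a sketch, not the paper's code).

    buffer    : list of transitions (s, a, r, s_next) stored in arrival order
    td_errors : array of current TD-error magnitudes, one per transition
    Returns a list of mini-batches; each batch is a run of consecutive
    transitions ending at a high-TD-error "surprise" point, traversed in
    reverse temporal order.
    """
    # OER component: greedily pick the indices with the largest TD errors.
    anchors = np.argsort(np.abs(td_errors))[-num_anchors:]

    batches = []
    for t in anchors:
        start = max(0, t - window + 1)
        run = buffer[start:t + 1]   # consecutive transitions up to the anchor
        batches.append(run[::-1])   # RER component: replay them backwards
    return batches

In an actual agent, the TD errors would be refreshed (or approximated from stored priorities) as the value function changes, and gradient updates would be applied to each batch in the reversed order shown; the abstract does not specify these details, so treat this only as a reading aid.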
Related papers
- A Tighter Convergence Proof of Reverse Experience Replay [16.645967034009225]
In reinforcement learning, Reverse Experience Replay (RER) is a recently proposed algorithm that attains better sample complexity than the classic experience replay method.
RER requires the learning algorithm to update the parameters through consecutive state-action-rewards in reverse order.
We show theoretically that RER converges with a larger learning rate and a longer sequence.
arXiv Detail & Related papers (2024-08-30T04:11:35Z)
- Enhancing Consistency and Mitigating Bias: A Data Replay Approach for Incremental Learning [100.7407460674153]
Deep learning systems are prone to catastrophic forgetting when learning from a sequence of tasks.
To mitigate the problem, a line of methods propose to replay the data of experienced tasks when learning new tasks.
However, storing such data is often infeasible in practice due to memory constraints or data privacy issues.
As a replacement, data-free data replay methods are proposed by inverting samples from the classification model.
arXiv Detail & Related papers (2024-01-12T12:51:12Z)
- Replay across Experiments: A Natural Extension of Off-Policy RL [18.545939667810565]
We present an effective yet simple framework to extend the use of replays across multiple experiments.
At its core, Replay Across Experiments (RaE) involves reusing experience from previous experiments to improve exploration and bootstrap learning.
We empirically show benefits across a number of RL algorithms and challenging control domains spanning both locomotion and manipulation.
arXiv Detail & Related papers (2023-11-27T15:57:11Z)
- Temporal Difference Learning with Experience Replay [3.5823366350053325]
Temporal-difference (TD) learning is widely regarded as one of the most popular algorithms in reinforcement learning (RL).
We present a simple decomposition of the Markovian noise terms and provide finite-time error bounds for TD-learning with experience replay.
arXiv Detail & Related papers (2023-06-16T10:25:43Z)
- Retrieval-Augmented Reinforcement Learning [63.32076191982944]
We train a network to map a dataset of past experiences to optimal behavior.
The retrieval process is trained to retrieve information from the dataset that may be useful in the current context.
We show that retrieval-augmented R2D2 learns significantly faster than the baseline R2D2 agent and achieves higher scores.
arXiv Detail & Related papers (2022-02-17T02:44:05Z)
- Convergence Results For Q-Learning With Experience Replay [51.11953997546418]
We provide a convergence rate guarantee, and discuss how it compares to the convergence of Q-learning depending on important parameters such as the frequency and number of iterations of replay.
We also provide theoretical evidence showing when we might expect this to strictly improve performance, by introducing and analyzing a simple class of MDPs.
arXiv Detail & Related papers (2021-12-08T10:22:49Z)
- Learning Expected Emphatic Traces for Deep RL [32.984880782688535]
Off-policy sampling and experience replay are key for improving sample efficiency and scaling model-free temporal difference learning methods.
We develop a multi-step emphatic weighting that can be combined with replay, and a time-reversed $n$-step TD learning algorithm to learn the required emphatic weighting.
arXiv Detail & Related papers (2021-07-12T13:14:03Z)
- Stratified Experience Replay: Correcting Multiplicity Bias in Off-Policy Reinforcement Learning [17.3794999533024]
We show that deep RL appears to struggle in the presence of extraneous data.
Recent works have shown that the performance of Deep Q-Network (DQN) degrades when its replay memory becomes too large.
We re-examine the motivation for sampling uniformly over a replay memory, and find that it may be flawed when using function approximation.
arXiv Detail & Related papers (2021-02-22T19:29:18Z)
- Revisiting Fundamentals of Experience Replay [91.24213515992595]
We present a systematic and extensive analysis of experience replay in Q-learning methods.
We focus on two fundamental properties: the replay capacity and the ratio of learning updates to experience collected.
arXiv Detail & Related papers (2020-07-13T21:22:17Z)
- Experience Replay with Likelihood-free Importance Weights [123.52005591531194]
We propose to reweight experiences based on their likelihood under the stationary distribution of the current policy.
We apply the proposed approach empirically on two competitive methods, Soft Actor Critic (SAC) and Twin Delayed Deep Deterministic policy gradient (TD3).
arXiv Detail & Related papers (2020-06-23T17:17:44Z)
- Accelerated Convergence for Counterfactual Learning to Rank [65.63997193915257]
We show that the convergence rate of SGD approaches with IPS-weighted gradients suffers from the large variance introduced by the IPS weights.
We propose a novel learning algorithm, called CounterSample, that has provably better convergence than standard IPS-weighted gradient descent methods.
We prove that CounterSample converges faster and complement our theoretical findings with empirical results.
arXiv Detail & Related papers (2020-05-21T12:53:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.