Temporal Difference Learning with Experience Replay
- URL: http://arxiv.org/abs/2306.09746v1
- Date: Fri, 16 Jun 2023 10:25:43 GMT
- Title: Temporal Difference Learning with Experience Replay
- Authors: Han-Dong Lim, Donghwan Lee
- Abstract summary: Temporal-difference (TD) learning is widely regarded as one of the most popular algorithms in reinforcement learning (RL).
We present a simple decomposition of the Markovian noise terms and provide finite-time error bounds for TD-learning with experience replay.
- Score: 3.5823366350053325
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Temporal-difference (TD) learning is widely regarded as one of the most
popular algorithms in reinforcement learning (RL). Despite its widespread use,
researchers have only recently begun to actively study its finite-time
behavior, including finite-time bounds on the mean squared error and
sample complexity. On the empirical side, experience replay has been a key
ingredient in the success of deep RL algorithms, but its theoretical effects on
RL have yet to be fully understood. In this paper, we present a simple
decomposition of the Markovian noise terms and provide finite-time error bounds
for TD-learning with experience replay. Specifically, under the Markovian
observation model, we demonstrate that for both the averaged iterate and final
iterate cases, the error term induced by a constant step-size can be
effectively controlled by the size of the replay buffer and the mini-batch
sampled from the experience replay buffer.
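The setting described in the abstract (constant step-size TD-learning that stores Markovian samples in a size-limited replay buffer and updates from mini-batches drawn from it) can be made concrete with a short sketch. The code below is a minimal tabular TD(0) illustration under assumed interfaces (an environment with `reset()`/`step()` returning integer states and a fixed behavior `policy`); it is a sketch of the general technique, not the paper's algorithmic specification or analysis.

```python
import random
from collections import deque

# Minimal sketch: tabular TD(0) value estimation with an experience replay
# buffer. The buffer size, mini-batch size, and constant step-size alpha are
# the quantities the abstract's error bounds refer to; the environment and
# policy interfaces here are illustrative assumptions.
def td0_with_replay(env, policy, num_states, num_steps=10_000,
                    buffer_size=1_000, batch_size=32, alpha=0.05, gamma=0.99):
    value = [0.0] * num_states              # tabular value estimates
    buffer = deque(maxlen=buffer_size)      # finite experience replay buffer
    state = env.reset()
    for _ in range(num_steps):
        next_state, reward, done = env.step(policy(state))
        buffer.append((state, reward, next_state))    # store Markovian sample
        state = env.reset() if done else next_state
        if len(buffer) >= batch_size:
            # Mini-batch sampled uniformly from the replay buffer.
            for s, r, s_next in random.sample(list(buffer), batch_size):
                td_error = r + gamma * value[s_next] - value[s]
                value[s] += alpha * td_error          # constant step-size update
    return value
```

In the abstract's terms, the buffer size and mini-batch size in such a loop are exactly the knobs that control the error term induced by the constant step-size, for both the averaged and final iterates.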
Related papers
- A Tighter Convergence Proof of Reverse Experience Replay [16.645967034009225]
In reinforcement learning, Reverse Experience Replay (RER) is a recently proposed algorithm that attains better sample complexity than the classic experience replay method.
RER requires the learning algorithm to update the parameters through consecutive state-action-reward tuples in reverse order (a minimal sketch of this update order appears after this list).
We show theoretically that RER converges with a larger learning rate and a longer sequence.
arXiv Detail & Related papers (2024-08-30T04:11:35Z) - Enhancing Consistency and Mitigating Bias: A Data Replay Approach for
Incremental Learning [100.7407460674153]
Deep learning systems are prone to catastrophic forgetting when learning from a sequence of tasks.
To mitigate the problem, a line of methods propose to replay the data of experienced tasks when learning new tasks.
However, replaying stored data is often impractical due to memory constraints or data privacy concerns.
As a replacement, data-free replay methods have been proposed that invert samples from the classification model.
arXiv Detail & Related papers (2024-01-12T12:51:12Z) - Adiabatic replay for continual learning [138.7878582237908]
Generative replay spends an increasing amount of time just re-learning what is already known.
We propose a replay-based CL strategy that we term adiabatic replay (AR).
We verify experimentally that AR is superior to state-of-the-art deep generative replay using VAEs.
arXiv Detail & Related papers (2023-03-23T10:18:06Z) - Actor Prioritized Experience Replay [0.0]
Prioritized Experience Replay (PER) allows agents to learn from transitions sampled with non-uniform probability proportional to their temporal-difference (TD) error (see the sampling sketch after this list).
We introduce a novel experience replay sampling framework for actor-critic methods that also accounts for stability issues and recent findings behind the poor empirical performance of PER.
An extensive set of experiments verifies our theoretical claims and demonstrates that the introduced method significantly outperforms the competing approaches.
arXiv Detail & Related papers (2022-09-01T15:27:46Z) - Look Back When Surprised: Stabilizing Reverse Experience Replay for
Neural Approximation [7.6146285961466]
We consider the recently developed and theoretically rigorous reverse experience replay (RER).
We show via experiments that it performs better than techniques like prioritized experience replay (PER) on various tasks.
arXiv Detail & Related papers (2022-06-07T10:42:02Z) - Convergence Results For Q-Learning With Experience Replay [51.11953997546418]
We provide a convergence rate guarantee, and discuss how it compares to the convergence of Q-learning depending on important parameters such as the frequency and number of iterations of replay.
We also provide theoretical evidence showing when we might expect this to strictly improve performance, by introducing and analyzing a simple class of MDPs.
arXiv Detail & Related papers (2021-12-08T10:22:49Z) - An Investigation of Replay-based Approaches for Continual Learning [79.0660895390689]
Continual learning (CL) is a major challenge for machine learning (ML); it describes the ability to learn several tasks sequentially without catastrophic forgetting (CF).
Several solution classes have been proposed, of which so-called replay-based approaches seem very promising due to their simplicity and robustness.
We empirically investigate replay-based approaches of continual learning and assess their potential for applications.
arXiv Detail & Related papers (2021-08-15T15:05:02Z) - Stratified Experience Replay: Correcting Multiplicity Bias in Off-Policy
Reinforcement Learning [17.3794999533024]
We show that deep RL appears to struggle in the presence of extraneous data.
Recent works have shown that the performance of Deep Q-Network (DQN) degrades when its replay memory becomes too large.
We re-examine the motivation for sampling uniformly over a replay memory, and find that it may be flawed when using function approximation.
arXiv Detail & Related papers (2021-02-22T19:29:18Z) - Revisiting Fundamentals of Experience Replay [91.24213515992595]
We present a systematic and extensive analysis of experience replay in Q-learning methods.
We focus on two fundamental properties: the replay capacity and the ratio of learning updates to experience collected.
arXiv Detail & Related papers (2020-07-13T21:22:17Z) - Experience Replay with Likelihood-free Importance Weights [123.52005591531194]
We propose to reweight experiences based on their likelihood under the stationary distribution of the current policy.
We empirically apply the proposed approach to two competitive methods, Soft Actor Critic (SAC) and Twin Delayed Deep Deterministic policy gradient (TD3).
arXiv Detail & Related papers (2020-06-23T17:17:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.