Temporal Difference Learning with Experience Replay
- URL: http://arxiv.org/abs/2306.09746v1
- Date: Fri, 16 Jun 2023 10:25:43 GMT
- Title: Temporal Difference Learning with Experience Replay
- Authors: Han-Dong Lim, Donghwan Lee
- Abstract summary: Temporal-difference (TD) learning is widely regarded as one of the most popular algorithms in reinforcement learning (RL).
We present a simple decomposition of the Markovian noise terms and provide finite-time error bounds for TD-learning with experience replay.
- Score: 3.5823366350053325
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Temporal-difference (TD) learning is widely regarded as one of the most
popular algorithms in reinforcement learning (RL). Despite its widespread use,
researchers have only recently begun to actively study its finite-time
behavior, including finite-time bounds on the mean squared error and
sample complexity. On the empirical side, experience replay has been a key
ingredient in the success of deep RL algorithms, but its theoretical effects on
RL have yet to be fully understood. In this paper, we present a simple
decomposition of the Markovian noise terms and provide finite-time error bounds
for TD-learning with experience replay. Specifically, under the Markovian
observation model, we demonstrate that for both the averaged iterate and final
iterate cases, the error term induced by a constant step-size can be
effectively controlled by the size of the replay buffer and the mini-batch
sampled from the experience replay buffer.
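The setting described in the abstract (constant step-size TD-learning that stores Markovian samples in a size-limited replay buffer and updates from mini-batches drawn from it) can be made concrete with a short sketch. The code below is a minimal tabular TD(0) illustration under assumed interfaces (an environment with `reset()`/`step()` returning integer states and a fixed behavior `policy`); it is a sketch of the general technique, not the paper's algorithmic specification or analysis.

```python
import random
from collections import deque

# Minimal sketch: tabular TD(0) value estimation with an experience replay
# buffer. The buffer size, mini-batch size, and constant step-size alpha are
# the quantities the abstract's error bounds refer to; the environment and
# policy interfaces here are illustrative assumptions.
def td0_with_replay(env, policy, num_states, num_steps=10_000,
                    buffer_size=1_000, batch_size=32, alpha=0.05, gamma=0.99):
    value = [0.0] * num_states              # tabular value estimates
    buffer = deque(maxlen=buffer_size)      # finite experience replay buffer
    state = env.reset()
    for _ in range(num_steps):
        next_state, reward, done = env.step(policy(state))
        buffer.append((state, reward, next_state))    # store Markovian sample
        state = env.reset() if done else next_state
        if len(buffer) >= batch_size:
            # Mini-batch sampled uniformly from the replay buffer.
            for s, r, s_next in random.sample(list(buffer), batch_size):
                td_error = r + gamma * value[s_next] - value[s]
                value[s] += alpha * td_error          # constant step-size update
    return value
```

In the abstract's terms, the buffer size and mini-batch size in such a loop are exactly the knobs that control the error term induced by the constant step-size, for both the averaged and final iterates.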
Related papers
- A Tighter Convergence Proof of Reverse Experience Replay [16.645967034009225]
In reinforcement learning, Reverse Experience Replay (RER) is a recently proposed algorithm that attains better sample complexity than the classic experience replay method.
RER requires the learning algorithm to update the parameters through consecutive state-action-reward tuples in reverse order (a minimal sketch of this update order appears after this list).
We show theoretically that RER converges with a larger learning rate and a longer sequence.
arXiv Detail & Related papers (2024-08-30T04:11:35Z) - Enhancing Consistency and Mitigating Bias: A Data Replay Approach for
Incremental Learning [100.7407460674153]
Deep learning systems are prone to catastrophic forgetting when learning from a sequence of tasks.
To mitigate the problem, a line of methods propose to replay the data of experienced tasks when learning new tasks.
However, replaying stored data is often impractical due to memory constraints or data privacy concerns.
As a replacement, data-free replay methods have been proposed that invert samples from the classification model.
arXiv Detail & Related papers (2024-01-12T12:51:12Z) - Adiabatic replay for continual learning [138.7878582237908]
Generative replay spends an increasing amount of time just re-learning what is already known.
We propose a replay-based CL strategy that we term adiabatic replay (AR).
We verify experimentally that AR is superior to state-of-the-art deep generative replay using VAEs.
arXiv Detail & Related papers (2023-03-23T10:18:06Z) - Actor Prioritized Experience Replay [0.0]
Prioritized Experience Replay (PER) allows agents to learn from transitions sampled with non-uniform probability proportional to their temporal-difference (TD) error (see the sampling sketch after this list).
We introduce a novel experience replay sampling framework for actor-critic methods that also accounts for stability issues and recent findings behind the poor empirical performance of PER.
An extensive set of experiments verifies our theoretical claims and demonstrates that the introduced method significantly outperforms the competing approaches.
arXiv Detail & Related papers (2022-09-01T15:27:46Z) - Look Back When Surprised: Stabilizing Reverse Experience Replay for
Neural Approximation [7.6146285961466]
We consider the recently developed and theoretically rigorous reverse experience replay (RER).
We show via experiments that it performs better than techniques like prioritized experience replay (PER) on various tasks.
arXiv Detail & Related papers (2022-06-07T10:42:02Z) - Convergence Results For Q-Learning With Experience Replay [51.11953997546418]
We provide a convergence rate guarantee, and discuss how it compares to the convergence of Q-learning depending on important parameters such as the frequency and number of iterations of replay.
We also provide theoretical evidence showing when we might expect this to strictly improve performance, by introducing and analyzing a simple class of MDPs.
arXiv Detail & Related papers (2021-12-08T10:22:49Z) - An Investigation of Replay-based Approaches for Continual Learning [79.0660895390689]
Continual learning (CL) is a major challenge for machine learning (ML); it describes the ability to learn several tasks sequentially without catastrophic forgetting (CF).
Several solution classes have been proposed, of which so-called replay-based approaches seem very promising due to their simplicity and robustness.
We empirically investigate replay-based approaches of continual learning and assess their potential for applications.
arXiv Detail & Related papers (2021-08-15T15:05:02Z) - Stratified Experience Replay: Correcting Multiplicity Bias in Off-Policy
Reinforcement Learning [17.3794999533024]
We show that deep RL appears to struggle in the presence of extraneous data.
Recent works have shown that the performance of Deep Q-Network (DQN) degrades when its replay memory becomes too large.
We re-examine the motivation for sampling uniformly over a replay memory, and find that it may be flawed when using function approximation.
arXiv Detail & Related papers (2021-02-22T19:29:18Z) - Revisiting Fundamentals of Experience Replay [91.24213515992595]
We present a systematic and extensive analysis of experience replay in Q-learning methods.
We focus on two fundamental properties: the replay capacity and the ratio of learning updates to experience collected.
arXiv Detail & Related papers (2020-07-13T21:22:17Z) - Experience Replay with Likelihood-free Importance Weights [123.52005591531194]
We propose to reweight experiences based on their likelihood under the stationary distribution of the current policy.
We empirically apply the proposed approach to two competitive methods, Soft Actor Critic (SAC) and Twin Delayed Deep Deterministic policy gradient (TD3).
arXiv Detail & Related papers (2020-06-23T17:17:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.