Reliability-Adjusted Prioritized Experience Replay
- URL: http://arxiv.org/abs/2506.18482v2
- Date: Thu, 03 Jul 2025 09:55:57 GMT
- Title: Reliability-Adjusted Prioritized Experience Replay
- Authors: Leonard S. Pleiss, Tobias Sutter, Maximilian Schiffer
- Abstract summary: We propose an extension to Prioritized Experience Replay (PER) by introducing a novel measure of temporal difference error reliability. We theoretically show that the resulting transition selection algorithm, Reliability-adjusted Prioritized Experience Replay (ReaPER), enables more efficient learning than PER.
- Score: 5.342556166066767
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Experience replay enables data-efficient learning from past experiences in online reinforcement learning agents. Traditionally, experiences were sampled uniformly from a replay buffer, regardless of differences in experience-specific learning potential. In an effort to sample more efficiently, researchers introduced Prioritized Experience Replay (PER). In this paper, we propose an extension to PER by introducing a novel measure of temporal difference error reliability. We theoretically show that the resulting transition selection algorithm, Reliability-adjusted Prioritized Experience Replay (ReaPER), enables more efficient learning than PER. We further present empirical results showing that ReaPER outperforms PER across various environment types, including the Atari-10 benchmark.
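The abstract describes the mechanism only at a high level. Below is a minimal sketch of proportional prioritized replay with a multiplicative reliability term applied to the TD-error priority; the specific reliability estimate used here (down-weighting transitions whose TD error fluctuates strongly between revisits) is an illustrative assumption, not the paper's definition.

```python
# Minimal sketch of prioritized replay with a reliability-adjusted priority.
# NOTE: the reliability term below (an EMA that shrinks when a transition's TD
# error changes a lot between revisits) is an illustrative assumption; the
# paper defines its own measure of TD-error reliability.
import numpy as np

class ReliabilityAdjustedReplay:
    def __init__(self, capacity, alpha=0.6, ema=0.9):
        self.capacity, self.alpha, self.ema = capacity, alpha, ema
        self.data, self.td_abs, self.reliability = [], [], []

    def add(self, transition, td_error):
        if len(self.data) >= self.capacity:            # drop the oldest transition
            for buf in (self.data, self.td_abs, self.reliability):
                buf.pop(0)
        self.data.append(transition)
        self.td_abs.append(abs(td_error))
        self.reliability.append(1.0)                   # new transitions start fully trusted

    def sample(self, batch_size):
        # Priority = (|TD error| * reliability)^alpha, sampled proportionally as in PER.
        p = (np.array(self.td_abs) * np.array(self.reliability) + 1e-6) ** self.alpha
        probs = p / p.sum()
        idx = np.random.choice(len(self.data), size=batch_size, p=probs)
        return idx, [self.data[i] for i in idx], probs[idx]

    def update(self, idx, new_td_errors):
        # After a learning step, refresh |TD error| and reduce reliability for
        # transitions whose TD error shifted strongly since the last visit.
        for i, td in zip(idx, new_td_errors):
            change = abs(abs(td) - self.td_abs[i]) / (self.td_abs[i] + 1e-6)
            self.reliability[i] = self.ema * self.reliability[i] + (1 - self.ema) / (1.0 + change)
            self.td_abs[i] = abs(td)
```

Relative to plain proportional PER, the sketch only changes the priority computed inside `sample`; everything else follows the usual replay interface.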
Related papers
- Experience Replay with Random Reshuffling [3.6622737533847936]
In supervised learning, it is common to shuffle the dataset every epoch and consume data sequentially, which is called random reshuffling (RR). We propose sampling methods that extend RR to experience replay, both in uniform and prioritized settings. We evaluate our sampling methods on Atari benchmarks, demonstrating their effectiveness in deep reinforcement learning.
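To make the random-reshuffling idea concrete in the replay setting, here is a rough sketch that shuffles the buffer indices once and then serves minibatches without replacement until the permutation is exhausted; the interface is a hypothetical simplification and does not cover the paper's prioritized variant.

```python
# Sketch of random reshuffling (RR) over a replay buffer: shuffle the index
# permutation once, draw minibatches without replacement until it is used up,
# then reshuffle. Hypothetical simplification of the uniform setting only.
import numpy as np

class ReshuffledSampler:
    def __init__(self, buffer_size, batch_size, rng=None):
        self.buffer_size, self.batch_size = buffer_size, batch_size
        self.rng = rng or np.random.default_rng()
        self._order = self.rng.permutation(buffer_size)
        self._pos = 0

    def next_batch_indices(self):
        if self._pos + self.batch_size > self.buffer_size:   # permutation exhausted
            self._order = self.rng.permutation(self.buffer_size)
            self._pos = 0
        batch = self._order[self._pos:self._pos + self.batch_size]
        self._pos += self.batch_size
        return batch
```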
arXiv Detail & Related papers (2025-03-04T04:37:22Z)
- Reward Prediction Error Prioritisation in Experience Replay: The RPE-PER Method [1.600323605807673]
We introduce Reward Predictive Error Prioritised Experience Replay (RPE-PER). RPE-PER prioritises experiences in the buffer based on RPEs. Our method employs a critic network, EMCN, that predicts rewards in addition to the Q-values produced by standard critic networks.
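As a rough illustration, the sketch below adds a reward-prediction head to a critic and uses the absolute reward prediction error as the replay priority; the architecture, the name `RewardPredictingCritic`, and the exact priority formula are assumptions and may differ from the paper's EMCN.

```python
# Sketch of reward-prediction-error prioritisation: the critic predicts the
# reward alongside the Q-value, and |r - r_hat| becomes the replay priority.
# Shapes, names, and the priority formula are illustrative assumptions.
import torch
import torch.nn as nn

class RewardPredictingCritic(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim + act_dim, hidden), nn.ReLU())
        self.q_head = nn.Linear(hidden, 1)   # standard Q-value head
        self.r_head = nn.Linear(hidden, 1)   # extra head predicting the one-step reward

    def forward(self, obs, act):
        h = self.trunk(torch.cat([obs, act], dim=-1))
        return self.q_head(h), self.r_head(h)

def rpe_priority(critic, obs, act, reward, eps=1e-3):
    """Replay priority from the absolute reward prediction error."""
    with torch.no_grad():
        _, r_hat = critic(obs, act)
    return (reward - r_hat.squeeze(-1)).abs() + eps
```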
arXiv Detail & Related papers (2025-01-30T02:09:35Z)
- Iterative Experience Refinement of Software-Developing Agents [81.09737243969758]
Large language models (LLMs) can leverage past experiences to reduce errors and enhance efficiency.
This paper introduces the Iterative Experience Refinement framework, enabling LLM agents to refine experiences iteratively during task execution.
arXiv Detail & Related papers (2024-05-07T11:33:49Z)
- Replay For Safety [51.11953997546418]
In experience replay, past transitions are stored in a memory buffer and re-used during learning.
We show that using an appropriate biased sampling scheme can allow us to achieve a safe policy.
arXiv Detail & Related papers (2021-12-08T11:10:57Z)
- Convergence Results For Q-Learning With Experience Replay [51.11953997546418]
We provide a convergence rate guarantee, and discuss how it compares to the convergence of Q-learning depending on important parameters such as the frequency and number of iterations of replay.
We also provide theoretical evidence showing when we might expect this to strictly improve performance, by introducing and analyzing a simple class of MDPs.
arXiv Detail & Related papers (2021-12-08T10:22:49Z)
- Improving Experience Replay with Successor Representation [0.0]
Prioritized experience replay is a reinforcement learning technique shown to speed up learning.
Recent work in neuroscience suggests that, in biological organisms, replay is prioritized by both gain and need.
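As a toy illustration of that idea in a replay buffer, the sketch below combines "gain" (proxied by |TD error|) and "need" (expected future occupancy of each transition's state under a successor representation) into a single priority; both proxies and the tabular setting are assumptions, not the paper's construction.

```python
# Toy sketch of gain-and-need prioritisation: replay priority is the product of
# "gain" (proxied here by |TD error|) and "need" (expected future occupancy of
# each transition's state under a successor representation). Both proxies are
# illustrative assumptions.
import numpy as np

def successor_representation(P, gamma=0.95):
    """SR matrix M = (I - gamma * P)^-1 for a known state-transition matrix P."""
    n = P.shape[0]
    return np.linalg.inv(np.eye(n) - gamma * P)

def replay_priorities(td_errors, states, current_state, P, gamma=0.95):
    M = successor_representation(P, gamma)
    need = M[current_state, states]   # expected discounted visits to each stored state
    gain = np.abs(td_errors)          # utility of replaying the transition
    return gain * need

# Example on a 3-state random-walk chain.
P = np.array([[0.5, 0.5, 0.0],
              [0.25, 0.5, 0.25],
              [0.0, 0.5, 0.5]])
prio = replay_priorities(td_errors=np.array([0.2, 1.0, 0.1]),
                         states=np.array([0, 1, 2]),
                         current_state=0, P=P)
```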
arXiv Detail & Related papers (2021-11-29T05:25:54Z)
- Learning to Sample with Local and Global Contexts in Experience Replay Buffer [135.94190624087355]
We propose a new learning-based sampling method that can compute the relative importance of each transition.
We show that our framework can significantly improve the performance of various off-policy reinforcement learning methods.
arXiv Detail & Related papers (2020-07-14T21:12:56Z)
- Revisiting Fundamentals of Experience Replay [91.24213515992595]
We present a systematic and extensive analysis of experience replay in Q-learning methods.
We focus on two fundamental properties: the replay capacity and the ratio of learning updates to experience collected.
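Read operationally, those two properties correspond to two knobs of a standard replay-based training loop, sketched below; the loop skeleton, names, and values are illustrative assumptions.

```python
# The two replay properties as training-loop knobs: replay capacity (how many
# past transitions the buffer retains) and the replay ratio (learning updates
# performed per environment transition collected). Values are illustrative.
from collections import deque

REPLAY_CAPACITY = 1_000_000   # replay capacity
UPDATES_PER_STEP = 4          # ratio of learning updates to experience collected

buffer = deque(maxlen=REPLAY_CAPACITY)

def environment_step(transition, learner_update):
    buffer.append(transition)             # collect one transition
    for _ in range(UPDATES_PER_STEP):     # perform the corresponding learning updates
        learner_update(buffer)
```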
arXiv Detail & Related papers (2020-07-13T21:22:17Z)
- Double Prioritized State Recycled Experience Replay [3.42658286826597]
We develop a method called double-prioritized state-recycled (DPSR) experience replay.
We used this method in Deep Q-Networks (DQN), and achieved a state-of-the-art result.
arXiv Detail & Related papers (2020-07-08T08:36:41Z)
- Experience Replay with Likelihood-free Importance Weights [123.52005591531194]
We propose to reweight experiences based on their likelihood under the stationary distribution of the current policy.
We apply the proposed approach empirically to two competitive methods, Soft Actor-Critic (SAC) and Twin Delayed Deep Deterministic Policy Gradient (TD3).
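One common likelihood-free way to obtain such weights is a density-ratio estimator trained to distinguish recent, near-on-policy transitions from older buffer transitions; the sketch below uses that construction without claiming it matches the paper's estimator.

```python
# Sketch of likelihood-free importance weighting for replay: a small classifier
# is trained to tell recent (near on-policy) transitions from older buffer
# transitions, and the implied density ratio d_pi(x)/d_buffer(x) is used as a
# per-sample weight on the TD loss. The estimator choice is an assumption.
import torch
import torch.nn as nn

class DensityRatioEstimator(nn.Module):
    def __init__(self, input_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(input_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, x):
        return self.net(x)                 # logit of P(recent | x)

    def importance_weight(self, x):
        # sigmoid(logit) / (1 - sigmoid(logit)) = exp(logit) estimates the density ratio
        return torch.exp(self(x)).squeeze(-1)

def classifier_loss(model, recent_batch, buffer_batch):
    """Binary cross-entropy: label recent transitions 1, buffer transitions 0."""
    logits = torch.cat([model(recent_batch), model(buffer_batch)], dim=0)
    labels = torch.cat([torch.ones(len(recent_batch), 1),
                        torch.zeros(len(buffer_batch), 1)], dim=0)
    return nn.functional.binary_cross_entropy_with_logits(logits, labels)
```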
arXiv Detail & Related papers (2020-06-23T17:17:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.