Revisiting Prioritized Experience Replay: A Value Perspective
- URL: http://arxiv.org/abs/2102.03261v1
- Date: Fri, 5 Feb 2021 16:09:07 GMT
- Title: Revisiting Prioritized Experience Replay: A Value Perspective
- Authors: Ang A. Li, Zongqing Lu, Chenglin Miao
- Abstract summary: We argue that experience replay enables off-policy reinforcement learning agents to utilize past experiences to maximize the cumulative reward.
Our framework links two important quantities in RL: $|\text{TD}|$ and value of experience.
We empirically show that the bounds hold in practice, and experience replay using the upper bound as priority improves maximum-entropy RL in Atari games.
- Score: 21.958500332929898
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Experience replay enables off-policy reinforcement learning (RL) agents to
utilize past experiences to maximize the cumulative reward. Prioritized
experience replay that weighs experiences by the magnitude of their
temporal-difference error ($|\text{TD}|$) significantly improves the learning
efficiency. But how $|\text{TD}|$ is related to the importance of experience is
not well understood. We address this problem from an economic perspective, by
linking $|\text{TD}|$ to value of experience, which is defined as the value
added to the cumulative reward by accessing the experience. We theoretically
show the value metrics of experience are upper-bounded by $|\text{TD}|$ for
Q-learning. Furthermore, we successfully extend our theoretical framework to
maximum-entropy RL by deriving the lower and upper bounds of these value
metrics for soft Q-learning, which turn out to be the product of $|\text{TD}|$
and "on-policyness" of the experiences. Our framework links two important
quantities in RL: $|\text{TD}|$ and value of experience. We empirically show
that the bounds hold in practice, and experience replay using the upper bound
as priority improves maximum-entropy RL in Atari games.
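As a rough illustration of the quantities involved, the sketch below computes, for a toy tabular soft Q-learning setup, the absolute soft TD error of each stored transition and a bound-style priority obtained by scaling it with an "on-policyness" term (assumed here to be the current soft policy's probability of the stored action). The temperature, discount, and toy data are placeholders, not the paper's settings.

```python
import numpy as np

# Toy tabular sketch, not the paper's implementation: compute, per transition,
# the absolute soft TD error and a bound-style priority |TD| * "on-policyness",
# where on-policyness is assumed to be the soft policy's probability of the
# stored action. Temperature `tau`, discount `gamma`, and data are placeholders.

def soft_value(q_row, tau):
    """Soft state value V(s) = tau * logsumexp(Q(s, .) / tau)."""
    z = q_row / tau
    m = z.max()
    return tau * (m + np.log(np.exp(z - m).sum()))

def bound_priorities(Q, transitions, gamma=0.99, tau=0.05):
    prios = []
    for s, a, r, s_next, done in transitions:
        target = r if done else r + gamma * soft_value(Q[s_next], tau)
        td_abs = abs(target - Q[s, a])
        z = Q[s] / tau
        pi = np.exp(z - z.max())
        pi /= pi.sum()                      # soft policy pi(.|s) = softmax(Q(s,.)/tau)
        prios.append(td_abs * pi[a])        # |TD| scaled by on-policyness of the stored action
    return np.array(prios)

# Toy usage: 4 states, 2 actions, two fake transitions (s, a, r, s', done).
Q = np.zeros((4, 2))
batch = [(0, 1, 1.0, 2, False), (2, 0, 0.0, 3, True)]
p = bound_priorities(Q, batch)
sampling_probs = p / p.sum() if p.sum() > 0 else np.full(len(p), 1.0 / len(p))
```

Sampling transitions in proportion to such priorities is the usual prioritized-replay recipe; the paper's contribution is the theoretical link between these priorities and the value added to the cumulative reward by accessing the experience.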
Related papers
- Which Experiences Are Influential for Your Agent? Policy Iteration with Turn-over Dropout [15.856188608650228]
We present PI+ToD as a policy iteration that efficiently estimates the influence of experiences by utilizing turn-over dropout.
We demonstrate the efficiency of PI+ToD with experiments in MuJoCo environments.
arXiv Detail & Related papers (2023-01-26T15:13:04Z)
- Replay For Safety [51.11953997546418]
In experience replay, past transitions are stored in a memory buffer and re-used during learning.
We show that using an appropriate biased sampling scheme can allow us to achieve a safe policy.
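The summary does not spell out the sampling scheme; as a generic illustration only, the snippet below shows a replay buffer whose sampling distribution is shaped by a user-supplied bias function. The example bias that oversamples low-reward transitions is a hypothetical stand-in, not the paper's method.

```python
import random
import numpy as np

class BiasedReplayBuffer:
    """Minimal replay buffer whose sampling distribution is shaped by a
    user-supplied bias function (placeholder for a safety-oriented scheme)."""

    def __init__(self, capacity=10000):
        self.capacity = capacity
        self.storage = []

    def add(self, transition):
        if len(self.storage) >= self.capacity:
            self.storage.pop(0)
        self.storage.append(transition)

    def sample(self, batch_size, bias_fn):
        weights = np.array([bias_fn(t) for t in self.storage], dtype=float)
        probs = weights / weights.sum()
        idx = np.random.choice(len(self.storage), size=batch_size, p=probs)
        return [self.storage[i] for i in idx]

# Hypothetical bias: oversample transitions with negative reward so the agent
# revisits potentially unsafe outcomes more often.
buffer = BiasedReplayBuffer()
for i in range(100):
    buffer.add((i, random.randint(0, 1), random.uniform(-1, 1), i + 1, False))
batch = buffer.sample(8, bias_fn=lambda t: 1.0 + max(0.0, -t[2]))
```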
arXiv Detail & Related papers (2021-12-08T11:10:57Z)
- Convergence Results For Q-Learning With Experience Replay [51.11953997546418]
We provide a convergence rate guarantee, and discuss how it compares to the convergence of Q-learning depending on important parameters such as the frequency and number of iterations of replay.
We also provide theoretical evidence showing when we might expect this to strictly improve performance, by introducing and analyzing a simple class of MDPs.
arXiv Detail & Related papers (2021-12-08T10:22:49Z)
- Improving Experience Replay with Successor Representation [0.0]
Prioritized experience replay is a reinforcement learning technique shown to speed up learning.
Recent work in neuroscience suggests that, in biological organisms, replay is prioritized by both gain and need.
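A minimal sketch of the gain-and-need idea, under the assumption that "need" is read off a successor representation and "gain" is approximated by a |TD|-style proxy; the paper's exact formulation may differ.

```python
import numpy as np

# Illustrative combination of "need" (successor representation: expected discounted
# future occupancy of a state, seen from the agent's current state) and "gain"
# (approximated here by a |TD| proxy, an assumption made for illustration) into a
# replay priority = gain * need.

def successor_representation(P, gamma=0.95):
    """SR matrix M = (I - gamma * P)^-1 for a fixed state-transition matrix P."""
    n = P.shape[0]
    return np.linalg.inv(np.eye(n) - gamma * P)

n_states = 5
P = np.full((n_states, n_states), 1.0 / n_states)   # toy uniform transition matrix
M = successor_representation(P)

current_state = 0
# Each entry: (state where the stored transition starts, |TD| proxy for gain).
candidates = [(1, 0.4), (3, 1.2), (4, 0.1)]
priority = np.array([abs(td) * M[current_state, s] for s, td in candidates])
sampling_probs = priority / priority.sum()
```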
arXiv Detail & Related papers (2021-11-29T05:25:54Z)
- Munchausen Reinforcement Learning [50.396037940989146]
Bootstrapping is a core mechanism in Reinforcement Learning (RL).
We show that slightly modifying Deep Q-Network (DQN) in that way provides an agent that is competitive with distributional methods on Atari games.
We provide strong theoretical insights on what happens under the hood -- implicit Kullback-Leibler regularization and increase of the action-gap.
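For context, a schematic of the Munchausen modification as commonly described: add a scaled, clipped log-policy bonus to the reward and bootstrap with an entropy-regularized next-state value. Hyperparameter values below are placeholders, not tuned settings.

```python
import numpy as np

def softmax(x, tau):
    z = x / tau
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def munchausen_target(r, q_s, a, q_s_next, done,
                      gamma=0.99, tau=0.03, alpha=0.9, l0=-1.0):
    """Schematic Munchausen-DQN target for one transition (target-network Q values)."""
    # Bonus on the action actually taken: alpha * clip(tau * log pi(a|s), l0, 0).
    log_pi_s = np.log(softmax(q_s, tau) + 1e-12)
    bonus = alpha * np.clip(tau * log_pi_s[a], l0, 0.0)
    # Entropy-regularized bootstrap value of the next state.
    pi_next = softmax(q_s_next, tau)
    log_pi_next = np.log(pi_next + 1e-12)
    soft_next_value = np.sum(pi_next * (q_s_next - tau * log_pi_next))
    return r + bonus + (0.0 if done else gamma * soft_next_value)

# Toy usage with 3 actions.
target = munchausen_target(r=1.0, q_s=np.array([0.2, 0.5, 0.1]), a=1,
                           q_s_next=np.array([0.3, 0.0, 0.4]), done=False)
```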
arXiv Detail & Related papers (2020-07-28T18:30:23Z)
- Revisiting Fundamentals of Experience Replay [91.24213515992595]
We present a systematic and extensive analysis of experience replay in Q-learning methods.
We focus on two fundamental properties: the replay capacity and the ratio of learning updates to experience collected.
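To make the two knobs concrete, a minimal training-loop sketch with an explicit replay capacity and replay ratio; the environment and learner are placeholders.

```python
import collections
import random

# Minimal sketch of the two knobs studied: replay capacity (buffer size) and
# replay ratio (gradient updates per collected transition). `env_step` and
# `gradient_update` are placeholders for a real environment and learner.

REPLAY_CAPACITY = 100_000
REPLAY_RATIO = 0.25            # 1 update per 4 environment steps

buffer = collections.deque(maxlen=REPLAY_CAPACITY)
updates_owed = 0.0

def env_step(t):
    """Placeholder transition (s, a, r, s', done)."""
    return (t, random.randint(0, 3), random.random(), t + 1, False)

def gradient_update(batch):
    """Placeholder for one learner update on a sampled mini-batch."""
    pass

for t in range(1000):
    buffer.append(env_step(t))
    updates_owed += REPLAY_RATIO
    while updates_owed >= 1.0 and len(buffer) >= 32:
        gradient_update(random.sample(list(buffer), 32))
        updates_owed -= 1.0
```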
arXiv Detail & Related papers (2020-07-13T21:22:17Z)
- Double Prioritized State Recycled Experience Replay [3.42658286826597]
We develop a method called double-prioritized state-recycled (DPSR) experience replay.
We used this method in Deep Q-Networks (DQN), and achieved a state-of-the-art result.
arXiv Detail & Related papers (2020-07-08T08:36:41Z)
- Experience Replay with Likelihood-free Importance Weights [123.52005591531194]
We propose to reweight experiences based on their likelihood under the stationary distribution of the current policy.
We apply the proposed approach empirically on two competitive methods, Soft Actor Critic (SAC) and Twin Delayed Deep Deterministic policy gradient (TD3).
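One standard likelihood-free route to such weights, shown below purely as an illustration (not necessarily the paper's estimator), is to train a classifier that separates fresh on-policy samples from buffer samples and use its odds ratio as the importance weight.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Generic likelihood-free density-ratio sketch: the classifier's odds ratio
# approximates d_pi(x) / d_buffer(x) and can reweight replayed experiences.
# Features here are random stand-ins for (state, action) encodings.

rng = np.random.default_rng(0)
buffer_features = rng.normal(0.0, 1.0, size=(500, 4))     # samples from the replay buffer
onpolicy_features = rng.normal(0.5, 1.0, size=(200, 4))   # samples from fresh rollouts

X = np.vstack([buffer_features, onpolicy_features])
y = np.concatenate([np.zeros(len(buffer_features)), np.ones(len(onpolicy_features))])

clf = LogisticRegression(max_iter=1000).fit(X, y)

def importance_weights(features):
    """w(x) ~ p(on-policy | x) / p(buffer | x), up to the class prior ratio."""
    p = clf.predict_proba(features)[:, 1]
    return p / np.clip(1.0 - p, 1e-6, None)

w = importance_weights(buffer_features)
w /= w.mean()    # self-normalize before weighting the TD loss
```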
arXiv Detail & Related papers (2020-06-23T17:17:44Z)
- Rewriting History with Inverse RL: Hindsight Inference for Policy Improvement [137.29281352505245]
We show that hindsight relabeling is inverse RL, an observation that suggests that we can use inverse RL in tandem for RL algorithms to efficiently solve many tasks.
Our experiments confirm that relabeling data using inverse RL accelerates learning in general multi-task settings.
arXiv Detail & Related papers (2020-02-25T18:36:31Z)
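One way to read "relabeling with inverse RL" operationally, offered here only as an illustrative assumption rather than the paper's exact procedure, is to reassign a trajectory to a candidate task with probability proportional to the exponentiated return it achieves under that task.

```python
import numpy as np

# Illustrative "soft" hindsight relabeling in a MaxEnt-IRL spirit: a trajectory is
# reassigned to a candidate task with probability proportional to exp(return under
# that task). Candidate tasks and the temperature are assumptions for illustration.

rng = np.random.default_rng(1)

def relabel(trajectory_returns, temperature=1.0):
    """trajectory_returns[k] = return of the trajectory under candidate task k."""
    z = np.asarray(trajectory_returns, dtype=float) / temperature
    z -= z.max()
    p = np.exp(z)
    p /= p.sum()
    return rng.choice(len(p), p=p), p

# A trajectory scoring highest under task 2 is most likely (but not certain)
# to be relabeled with task 2.
task_idx, posterior = relabel([0.1, 1.5, 3.0])
```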
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content and is not responsible for any consequences of its use.