Revisiting Prioritized Experience Replay: A Value Perspective
- URL: http://arxiv.org/abs/2102.03261v1
- Date: Fri, 5 Feb 2021 16:09:07 GMT
- Title: Revisiting Prioritized Experience Replay: A Value Perspective
- Authors: Ang A. Li, Zongqing Lu, Chenglin Miao
- Abstract summary: We argue that experience replay enables off-policy reinforcement learning agents to utilize past experiences to maximize the cumulative reward.
Our framework links two important quantities in RL: $|\text{TD}|$ and value of experience.
We empirically show that the bounds hold in practice, and experience replay using the upper bound as priority improves maximum-entropy RL in Atari games.
- Score: 21.958500332929898
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Experience replay enables off-policy reinforcement learning (RL) agents to
utilize past experiences to maximize the cumulative reward. Prioritized
experience replay that weighs experiences by the magnitude of their
temporal-difference error ($|\text{TD}|$) significantly improves the learning
efficiency. But how $|\text{TD}|$ is related to the importance of experience is
not well understood. We address this problem from an economic perspective, by
linking $|\text{TD}|$ to value of experience, which is defined as the value
added to the cumulative reward by accessing the experience. We theoretically
show the value metrics of experience are upper-bounded by $|\text{TD}|$ for
Q-learning. Furthermore, we successfully extend our theoretical framework to
maximum-entropy RL by deriving the lower and upper bounds of these value
metrics for soft Q-learning, which turn out to be the product of $|\text{TD}|$
and "on-policyness" of the experiences. Our framework links two important
quantities in RL: $|\text{TD}|$ and value of experience. We empirically show
that the bounds hold in practice, and experience replay using the upper bound
as priority improves maximum-entropy RL in Atari games.
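As a rough illustration of the quantities involved, the sketch below computes, for a toy tabular soft Q-learning setup, the absolute soft TD error of each stored transition and a bound-style priority obtained by scaling it with an "on-policyness" term (assumed here to be the current soft policy's probability of the stored action). The temperature, discount, and toy data are placeholders, not the paper's settings.

```python
import numpy as np

# Toy tabular sketch, not the paper's implementation: compute, per transition,
# the absolute soft TD error and a bound-style priority |TD| * "on-policyness",
# where on-policyness is assumed to be the soft policy's probability of the
# stored action. Temperature `tau`, discount `gamma`, and data are placeholders.

def soft_value(q_row, tau):
    """Soft state value V(s) = tau * logsumexp(Q(s, .) / tau)."""
    z = q_row / tau
    m = z.max()
    return tau * (m + np.log(np.exp(z - m).sum()))

def bound_priorities(Q, transitions, gamma=0.99, tau=0.05):
    prios = []
    for s, a, r, s_next, done in transitions:
        target = r if done else r + gamma * soft_value(Q[s_next], tau)
        td_abs = abs(target - Q[s, a])
        z = Q[s] / tau
        pi = np.exp(z - z.max())
        pi /= pi.sum()                      # soft policy pi(.|s) = softmax(Q(s,.)/tau)
        prios.append(td_abs * pi[a])        # |TD| scaled by on-policyness of the stored action
    return np.array(prios)

# Toy usage: 4 states, 2 actions, two fake transitions (s, a, r, s', done).
Q = np.zeros((4, 2))
batch = [(0, 1, 1.0, 2, False), (2, 0, 0.0, 3, True)]
p = bound_priorities(Q, batch)
sampling_probs = p / p.sum() if p.sum() > 0 else np.full(len(p), 1.0 / len(p))
```

Sampling transitions in proportion to such priorities is the usual prioritized-replay recipe; the paper's contribution is the theoretical link between these priorities and the value added to the cumulative reward by accessing the experience.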
Related papers
- Which Experiences Are Influential for Your Agent? Policy Iteration with Turn-over Dropout [15.856188608650228]
We present PI+ToD as a policy iteration that efficiently estimates the influence of experiences by utilizing turn-over dropout.
We demonstrate the efficiency of PI+ToD with experiments in MuJoCo environments.
arXiv Detail & Related papers (2023-01-26T15:13:04Z)
- Replay For Safety [51.11953997546418]
In experience replay, past transitions are stored in a memory buffer and re-used during learning.
We show that using an appropriate biased sampling scheme can allow us to achieve a safe policy.
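The summary does not spell out the sampling scheme; as a generic illustration only, the snippet below shows a replay buffer whose sampling distribution is shaped by a user-supplied bias function. The example bias that oversamples low-reward transitions is a hypothetical stand-in, not the paper's method.

```python
import random
import numpy as np

class BiasedReplayBuffer:
    """Minimal replay buffer whose sampling distribution is shaped by a
    user-supplied bias function (placeholder for a safety-oriented scheme)."""

    def __init__(self, capacity=10000):
        self.capacity = capacity
        self.storage = []

    def add(self, transition):
        if len(self.storage) >= self.capacity:
            self.storage.pop(0)
        self.storage.append(transition)

    def sample(self, batch_size, bias_fn):
        weights = np.array([bias_fn(t) for t in self.storage], dtype=float)
        probs = weights / weights.sum()
        idx = np.random.choice(len(self.storage), size=batch_size, p=probs)
        return [self.storage[i] for i in idx]

# Hypothetical bias: oversample transitions with negative reward so the agent
# revisits potentially unsafe outcomes more often.
buffer = BiasedReplayBuffer()
for i in range(100):
    buffer.add((i, random.randint(0, 1), random.uniform(-1, 1), i + 1, False))
batch = buffer.sample(8, bias_fn=lambda t: 1.0 + max(0.0, -t[2]))
```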
arXiv Detail & Related papers (2021-12-08T11:10:57Z)
- Convergence Results For Q-Learning With Experience Replay [51.11953997546418]
We provide a convergence rate guarantee, and discuss how it compares to the convergence of Q-learning depending on important parameters such as the frequency and number of iterations of replay.
We also provide theoretical evidence showing when we might expect this to strictly improve performance, by introducing and analyzing a simple class of MDPs.
arXiv Detail & Related papers (2021-12-08T10:22:49Z)
- Improving Experience Replay with Successor Representation [0.0]
Prioritized experience replay is a reinforcement learning technique shown to speed up learning.
Recent work in neuroscience suggests that, in biological organisms, replay is prioritized by both gain and need.
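A minimal sketch of the gain-and-need idea, under the assumption that "need" is read off a successor representation and "gain" is approximated by a |TD|-style proxy; the paper's exact formulation may differ.

```python
import numpy as np

# Illustrative combination of "need" (successor representation: expected discounted
# future occupancy of a state, seen from the agent's current state) and "gain"
# (approximated here by a |TD| proxy, an assumption made for illustration) into a
# replay priority = gain * need.

def successor_representation(P, gamma=0.95):
    """SR matrix M = (I - gamma * P)^-1 for a fixed state-transition matrix P."""
    n = P.shape[0]
    return np.linalg.inv(np.eye(n) - gamma * P)

n_states = 5
P = np.full((n_states, n_states), 1.0 / n_states)   # toy uniform transition matrix
M = successor_representation(P)

current_state = 0
# Each entry: (state where the stored transition starts, |TD| proxy for gain).
candidates = [(1, 0.4), (3, 1.2), (4, 0.1)]
priority = np.array([abs(td) * M[current_state, s] for s, td in candidates])
sampling_probs = priority / priority.sum()
```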
arXiv Detail & Related papers (2021-11-29T05:25:54Z)
- Munchausen Reinforcement Learning [50.396037940989146]
Bootstrapping is a core mechanism in Reinforcement Learning (RL).
We show that slightly modifying Deep Q-Network (DQN) in that way provides an agent that is competitive with distributional methods on Atari games.
We provide strong theoretical insights on what happens under the hood -- implicit Kullback-Leibler regularization and increase of the action-gap.
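For context, a schematic of the Munchausen modification as commonly described: add a scaled, clipped log-policy bonus to the reward and bootstrap with an entropy-regularized next-state value. Hyperparameter values below are placeholders, not tuned settings.

```python
import numpy as np

def softmax(x, tau):
    z = x / tau
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def munchausen_target(r, q_s, a, q_s_next, done,
                      gamma=0.99, tau=0.03, alpha=0.9, l0=-1.0):
    """Schematic Munchausen-DQN target for one transition (target-network Q values)."""
    # Bonus on the action actually taken: alpha * clip(tau * log pi(a|s), l0, 0).
    log_pi_s = np.log(softmax(q_s, tau) + 1e-12)
    bonus = alpha * np.clip(tau * log_pi_s[a], l0, 0.0)
    # Entropy-regularized bootstrap value of the next state.
    pi_next = softmax(q_s_next, tau)
    log_pi_next = np.log(pi_next + 1e-12)
    soft_next_value = np.sum(pi_next * (q_s_next - tau * log_pi_next))
    return r + bonus + (0.0 if done else gamma * soft_next_value)

# Toy usage with 3 actions.
target = munchausen_target(r=1.0, q_s=np.array([0.2, 0.5, 0.1]), a=1,
                           q_s_next=np.array([0.3, 0.0, 0.4]), done=False)
```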
arXiv Detail & Related papers (2020-07-28T18:30:23Z)
- Revisiting Fundamentals of Experience Replay [91.24213515992595]
We present a systematic and extensive analysis of experience replay in Q-learning methods.
We focus on two fundamental properties: the replay capacity and the ratio of learning updates to experience collected.
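To make the two knobs concrete, a minimal training-loop sketch with an explicit replay capacity and replay ratio; the environment and learner are placeholders.

```python
import collections
import random

# Minimal sketch of the two knobs studied: replay capacity (buffer size) and
# replay ratio (gradient updates per collected transition). `env_step` and
# `gradient_update` are placeholders for a real environment and learner.

REPLAY_CAPACITY = 100_000
REPLAY_RATIO = 0.25            # 1 update per 4 environment steps

buffer = collections.deque(maxlen=REPLAY_CAPACITY)
updates_owed = 0.0

def env_step(t):
    """Placeholder transition (s, a, r, s', done)."""
    return (t, random.randint(0, 3), random.random(), t + 1, False)

def gradient_update(batch):
    """Placeholder for one learner update on a sampled mini-batch."""
    pass

for t in range(1000):
    buffer.append(env_step(t))
    updates_owed += REPLAY_RATIO
    while updates_owed >= 1.0 and len(buffer) >= 32:
        gradient_update(random.sample(list(buffer), 32))
        updates_owed -= 1.0
```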
arXiv Detail & Related papers (2020-07-13T21:22:17Z)
- Double Prioritized State Recycled Experience Replay [3.42658286826597]
We develop a method called double-prioritized state-recycled (DPSR) experience replay.
We used this method in Deep Q-Networks (DQN), and achieved a state-of-the-art result.
arXiv Detail & Related papers (2020-07-08T08:36:41Z)
- Experience Replay with Likelihood-free Importance Weights [123.52005591531194]
We propose to reweight experiences based on their likelihood under the stationary distribution of the current policy.
We apply the proposed approach empirically on two competitive methods, Soft Actor Critic (SAC) and Twin Delayed Deep Deterministic policy gradient (TD3).
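One standard likelihood-free route to such weights, shown below purely as an illustration (not necessarily the paper's estimator), is to train a classifier that separates fresh on-policy samples from buffer samples and use its odds ratio as the importance weight.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Generic likelihood-free density-ratio sketch: the classifier's odds ratio
# approximates d_pi(x) / d_buffer(x) and can reweight replayed experiences.
# Features here are random stand-ins for (state, action) encodings.

rng = np.random.default_rng(0)
buffer_features = rng.normal(0.0, 1.0, size=(500, 4))     # samples from the replay buffer
onpolicy_features = rng.normal(0.5, 1.0, size=(200, 4))   # samples from fresh rollouts

X = np.vstack([buffer_features, onpolicy_features])
y = np.concatenate([np.zeros(len(buffer_features)), np.ones(len(onpolicy_features))])

clf = LogisticRegression(max_iter=1000).fit(X, y)

def importance_weights(features):
    """w(x) ~ p(on-policy | x) / p(buffer | x), up to the class prior ratio."""
    p = clf.predict_proba(features)[:, 1]
    return p / np.clip(1.0 - p, 1e-6, None)

w = importance_weights(buffer_features)
w /= w.mean()    # self-normalize before weighting the TD loss
```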
arXiv Detail & Related papers (2020-06-23T17:17:44Z)
- Rewriting History with Inverse RL: Hindsight Inference for Policy Improvement [137.29281352505245]
We show that hindsight relabeling is inverse RL, an observation that suggests that we can use inverse RL in tandem for RL algorithms to efficiently solve many tasks.
Our experiments confirm that relabeling data using inverse RL accelerates learning in general multi-task settings.
arXiv Detail & Related papers (2020-02-25T18:36:31Z)
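One way to read "relabeling with inverse RL" operationally, offered here only as an illustrative assumption rather than the paper's exact procedure, is to reassign a trajectory to a candidate task with probability proportional to the exponentiated return it achieves under that task.

```python
import numpy as np

# Illustrative "soft" hindsight relabeling in a MaxEnt-IRL spirit: a trajectory is
# reassigned to a candidate task with probability proportional to exp(return under
# that task). Candidate tasks and the temperature are assumptions for illustration.

rng = np.random.default_rng(1)

def relabel(trajectory_returns, temperature=1.0):
    """trajectory_returns[k] = return of the trajectory under candidate task k."""
    z = np.asarray(trajectory_returns, dtype=float) / temperature
    z -= z.max()
    p = np.exp(z)
    p /= p.sum()
    return rng.choice(len(p), p=p), p

# A trajectory scoring highest under task 2 is most likely (but not certain)
# to be relabeled with task 2.
task_idx, posterior = relabel([0.1, 1.5, 3.0])
```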
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content and is not responsible for any consequences of its use.