Double Prioritized State Recycled Experience Replay
- URL: http://arxiv.org/abs/2007.03961v3
- Date: Mon, 21 Sep 2020 12:15:24 GMT
- Title: Double Prioritized State Recycled Experience Replay
- Authors: Fanchen Bu, Dong Eui Chang
- Abstract summary: We develop a method called double-prioritized state-recycled (DPSR) experience replay.
We used this method in Deep Q-Networks (DQN), and achieved a state-of-the-art result.
- Score: 3.42658286826597
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Experience replay enables online reinforcement learning agents to store and
reuse previous experiences of interacting with the environment. In the original
method, experiences are sampled and replayed uniformly at random. A prior work,
prioritized experience replay, prioritizes experiences so that those that appear
more important are replayed more frequently. In this paper, we develop a method
called double-prioritized state-recycled (DPSR) experience replay, which
prioritizes experiences in both the training stage and the storing stage, and
replaces experiences in the memory via state recycling to make the best use of
experiences that temporarily appear to have low priority. We used this method in
Deep Q-Networks (DQN) and achieved state-of-the-art results, outperforming the
original method and prioritized experience replay on many Atari games.
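To make the two-stage prioritization and state-recycling ideas above concrete, here is a minimal Python sketch under stated assumptions; it is not the authors' implementation. The class name `DPSRBufferSketch`, the proportional priority form (|TD error| + eps)^alpha, the lowest-priority-overwrite rule at storing time, and the `regenerate_fn` hook standing in for state recycling are all illustrative assumptions rather than details taken from the paper.

```python
# Minimal, illustrative sketch of a DPSR-style replay buffer (not the authors' code).
# Assumptions: proportional priorities, a plain list instead of a sum-tree, and a
# caller-supplied hook that rebuilds a transition from its stored state.
import random


class DPSRBufferSketch:
    def __init__(self, capacity, eps=1e-3, alpha=0.6):
        self.capacity = capacity
        self.eps = eps          # keeps every priority strictly positive
        self.alpha = alpha      # how strongly priorities skew sampling
        self.data = []          # transitions: (state, action, reward, next_state, done)
        self.priorities = []    # one priority per stored transition

    def _priority(self, td_error):
        return (abs(td_error) + self.eps) ** self.alpha

    def store(self, transition, td_error):
        """Storing-stage prioritization: when the buffer is full, overwrite the
        lowest-priority transition instead of the oldest one."""
        p = self._priority(td_error)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(p)
        else:
            victim = min(range(len(self.priorities)), key=self.priorities.__getitem__)
            self.data[victim] = transition
            self.priorities[victim] = p

    def sample(self, batch_size):
        """Training-stage prioritization: sample proportionally to priority."""
        idxs = random.choices(range(len(self.data)), weights=self.priorities, k=batch_size)
        return idxs, [self.data[i] for i in idxs]

    def recycle(self, idx, regenerate_fn):
        """State-recycling stand-in: rebuild a low-priority transition from its stored
        state using `regenerate_fn(state) -> (transition, td_error)` (hypothetical hook,
        e.g. re-acting from that state with the current Q-network)."""
        state = self.data[idx][0]
        transition, td_error = regenerate_fn(state)
        self.data[idx] = transition
        self.priorities[idx] = self._priority(td_error)
```

In a DQN training loop, one would call `store` after each environment step with the transition's TD error, `sample` to draw a prioritized minibatch, and occasionally `recycle` on low-priority entries so they are refreshed rather than discarded.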
Related papers
- Replay For Safety [51.11953997546418]
In experience replay, past transitions are stored in a memory buffer and re-used during learning.
We show that using an appropriately biased sampling scheme can allow us to achieve a safe policy.
arXiv Detail & Related papers (2021-12-08T11:10:57Z)
- Improving Experience Replay with Successor Representation [0.0]
Prioritized experience replay is a reinforcement learning technique shown to speed up learning.
Recent work in neuroscience suggests that, in biological organisms, replay is prioritized by both gain and need.
arXiv Detail & Related papers (2021-11-29T05:25:54Z)
- Reducing Representation Drift in Online Continual Learning [87.71558506591937]
We study the online continual learning paradigm, where agents must learn from a changing distribution with constrained memory and compute.
In this work we instead focus on the change in representations of previously observed data due to the introduction of previously unobserved class samples in the incoming data stream.
arXiv Detail & Related papers (2021-04-11T15:19:30Z)
- Revisiting Prioritized Experience Replay: A Value Perspective [21.958500332929898]
We argue that experience replay enables off-policy reinforcement learning agents to utilize past experiences to maximize the cumulative reward.
Our framework links two important quantities in RL: $|\text{TD}|$ and the value of experience.
We empirically show that the bounds hold in practice, and experience replay using the upper bound as priority improves maximum-entropy RL in Atari games.
arXiv Detail & Related papers (2021-02-05T16:09:07Z)
- Lucid Dreaming for Experience Replay: Refreshing Past States with the Current Policy [48.8675653453076]
We introduce Lucid Dreaming for Experience Replay (LiDER), a framework that allows replay experiences to be refreshed by leveraging the agent's current policy.
LiDER consistently improves performance over the baseline in six Atari 2600 games.
arXiv Detail & Related papers (2020-09-29T02:54:11Z)
- Learning to Sample with Local and Global Contexts in Experience Replay Buffer [135.94190624087355]
We propose a new learning-based sampling method that can compute the relative importance of each transition.
We show that our framework can significantly improve the performance of various off-policy reinforcement learning methods.
arXiv Detail & Related papers (2020-07-14T21:12:56Z)
- Revisiting Fundamentals of Experience Replay [91.24213515992595]
We present a systematic and extensive analysis of experience replay in Q-learning methods.
We focus on two fundamental properties: the replay capacity and the ratio of learning updates to experience collected.
arXiv Detail & Related papers (2020-07-13T21:22:17Z)
- Experience Replay with Likelihood-free Importance Weights [123.52005591531194]
We propose to reweight experiences based on their likelihood under the stationary distribution of the current policy.
We apply the proposed approach empirically to two competitive methods, Soft Actor Critic (SAC) and Twin Delayed Deep Deterministic policy gradient (TD3).
arXiv Detail & Related papers (2020-06-23T17:17:44Z)
- Bootstrapping a DQN Replay Memory with Synthetic Experiences [0.0]
We present an algorithm that creates synthetic experiences in a nondeterministic discrete environment to assist the learner.
Interpolated Experience Replay is evaluated on the FrozenLake environment, and we show that it helps the agent learn faster, and even better, than the classic version.
arXiv Detail & Related papers (2020-02-04T15:36:36Z)