Improving Experience Replay with Successor Representation
- URL: http://arxiv.org/abs/2111.14331v1
- Date: Mon, 29 Nov 2021 05:25:54 GMT
- Title: Improving Experience Replay with Successor Representation
- Authors: Yizhi Yuan, Marcelo Mattar
- Abstract summary: Prioritized experience replay is a reinforcement learning technique shown to speed up learning.
Recent work in neuroscience suggests that, in biological organisms, replay is prioritized by both gain and need.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Prioritized experience replay is a reinforcement learning technique shown to
speed up learning by allowing agents to replay useful past experiences more
frequently. This usefulness is quantified as the expected gain from replaying
the experience, and is often approximated as the prediction error (TD-error)
observed during the corresponding experience. However, prediction error is only
one possible prioritization metric. Recent work in neuroscience suggests that,
in biological organisms, replay is prioritized by both gain and need. The need
term measures the expected relevance of each experience with respect to the
current situation, and more importantly, this term is not currently considered
in algorithms such as deep Q-network (DQN). Thus, in this paper we present a
new approach for prioritizing experiences for replay that considers both gain
and need. We test our approach by incorporating the need term, quantified by the
Successor Representation, into the sampling process of different reinforcement
learning algorithms. Our proposed algorithms show a significant increase in
performance in benchmarks including the Dyna-Q maze and a selection of Atari
games.
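As an illustration of the idea described above, the following is a minimal sketch of a replay buffer whose sampling priority combines gain (the magnitude of the stored TD-error, as in standard prioritized replay) with need (read off a Successor Representation row for the agent's current state). The class name, the tabular SR update, the multiplicative gain-times-need combination, and the alpha exponent are illustrative assumptions, not the paper's reference implementation.

```python
import numpy as np

class GainNeedReplayBuffer:
    """Sketch: prioritized replay combining gain (|TD-error|) and need (SR row)."""

    def __init__(self, n_states, capacity=10000, gamma=0.99, sr_lr=0.1, alpha=1.0):
        self.capacity = capacity
        self.gamma = gamma
        self.sr_lr = sr_lr
        self.alpha = alpha                 # exponent on the combined priority (assumed)
        self.sr = np.eye(n_states)         # tabular successor representation M(s, s')
        self.buffer = []                   # transitions: (s, a, r, s_next, td_error)

    def update_sr(self, s, s_next):
        # TD(0) update of the successor representation for the visited state.
        target = np.eye(len(self.sr))[s] + self.gamma * self.sr[s_next]
        self.sr[s] += self.sr_lr * (target - self.sr[s])

    def add(self, s, a, r, s_next, td_error):
        if len(self.buffer) >= self.capacity:
            self.buffer.pop(0)
        self.buffer.append((s, a, r, s_next, td_error))

    def sample(self, current_state, batch_size):
        # Gain: |TD-error| of each stored transition.
        gain = np.array([abs(t[4]) for t in self.buffer]) + 1e-6
        # Need: expected future occupancy of the stored state from the current
        # state, taken from the SR row M(current_state, s).
        need = np.array([self.sr[current_state, t[0]] for t in self.buffer]) + 1e-6
        priority = (gain * need) ** self.alpha
        probs = priority / priority.sum()
        idx = np.random.choice(len(self.buffer), size=batch_size, p=probs)
        return [self.buffer[i] for i in idx]
```

In this sketch, transitions that were both surprising (large TD-error) and likely to be revisited from the current state (large SR entry) are sampled most often; how the two terms are actually combined in the paper may differ.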
Related papers
- Reward Prediction Error Prioritisation in Experience Replay: The RPE-PER Method [1.600323605807673]
We introduce Reward Predictive Error Prioritised Experience Replay (RPE-PER), which prioritises experiences in the buffer based on reward prediction errors (RPEs).
Our method employs a critic network, EMCN, that predicts rewards in addition to the Q-values produced by standard critic networks.
arXiv Detail & Related papers (2025-01-30T02:09:35Z) - Hierarchical Decomposition of Prompt-Based Continual Learning: Rethinking Obscured Sub-optimality [55.88910947643436]
Self-supervised pre-training is essential for handling vast quantities of unlabeled data in practice.
HiDe-Prompt is an innovative approach that explicitly optimizes the hierarchical components with an ensemble of task-specific prompts and statistics.
Our experiments demonstrate the superior performance of HiDe-Prompt and its robustness to pre-training paradigms in continual learning.
arXiv Detail & Related papers (2023-10-11T06:51:46Z) - Detachedly Learn a Classifier for Class-Incremental Learning [11.865788374587734]
We present an analysis showing that the failure of vanilla experience replay (ER) comes from unnecessary re-learning of previous tasks and an inability to distinguish the current task from previous ones.
We propose a novel replay strategy, task-aware experience replay.
Experimental results show our method outperforms current state-of-the-art methods.
arXiv Detail & Related papers (2023-02-23T01:35:44Z) - Replay For Safety [51.11953997546418]
In experience replay, past transitions are stored in a memory buffer and re-used during learning.
We show that using an appropriate biased sampling scheme can allow us to achieve a safe policy.
arXiv Detail & Related papers (2021-12-08T11:10:57Z) - Convergence Results For Q-Learning With Experience Replay [51.11953997546418]
We provide a convergence rate guarantee, and discuss how it compares to the convergence of Q-learning depending on important parameters such as the frequency and number of iterations of replay.
We also provide theoretical evidence showing when we might expect this to strictly improve performance, by introducing and analyzing a simple class of MDPs.
arXiv Detail & Related papers (2021-12-08T10:22:49Z) - Reducing Representation Drift in Online Continual Learning [87.71558506591937]
We study the online continual learning paradigm, where agents must learn from a changing distribution with constrained memory and compute.
In this work we instead focus on the change in representations of previously observed data due to the introduction of previously unobserved class samples in the incoming data stream.
arXiv Detail & Related papers (2021-04-11T15:19:30Z) - Revisiting Fundamentals of Experience Replay [91.24213515992595]
We present a systematic and extensive analysis of experience replay in Q-learning methods.
We focus on two fundamental properties: the replay capacity and the ratio of learning updates to experience collected.
arXiv Detail & Related papers (2020-07-13T21:22:17Z) - Experience Replay with Likelihood-free Importance Weights [123.52005591531194]
We propose to reweight experiences based on their likelihood under the stationary distribution of the current policy.
We apply the proposed approach empirically to two competitive methods, Soft Actor-Critic (SAC) and Twin Delayed Deep Deterministic policy gradient (TD3).
arXiv Detail & Related papers (2020-06-23T17:17:44Z)