Stratified Experience Replay: Correcting Multiplicity Bias in Off-Policy
Reinforcement Learning
- URL: http://arxiv.org/abs/2102.11319v1
- Date: Mon, 22 Feb 2021 19:29:18 GMT
- Title: Stratified Experience Replay: Correcting Multiplicity Bias in Off-Policy
Reinforcement Learning
- Authors: Brett Daley, Cameron Hickert, Christopher Amato
- Abstract summary: Replay-based deep RL appears to struggle in the presence of extraneous data.
Recent works have shown that the performance of Deep Q-Network (DQN) degrades when its replay memory becomes too large.
We re-examine the motivation for sampling uniformly over a replay memory, and find that it may be flawed when using function approximation.
- Score: 17.3794999533024
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep Reinforcement Learning (RL) methods rely on experience replay to
approximate the minibatched supervised learning setting; however, unlike
supervised learning where access to lots of training data is crucial to
generalization, replay-based deep RL appears to struggle in the presence of
extraneous data. Recent works have shown that the performance of Deep Q-Network
(DQN) degrades when its replay memory becomes too large.
This suggests that outdated experiences somehow impact the performance of
deep RL, which should not be the case for off-policy methods like DQN.
Consequently, we re-examine the motivation for sampling uniformly over a replay
memory, and find that it may be flawed when using function approximation. We
show that -- despite conventional wisdom -- sampling from the uniform
distribution does not yield uncorrelated training samples and therefore biases
gradients during training. Our theory prescribes a special non-uniform
distribution to cancel this effect, and we propose a stratified sampling scheme
to efficiently implement it.
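The abstract does not spell out the sampling scheme, so the following is a minimal Python sketch of how a multiplicity-correcting stratified buffer could look. It assumes strata are keyed by state identity and that sampling picks a stratum uniformly and then a transition uniformly within it; the class name StratifiedReplayBuffer, the state_key argument, and this particular two-stage scheme are illustrative choices, not the authors' exact construction.

import random
from collections import defaultdict

class StratifiedReplayBuffer:
    """Sketch of multiplicity-corrected replay (assumptions noted above)."""

    def __init__(self):
        # One stratum (a list of transitions) per distinct state key.
        self.strata = defaultdict(list)

    def add(self, state_key, transition):
        # state_key: a hashable identifier for the state (an assumption here);
        # transition: e.g. a (s, a, r, s_next, done) tuple.
        self.strata[state_key].append(transition)

    def sample(self, batch_size):
        # Pick a stratum uniformly at random, then a transition uniformly
        # within it, so states that recur many times do not dominate a batch.
        keys = list(self.strata.keys())
        return [random.choice(self.strata[random.choice(keys)])
                for _ in range(batch_size)]

Under these assumptions, each transition is drawn with probability proportional to 1 / multiplicity(state), which is the kind of non-uniform correction to uniform replay sampling that the abstract describes.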
Related papers
- Enhancing Consistency and Mitigating Bias: A Data Replay Approach for
Incremental Learning [100.7407460674153]
Deep learning systems are prone to catastrophic forgetting when learning from a sequence of tasks.
To mitigate the problem, a line of methods propose to replay the data of experienced tasks when learning new tasks.
However, storing this data is often impractical due to memory constraints or data-privacy concerns.
As a replacement, data-free data replay methods are proposed by inverting samples from the classification model.
arXiv Detail & Related papers (2024-01-12T12:51:12Z)
- Temporal Difference Learning with Experience Replay [3.5823366350053325]
Temporal-difference (TD) learning is widely regarded as one of the most popular algorithms in reinforcement learning (RL).
We present a simple decomposition of the Markovian noise terms and provide finite-time error bounds for TD-learning with experience replay.
arXiv Detail & Related papers (2023-06-16T10:25:43Z)
- PCR: Proxy-based Contrastive Replay for Online Class-Incremental
Continual Learning [16.67238259139417]
Existing replay-based methods effectively alleviate catastrophic forgetting by saving and replaying part of the old data in a proxy-based or contrastive-based replay manner.
We propose a novel replay-based method called proxy-based contrastive replay (PCR).
arXiv Detail & Related papers (2023-04-10T06:35:19Z)
- A simple but strong baseline for online continual learning: Repeated
Augmented Rehearsal [13.075018350152074]
Online continual learning (OCL) aims to train neural networks incrementally from a non-stationary data stream with a single pass through data.
Rehearsal-based methods attempt to approximate the observed input distributions over time with a small memory and revisit them later to avoid forgetting.
We provide theoretical insights on the inherent memory overfitting risk from the viewpoint of biased and dynamic empirical risk minimization.
arXiv Detail & Related papers (2022-09-28T08:43:35Z)
- Look Back When Surprised: Stabilizing Reverse Experience Replay for
Neural Approximation [7.6146285961466]
We consider the recently developed and theoretically rigorous reverse experience replay (RER).
We show via experiments that it performs better than techniques like prioritized experience replay (PER) on various tasks.
arXiv Detail & Related papers (2022-06-07T10:42:02Z)
- Replay For Safety [51.11953997546418]
In experience replay, past transitions are stored in a memory buffer and re-used during learning.
We show that using an appropriate biased sampling scheme can allow us to achieve a safe policy.
arXiv Detail & Related papers (2021-12-08T11:10:57Z)
- Revisiting Fundamentals of Experience Replay [91.24213515992595]
We present a systematic and extensive analysis of experience replay in Q-learning methods.
We focus on two fundamental properties: the replay capacity and the ratio of learning updates to experience collected.
arXiv Detail & Related papers (2020-07-13T21:22:17Z)
- Experience Replay with Likelihood-free Importance Weights [123.52005591531194]
We propose to reweight experiences based on their likelihood under the stationary distribution of the current policy.
We apply the proposed approach empirically to two competitive methods, Soft Actor Critic (SAC) and Twin Delayed Deep Deterministic policy gradient (TD3).
arXiv Detail & Related papers (2020-06-23T17:17:44Z)
- Transient Non-Stationarity and Generalisation in Deep Reinforcement
Learning [67.34810824996887]
Non-stationarity can arise in Reinforcement Learning (RL) even in stationary environments.
We propose Iterated Relearning (ITER) to improve generalisation of deep RL agents.
arXiv Detail & Related papers (2020-06-10T13:26:31Z)
- DisCor: Corrective Feedback in Reinforcement Learning via Distribution
Correction [96.90215318875859]
We show that bootstrapping-based Q-learning algorithms do not necessarily benefit from corrective feedback.
We propose a new algorithm, DisCor, which computes an approximation to the optimal training distribution and uses it to re-weight the transitions used for training.
arXiv Detail & Related papers (2020-03-16T16:18:52Z)