The Primacy Bias in Deep Reinforcement Learning
- URL: http://arxiv.org/abs/2205.07802v1
- Date: Mon, 16 May 2022 16:48:36 GMT
- Title: The Primacy Bias in Deep Reinforcement Learning
- Authors: Evgenii Nikishin, Max Schwarzer, Pierluca D'Oro, Pierre-Luc Bacon,
Aaron Courville
- Abstract summary: This work identifies a common flaw of deep reinforcement learning (RL) algorithms.
Because of training on progressively growing datasets, deep RL agents incur a risk of overfitting to earlier experiences.
We propose a simple yet generally-applicable mechanism that tackles the primacy bias by periodically resetting a part of the agent.
- Score: 10.691354079742016
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work identifies a common flaw of deep reinforcement learning (RL)
algorithms: a tendency to rely on early interactions and ignore useful evidence
encountered later. Because of training on progressively growing datasets, deep
RL agents incur a risk of overfitting to earlier experiences, negatively
affecting the rest of the learning process. Inspired by cognitive science, we
refer to this effect as the primacy bias. Through a series of experiments, we
dissect the algorithmic aspects of deep RL that exacerbate this bias. We then
propose a simple yet generally-applicable mechanism that tackles the primacy
bias by periodically resetting a part of the agent. We apply this mechanism to
algorithms in both discrete (Atari 100k) and continuous action (DeepMind
Control Suite) domains, consistently improving their performance.
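The proposed remedy is to periodically re-initialize part of the agent's networks while preserving the replay buffer, so that the agent relearns from all of its collected data instead of staying anchored to its earliest experience. Below is a minimal PyTorch-style sketch of one such scheme; the network shape, the choice to reset only the head, and the reset interval are illustrative assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Toy Q-network split into an encoder and a reset-able head."""
    def __init__(self, obs_dim: int, n_actions: int):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU())
        self.head = nn.Sequential(nn.Linear(256, 256), nn.ReLU(),
                                  nn.Linear(256, n_actions))

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.head(self.encoder(obs))

def reset_head(net: QNetwork) -> None:
    """Re-initialize only the final layers; the replay buffer is left untouched."""
    for module in net.head.modules():
        if isinstance(module, nn.Linear):
            module.reset_parameters()

net = QNetwork(obs_dim=8, n_actions=4)   # illustrative sizes
reset_interval = 100_000                 # assumed reset period, in gradient steps
for step in range(1, 300_001):
    # ... sample a batch from the (growing) replay buffer and take a gradient step ...
    if step % reset_interval == 0:
        reset_head(net)  # discard parameters fit to early data; relearn from the full buffer
```

Because the buffer is preserved, the re-initialized head can quickly recover, but without carrying over the functions fit during the earliest, least representative phase of training.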
Related papers
- A Forget-and-Grow Strategy for Deep Reinforcement Learning Scaling in Continuous Control [24.96744955563452]
We propose Forget and Grow (FoG), a new deep RL algorithm that introduces two mechanisms. First, Experience Replay Decay (ER Decay), "forgetting early experience", which balances memory by gradually reducing the influence of early experiences. Second, Network Expansion, "growing neural capacity", which enhances the agent's ability to exploit patterns in existing data.
arXiv Detail & Related papers (2025-07-03T15:26:48Z) - Dissecting Deep RL with High Update Ratios: Combatting Value Divergence [21.282292112642747]
We show that deep reinforcement learning algorithms can retain their ability to learn without resetting network parameters.
We employ a simple unit-ball normalization that enables learning under large update ratios.
arXiv Detail & Related papers (2024-03-09T19:56:40Z) - Exploiting Estimation Bias in Clipped Double Q-Learning for Continuous Control Reinforcement Learning Tasks [5.968716050740402]
This paper focuses on addressing and exploiting estimation biases in Actor-Critic methods for continuous control tasks.
We design a Bias Exploiting (BE) mechanism to dynamically select the most advantageous estimation bias during training of the RL agent.
Most state-of-the-art deep RL algorithms can be equipped with the BE mechanism without degrading performance or increasing computational complexity.
arXiv Detail & Related papers (2024-02-14T10:44:03Z) - Look Back When Surprised: Stabilizing Reverse Experience Replay for
Neural Approximation [7.6146285961466]
We consider the recently developed and theoretically rigorous reverse experience replay (RER).
We show via experiments that this performs better than techniques like prioritized experience replay (PER) on various tasks.
arXiv Detail & Related papers (2022-06-07T10:42:02Z) - Learning Dynamics and Generalization in Reinforcement Learning [59.530058000689884]
We show theoretically that temporal difference learning encourages agents to fit non-smooth components of the value function early in training.
We show that neural networks trained using temporal difference algorithms on dense reward tasks exhibit weaker generalization between states than randomly initialized networks and networks trained with policy gradient methods.
arXiv Detail & Related papers (2022-06-05T08:49:16Z) - Subtle Inverse Crimes: Naïvely training machine learning algorithms
could lead to overly-optimistic results [5.785136336372809]
This work highlights that, in some cases, the common practice of training algorithms on preprocessed open-access data may lead to biased, overly optimistic results.
We describe two preprocessing pipelines typical of open-access databases and study their effects on three well-established algorithms.
Our results demonstrate that the compressed sensing (CS), dictionary learning (DictL) and deep learning (DL) algorithms yield systematically biased results when naïvely trained on seemingly-appropriate data.
arXiv Detail & Related papers (2021-09-16T22:00:15Z) - Robust Deep Reinforcement Learning through Adversarial Loss [74.20501663956604]
Recent studies have shown that deep reinforcement learning agents are vulnerable to small adversarial perturbations on the agent's inputs.
We propose RADIAL-RL, a principled framework to train reinforcement learning agents with improved robustness against adversarial attacks.
arXiv Detail & Related papers (2020-08-05T07:49:42Z) - Dynamics Generalization via Information Bottleneck in Deep Reinforcement
Learning [90.93035276307239]
We propose an information theoretic regularization objective and an annealing-based optimization method to achieve better generalization ability in RL agents.
We demonstrate the extreme generalization benefits of our approach in different domains ranging from maze navigation to robotic tasks.
This work provides a principled way to improve generalization in RL by gradually removing information that is redundant for task-solving.
arXiv Detail & Related papers (2020-08-03T02:24:20Z) - Revisiting Fundamentals of Experience Replay [91.24213515992595]
We present a systematic and extensive analysis of experience replay in Q-learning methods.
We focus on two fundamental properties: the replay capacity and the ratio of learning updates to experience collected (the replay ratio; see the sketch after this list).
arXiv Detail & Related papers (2020-07-13T21:22:17Z) - Transient Non-Stationarity and Generalisation in Deep Reinforcement
Learning [67.34810824996887]
Non-stationarity can arise in Reinforcement Learning (RL) even in stationary environments.
We propose Iterated Relearning (ITER) to improve generalisation of deep RL agents.
arXiv Detail & Related papers (2020-06-10T13:26:31Z) - Overfitting in adversarially robust deep learning [86.11788847990783]
We show that overfitting to the training set does in fact harm robust performance to a very large degree in adversarially robust training.
We also show that effects such as the double descent curve do still occur in adversarially trained models, yet fail to explain the observed overfitting.
arXiv Detail & Related papers (2020-02-26T15:40:50Z)
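The sketch referenced in the "Revisiting Fundamentals of Experience Replay" entry above: the two properties it studies, replay capacity and the ratio of learning updates to collected experience (the replay ratio), are exactly the knobs under which the primacy bias becomes most damaging. The loop below shows how these two quantities typically appear in a replay-based training loop; the variable names and values are illustrative assumptions, not settings taken from that paper.

```python
from collections import deque
import random

replay_capacity = 100_000   # replay capacity: how many transitions the buffer retains
updates_per_env_step = 4    # replay ratio: learning updates per unit of collected experience

buffer = deque(maxlen=replay_capacity)

def collect_transition(step: int) -> dict:
    """Stand-in for one step of environment interaction (dummy data)."""
    return {"obs": step, "action": 0, "reward": 0.0, "next_obs": step + 1}

for env_step in range(1, 1_001):
    buffer.append(collect_transition(env_step))
    for _ in range(updates_per_env_step):
        if len(buffer) >= 32:
            batch = random.sample(list(buffer), 32)  # uniform minibatch from the buffer
            # ... compute a TD loss on `batch` and take a gradient step ...
```

Raising `updates_per_env_step` reuses each transition more aggressively, which is precisely the regime where early data can dominate what the network learns.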