Transient Non-Stationarity and Generalisation in Deep Reinforcement
Learning
- URL: http://arxiv.org/abs/2006.05826v4
- Date: Wed, 22 Sep 2021 08:03:34 GMT
- Title: Transient Non-Stationarity and Generalisation in Deep Reinforcement
Learning
- Authors: Maximilian Igl, Gregory Farquhar, Jelena Luketina, Wendelin Boehmer,
Shimon Whiteson
- Abstract summary: Non-stationarity can arise in Reinforcement Learning (RL) even in stationary environments.
We propose Iterated Relearning (ITER) to improve generalisation of deep RL agents.
- Score: 67.34810824996887
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Non-stationarity can arise in Reinforcement Learning (RL) even in stationary
environments. For example, most RL algorithms collect new data throughout
training, using a non-stationary behaviour policy. Due to the transience of
this non-stationarity, it is often not explicitly addressed in deep RL and a
single neural network is continually updated. However, we find evidence that
neural networks exhibit a memory effect where these transient
non-stationarities can permanently impact the latent representation and
adversely affect generalisation performance. Consequently, to improve
generalisation of deep RL agents, we propose Iterated Relearning (ITER). ITER
augments standard RL training by repeated knowledge transfer of the current
policy into a freshly initialised network, which thereby experiences less
non-stationarity during training. Experimentally, we show that ITER improves
performance on the challenging generalisation benchmarks ProcGen and Multiroom.
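The core mechanism described in the abstract, repeatedly transferring the current policy into a freshly initialised network, can be illustrated with a short distillation loop. The sketch below is a simplified reading of the abstract only: it assumes a discrete-action policy, a plain KL distillation loss, and a fixed schedule, and it omits the ongoing RL updates that the paper interleaves with the transfer. The helper names (`make_policy`, `distil`) and all hyperparameters are illustrative assumptions, not the paper's exact objective.

```python
# Minimal sketch of the ITER idea: periodically distil the current policy into
# a freshly initialised network, then continue training with the new network.
# Hyperparameters, the stand-in data, and the plain KL distillation loss are
# illustrative assumptions, not the paper's exact objective or schedule.
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_policy(obs_dim: int, n_actions: int) -> nn.Module:
    return nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                         nn.Linear(64, n_actions))

def distil(teacher: nn.Module, student: nn.Module, states: torch.Tensor,
           steps: int = 1000, lr: float = 1e-3) -> nn.Module:
    """Train `student` to match the teacher's action distribution on `states`."""
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    for _ in range(steps):
        with torch.no_grad():
            target = F.softmax(teacher(states), dim=-1)
        log_pred = F.log_softmax(student(states), dim=-1)
        loss = F.kl_div(log_pred, target, reduction="batchmean")
        opt.zero_grad()
        loss.backward()
        opt.step()
    return student

# Illustrative outer loop: each student starts from a fresh initialisation, so
# it never experiences the early, highly non-stationary phase of training.
obs_dim, n_actions = 8, 4
policy = make_policy(obs_dim, n_actions)
for iteration in range(5):
    # Standard RL updates (e.g. PPO) on `policy` would run here -- omitted.
    states = torch.randn(256, obs_dim)   # stand-in for recently collected states
    policy = distil(policy, make_policy(obs_dim, n_actions), states)
```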
Related papers
- Is Value Learning Really the Main Bottleneck in Offline RL? [70.54708989409409]
We show that the choice of a policy extraction algorithm significantly affects the performance and scalability of offline RL.
We propose two simple test-time policy improvement methods and show that they lead to better performance.
arXiv Detail & Related papers (2024-06-13T17:07:49Z)
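For context on the term "policy extraction" used above: in offline RL it refers to how a policy is derived from a learned value function. The snippet below shows one common extraction choice, advantage-weighted behaviour cloning, purely as background; it is not the method proposed in that paper, and all shapes and the temperature are illustrative assumptions.

```python
# Illustrative example of a common policy-extraction choice in offline RL:
# advantage-weighted behaviour cloning. Background context only; not the
# methods proposed in the entry above.
import torch
import torch.nn.functional as F

def awbc_loss(policy_logits: torch.Tensor,   # (B, A) logits of the extracted policy
              actions: torch.Tensor,         # (B,)  dataset actions (long)
              q_values: torch.Tensor,        # (B, A) learned Q-values
              temperature: float = 1.0) -> torch.Tensor:
    """Clone dataset actions, up-weighting those with high advantage under Q."""
    v = (F.softmax(policy_logits, dim=-1).detach() * q_values).sum(dim=-1)
    adv = q_values.gather(1, actions.unsqueeze(1)).squeeze(1) - v
    weights = torch.exp(adv / temperature).clamp(max=20.0)
    log_probs = F.log_softmax(policy_logits, dim=-1)
    chosen = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
    return -(weights * chosen).mean()

# Example call with random stand-in data.
logits = torch.randn(32, 4, requires_grad=True)
acts, q = torch.randint(0, 4, (32,)), torch.randn(32, 4)
loss = awbc_loss(logits, acts, q)
```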
arXiv Detail & Related papers (2024-06-13T17:07:49Z) - Learning Dynamics and Generalization in Reinforcement Learning [59.530058000689884]
We show theoretically that temporal difference learning encourages agents to fit non-smooth components of the value function early in training.
We show that neural networks trained using temporal difference algorithms on dense reward tasks exhibit weaker generalization between states than randomly initialized networks and networks trained with policy gradient methods.
arXiv Detail & Related papers (2022-06-05T08:49:16Z)
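To make concrete what "temporal difference algorithms" refers to in the entry above, here is a minimal TD(0) value-function update with a bootstrapped target. The tiny network, batch shapes, and hyperparameters are illustrative only and are not taken from the paper.

```python
# Minimal TD(0) value update. The bootstrapped target r + gamma * V(s') is the
# kind of non-stationary regression target discussed in the entry above.
import torch
import torch.nn as nn

value_net = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(value_net.parameters(), lr=1e-3)
gamma = 0.99

def td0_update(s, r, s_next, done):
    with torch.no_grad():
        target = r + gamma * (1 - done) * value_net(s_next).squeeze(-1)
    pred = value_net(s).squeeze(-1)
    loss = ((pred - target) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Example call with random stand-in data.
s, s_next = torch.randn(64, 4), torch.randn(64, 4)
r, done = torch.randn(64), torch.zeros(64)
td0_update(s, r, s_next, done)
```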
arXiv Detail & Related papers (2022-06-05T08:49:16Z) - Understanding and Preventing Capacity Loss in Reinforcement Learning [28.52122927103544]
We identify a mechanism by which non-stationary prediction targets can prevent learning progress in deep RL agents.
Capacity loss occurs in a range of RL agents and environments, and is particularly damaging to performance in sparse-reward tasks.
arXiv Detail & Related papers (2022-04-20T15:55:15Z)
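One way to build intuition for "capacity loss" as described above is to probe how well an agent's current representation can still fit freshly drawn random targets. The sketch below is a hedged diagnostic in the spirit of the entry, not the paper's exact protocol or its proposed remedy; the helper name and hyperparameters are assumptions.

```python
# Hedged diagnostic sketch: freeze the agent's feature extractor and measure
# how well a fresh linear head can fit random regression targets. A
# representation that has lost capacity fits them poorly. Illustrative only.
import torch
import torch.nn as nn

def capacity_probe(features: nn.Module, obs: torch.Tensor,
                   steps: int = 500, lr: float = 1e-2) -> float:
    with torch.no_grad():
        feats = features(obs)                 # frozen features
    targets = torch.randn(obs.shape[0])       # random regression targets
    head = nn.Linear(feats.shape[1], 1)
    opt = torch.optim.Adam(head.parameters(), lr=lr)
    for _ in range(steps):
        loss = ((head(feats).squeeze(-1) - targets) ** 2).mean()
        opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()   # higher residual error suggests less usable capacity

# Example: probe a small random feature extractor on stand-in observations.
features = nn.Sequential(nn.Linear(8, 64), nn.ReLU())
print(capacity_probe(features, torch.randn(256, 8)))
```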
arXiv Detail & Related papers (2022-04-20T15:55:15Z) - Single-Shot Pruning for Offline Reinforcement Learning [47.886329599997474]
Deep Reinforcement Learning (RL) is a powerful framework for solving complex real-world problems.
One way to reduce the computational and memory cost of the large networks used in deep RL is to prune them, leaving only the necessary parameters.
We close the gap between RL and single-shot pruning techniques and present a general pruning approach for offline RL.
arXiv Detail & Related papers (2021-12-31T18:10:02Z)
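To illustrate the general idea of one-shot pruning referenced above, the snippet below applies global magnitude pruning once to a trained network. This is the simplest variant only; the paper adapts more principled single-shot criteria to offline RL, and the sparsity level and architecture here are arbitrary.

```python
# Simplest form of one-shot pruning: zero out the smallest-magnitude
# parameters in a single pass. Illustrative only; not the paper's criterion.
import torch
import torch.nn as nn

def magnitude_prune(model: nn.Module, sparsity: float = 0.9) -> nn.Module:
    """Zero out the `sparsity` fraction of parameters with smallest magnitude."""
    all_params = torch.cat([p.abs().flatten() for p in model.parameters()])
    threshold = torch.quantile(all_params, sparsity)
    with torch.no_grad():
        for p in model.parameters():
            p.mul_((p.abs() >= threshold).float())   # keep only large entries
    return model

policy = nn.Sequential(nn.Linear(8, 256), nn.ReLU(), nn.Linear(256, 4))
magnitude_prune(policy, sparsity=0.95)   # prune once, then fine-tune or deploy
```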
arXiv Detail & Related papers (2021-12-31T18:10:02Z) - What training reveals about neural network complexity [80.87515604428346]
This work explores the hypothesis that the complexity of the function a deep neural network (NN) is learning can be deduced by how fast its weights change during training.
Our results support the hypothesis that good training behavior can be a useful bias towards good generalization.
arXiv Detail & Related papers (2021-06-08T08:58:00Z)
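The quantity discussed above, how fast a network's weights change during training, can be monitored with a small amount of bookkeeping. The snippet below records per-step parameter displacement as a generic proxy; the paper derives a more formal complexity measure, so treat this only as an illustration.

```python
# Generic utility for tracking how fast a network's weights move per training
# step. Illustrative proxy only; not the paper's complexity measure.
import torch
import torch.nn as nn

def flat_params(model: nn.Module) -> torch.Tensor:
    return torch.cat([p.detach().flatten() for p in model.parameters()])

model = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
speeds, prev = [], flat_params(model)
for step in range(100):
    x, y = torch.randn(32, 8), torch.randn(32, 1)   # stand-in regression data
    loss = ((model(x) - y) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    cur = flat_params(model)
    speeds.append((cur - prev).norm().item())   # weight-change "speed"
    prev = cur
```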
arXiv Detail & Related papers (2021-06-08T08:58:00Z) - Stratified Experience Replay: Correcting Multiplicity Bias in Off-Policy
Reinforcement Learning [17.3794999533024]
We show that deep RL appears to struggle in the presence of extraneous data.
Recent works have shown that the performance of Deep Q-Network (DQN) degrades when its replay memory becomes too large.
We re-examine the motivation for sampling uniformly over a replay memory, and find that it may be flawed when using function approximation.
arXiv Detail & Related papers (2021-02-22T19:29:18Z)
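As a rough illustration of the alternative to uniform sampling discussed above, the sketch below groups stored transitions by a coarse state key and samples strata uniformly before sampling within a stratum, so over-represented states do not dominate the batch. The grouping key and two-stage scheme are assumptions for illustration; the paper's exact stratification may differ.

```python
# Generic stratified sampling over a replay memory. Illustrates the idea of
# correcting for duplicated/over-represented states; details may differ from
# the paper's scheme.
import random
from collections import defaultdict

class StratifiedReplay:
    def __init__(self):
        self.strata = defaultdict(list)   # state key -> list of transitions

    def add(self, state, action, reward, next_state, done):
        key = tuple(round(float(x), 1) for x in state)   # coarse state key
        self.strata[key].append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        keys = list(self.strata.keys())
        batch = []
        for _ in range(batch_size):
            key = random.choice(keys)                      # uniform over strata,
            batch.append(random.choice(self.strata[key]))  # then within a stratum
        return batch

buf = StratifiedReplay()
for _ in range(1000):
    s = [random.random() for _ in range(4)]
    buf.add(s, random.randrange(2), 0.0, s, False)
batch = buf.sample(32)
```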
arXiv Detail & Related papers (2021-02-22T19:29:18Z) - Dynamics Generalization via Information Bottleneck in Deep Reinforcement
Learning [90.93035276307239]
We propose an information theoretic regularization objective and an annealing-based optimization method to achieve better generalization ability in RL agents.
We demonstrate the extreme generalization benefits of our approach in different domains ranging from maze navigation to robotic tasks.
This work provides a principled way to improve generalization in RL by gradually removing information that is redundant for task-solving.
arXiv Detail & Related papers (2020-08-03T02:24:20Z)