A Forget-and-Grow Strategy for Deep Reinforcement Learning Scaling in Continuous Control
- URL: http://arxiv.org/abs/2507.02712v1
- Date: Thu, 03 Jul 2025 15:26:48 GMT
- Title: A Forget-and-Grow Strategy for Deep Reinforcement Learning Scaling in Continuous Control
- Authors: Zilin Kang, Chenyuan Hu, Yu Luo, Zhecheng Yuan, Ruijie Zheng, Huazhe Xu
- Abstract summary: We propose Forget and Grow (FoG), a new deep RL algorithm that introduces two mechanisms. First, Experience Replay Decay (ER Decay), "forgetting early experience," balances memory by gradually reducing the influence of early experiences. Second, Network Expansion, "growing neural capacity," enhances agents' capability to exploit the patterns of existing data.
- Score: 24.96744955563452
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep reinforcement learning for continuous control has recently achieved impressive progress. However, existing methods often suffer from primacy bias, a tendency to overfit early experiences stored in the replay buffer, which limits an RL agent's sample efficiency and generalizability. In contrast, humans are less susceptible to such bias, partly due to infantile amnesia, where the formation of new neurons disrupts early memory traces, leading to the forgetting of initial experiences. Inspired by these dual processes of forgetting and growing in neuroscience, we propose Forget and Grow (FoG), a new deep RL algorithm that introduces two mechanisms. First, Experience Replay Decay (ER Decay), "forgetting early experience," balances memory by gradually reducing the influence of early experiences. Second, Network Expansion, "growing neural capacity," enhances agents' capability to exploit the patterns of existing data by dynamically adding new parameters during training. Empirical results on four major continuous control benchmarks with more than 40 tasks demonstrate the superior performance of FoG against existing state-of-the-art deep RL algorithms, including BRO, SimBa, and TD-MPC2.
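As a concrete illustration of the two mechanisms described above, the minimal sketch below shows one plausible implementation: a replay buffer whose sampling weights decay with the age of each transition, and a helper that widens a linear layer while preserving its trained parameters. This is a sketch under assumed design choices, not the authors' exact formulation; names such as `DecayingReplayBuffer`, `decay_rate`, and `expand_linear` are hypothetical.

```python
import numpy as np
import torch
import torch.nn as nn


class DecayingReplayBuffer:
    """Replay buffer that samples older transitions less often.

    Illustrative only: the exponential-in-age weighting below is an assumed
    schedule, not necessarily the ER Decay rule used by FoG.
    """

    def __init__(self, capacity: int, decay_rate: float = 1e-4):
        self.capacity = capacity
        self.decay_rate = decay_rate
        self.storage = []        # stored transitions, oldest first
        self.insert_step = []    # global step at which each transition was added
        self.step = 0

    def add(self, transition):
        if len(self.storage) >= self.capacity:
            self.storage.pop(0)
            self.insert_step.pop(0)
        self.storage.append(transition)
        self.insert_step.append(self.step)
        self.step += 1

    def sample(self, batch_size: int):
        ages = self.step - np.asarray(self.insert_step)   # older => larger age
        weights = np.exp(-self.decay_rate * ages)          # older => smaller weight
        probs = weights / weights.sum()
        idx = np.random.choice(len(self.storage), size=batch_size, p=probs)
        return [self.storage[i] for i in idx]


def expand_linear(layer: nn.Linear, extra_units: int) -> nn.Linear:
    """Return a wider copy of `layer`: old rows are kept, new rows are freshly initialized."""
    wider = nn.Linear(layer.in_features, layer.out_features + extra_units)
    with torch.no_grad():
        wider.weight[: layer.out_features] = layer.weight
        wider.bias[: layer.out_features] = layer.bias
    return wider
```

Note that widening one hidden layer also requires widening the input of the layer that consumes its output; the helper only illustrates how existing parameters can be preserved while new capacity is added during training.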
Related papers
- Learning Human Cognitive Appraisal Through Reinforcement Memory Unit [63.83306892013521]
We propose a memory-enhancing mechanism for recurrent neural networks that exploits the effect of human cognitive appraisal in sequential assessment tasks.
We conceptualize the memory-enhancing mechanism as Reinforcement Memory Unit (RMU) that contains an appraisal state together with two positive and negative reinforcement memories.
arXiv Detail & Related papers (2022-08-06T08:56:55Z)
- Learning Dynamics and Generalization in Reinforcement Learning [59.530058000689884]
We show theoretically that temporal difference learning encourages agents to fit non-smooth components of the value function early in training.
We show that neural networks trained using temporal difference algorithms on dense reward tasks exhibit weaker generalization between states than randomly initialized networks and networks trained with policy gradient methods.
arXiv Detail & Related papers (2022-06-05T08:49:16Z)
- The Primacy Bias in Deep Reinforcement Learning [10.691354079742016]
This work identifies a common flaw of deep reinforcement learning (RL) algorithms.
Because of training on progressively growing datasets, deep RL agents incur a risk of overfitting to earlier experiences.
We propose a simple yet generally-applicable mechanism that tackles the primacy bias by periodically resetting a part of the agent.
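For context, a minimal sketch of such a periodic reset, assuming the agent's Q-network is an `nn.Sequential` and that only its final linear layer is reinitialized (the layers chosen and the reset schedule are assumptions; the original work resets a configurable part of the agent while keeping the replay buffer intact):

```python
import torch.nn as nn


def maybe_reset_head(q_network: nn.Sequential, step: int, reset_every: int = 200_000) -> None:
    """Reinitialize the last linear layer of `q_network` every `reset_every` steps."""
    if step > 0 and step % reset_every == 0:
        head = q_network[-1]
        if isinstance(head, nn.Linear):
            head.reset_parameters()
```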
arXiv Detail & Related papers (2022-05-16T16:48:36Z)
- Learning Bayesian Sparse Networks with Full Experience Replay for Continual Learning [54.7584721943286]
Continual Learning (CL) methods aim to enable machine learning models to learn new tasks without catastrophic forgetting of those that have been previously mastered.
Existing CL approaches often keep a buffer of previously-seen samples, perform knowledge distillation, or use regularization techniques towards this goal.
We propose to only activate and select sparse neurons for learning current and past tasks at any stage.
arXiv Detail & Related papers (2022-02-21T13:25:03Z)
- Improving Computational Efficiency in Visual Reinforcement Learning via Stored Embeddings [89.63764845984076]
We present Stored Embeddings for Efficient Reinforcement Learning (SEER), a simple modification of existing off-policy deep reinforcement learning methods.
We show that SEER does not degrade the performance of RL agents while significantly saving computation and memory.
arXiv Detail & Related papers (2021-03-04T08:14:10Z)
- Deep Reinforcement Learning with Quantum-inspired Experience Replay [6.833294755109369]
A novel training paradigm inspired by quantum computation is proposed for deep reinforcement learning (DRL) with experience replay.
The proposed deep reinforcement learning with quantum-inspired experience replay (DRL-QER) adaptively chooses experiences from the replay buffer according to the complexity and the replayed times of each experience (also called a transition).
The experimental results on Atari 2600 games show that DRL-QER outperforms state-of-the-art algorithms such as DRL-PER and DCRL on most of these games with improved training efficiency.
arXiv Detail & Related papers (2021-01-06T13:52:04Z)
- Revisiting Fundamentals of Experience Replay [91.24213515992595]
We present a systematic and extensive analysis of experience replay in Q-learning methods.
We focus on two fundamental properties: the replay capacity and the ratio of learning updates to experience collected.
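The latter ratio is commonly reported as the replay ratio: gradient updates performed per environment transition collected. A toy calculation with assumed numbers:

```python
# Hypothetical numbers, for illustration only.
env_steps = 1_000_000         # transitions collected from the environment
gradient_updates = 2_000_000  # learning updates performed on replayed batches
replay_ratio = gradient_updates / env_steps   # 2.0 updates per collected transition
```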
arXiv Detail & Related papers (2020-07-13T21:22:17Z)
- Transient Non-Stationarity and Generalisation in Deep Reinforcement Learning [67.34810824996887]
Non-stationarity can arise in Reinforcement Learning (RL) even in stationary environments.
We propose Iterated Relearning (ITER) to improve generalisation of deep RL agents.
arXiv Detail & Related papers (2020-06-10T13:26:31Z)