Parseval Regularization for Continual Reinforcement Learning
- URL: http://arxiv.org/abs/2412.07224v1
- Date: Tue, 10 Dec 2024 06:19:21 GMT
- Title: Parseval Regularization for Continual Reinforcement Learning
- Authors: Wesley Chung, Lynn Cherif, David Meger, Doina Precup
- Abstract summary: Loss of plasticity, trainability loss, and primacy bias have been identified as issues arising when training deep neural networks on sequences of tasks.
We propose to use Parseval regularization to preserve useful optimization properties and improve training in a continual reinforcement learning setting.
- Abstract: Loss of plasticity, trainability loss, and primacy bias have been identified as issues arising when training deep neural networks on sequences of tasks -- all referring to the increased difficulty of training on new tasks. We propose to use Parseval regularization, which maintains the orthogonality of weight matrices, to preserve useful optimization properties and improve training in a continual reinforcement learning setting. We show that it provides significant benefits to RL agents on a suite of gridworld, CARL, and MetaWorld tasks. We conduct comprehensive ablations to identify the source of its benefits and investigate the effect of metrics associated with network trainability, including weight matrix rank, weight norms, and policy entropy.
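As a concrete illustration of the technique, here is a minimal sketch of a Parseval-style orthogonality penalty in PyTorch, in the spirit of Parseval networks: each weight matrix's Gram matrix is pushed toward the identity. The function name, coefficient, and the choice to penalize W Wᵀ (row orthogonality) are illustrative assumptions; the paper's exact formulation and update may differ.

```python
import torch

def parseval_penalty(weight: torch.Tensor, coeff: float = 1e-3) -> torch.Tensor:
    # Push W W^T toward the identity so rows stay (approximately) orthonormal.
    gram = weight @ weight.t()
    eye = torch.eye(weight.shape[0], device=weight.device, dtype=weight.dtype)
    return coeff * (gram - eye).pow(2).sum()

# Usage sketch: add the penalty for every linear layer to the usual RL loss.
model = torch.nn.Sequential(
    torch.nn.Linear(8, 64), torch.nn.ReLU(), torch.nn.Linear(64, 4)
)
loss = torch.zeros(())  # stand-in for the actor/critic loss
for module in model.modules():
    if isinstance(module, torch.nn.Linear):
        loss = loss + parseval_penalty(module.weight)
loss.backward()
```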
Related papers
- Continual Task Learning through Adaptive Policy Self-Composition [54.95680427960524]
CompoFormer is a structure-based continual transformer model that adaptively composes previous policies via a meta-policy network.
Our experiments reveal that CompoFormer outperforms conventional continual learning (CL) methods, particularly in longer task sequences.
arXiv Detail & Related papers (2024-11-18T08:20:21Z)
- Parameter-Efficient Fine-Tuning for Continual Learning: A Neural Tangent Kernel Perspective [125.00228936051657]
We introduce NTK-CL, a novel framework that eliminates task-specific parameter storage while adaptively generating task-relevant features.
By fine-tuning optimizable parameters with appropriate regularization, NTK-CL achieves state-of-the-art performance on established PEFT-CL benchmarks.
arXiv Detail & Related papers (2024-07-24T09:30:04Z)
- Learning Continually by Spectral Regularization [45.55508032009977]
Continual learning algorithms seek to mitigate loss of plasticity, sustaining good performance on new tasks by maintaining network trainability.
We develop a new technique for improving continual learning inspired by the observation that the singular values of the neural network parameters at initialization are an important factor for trainability during early phases of learning.
We present an experimental analysis that shows how the proposed spectral regularizer can sustain trainability and performance across a range of model architectures in continual supervised and reinforcement learning settings.
arXiv Detail & Related papers (2024-06-10T21:34:43Z)
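A minimal sketch of one spectral-style regularizer consistent with the idea above: keep the largest singular value of each weight matrix near its scale at initialization. The target value of 1 and the coefficient are illustrative assumptions, not the paper's exact objective.

```python
import torch

def spectral_penalty(weight: torch.Tensor, target: float = 1.0,
                     coeff: float = 1e-3) -> torch.Tensor:
    # Largest singular value (spectral norm); differentiable in PyTorch via SVD.
    sigma_max = torch.linalg.matrix_norm(weight, ord=2)
    return coeff * (sigma_max - target) ** 2
```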
- Symmetric Reinforcement Learning Loss for Robust Learning on Diverse Tasks and Model Scales [13.818149654692863]
Reinforcement learning (RL) training is inherently unstable due to factors such as moving targets and high gradient variance.
In this work, we improve the stability of RL training by adapting the reverse cross entropy (RCE) from supervised learning for noisy data to define a symmetric RL loss.
arXiv Detail & Related papers (2024-05-27T19:28:33Z)
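For intuition, a hedged sketch of a symmetric loss for a discrete-action policy: standard cross entropy plus reverse cross entropy, in which prediction and one-hot target swap roles and log 0 is clamped to a constant A. The mixing weights alpha and beta and the clamp value A = -4 are illustrative assumptions borrowed from the noisy-label literature; the paper's RL-specific formulation may differ.

```python
import torch
import torch.nn.functional as F

def symmetric_ce(logits: torch.Tensor, targets: torch.Tensor,
                 alpha: float = 1.0, beta: float = 1.0,
                 A: float = -4.0) -> torch.Tensor:
    # Forward term: standard cross entropy.
    ce = F.cross_entropy(logits, targets)
    # Reverse term: swap prediction and one-hot label; log(0) is clamped to A,
    # so RCE reduces to -A * (1 - p_true) for one-hot targets.
    probs = F.softmax(logits, dim=-1)
    p_true = probs.gather(1, targets.unsqueeze(1)).squeeze(1)
    rce = (-A * (1.0 - p_true)).mean()
    return alpha * ce + beta * rce
```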
- Learning Dynamics and Generalization in Reinforcement Learning [59.530058000689884]
We show theoretically that temporal difference learning encourages agents to fit non-smooth components of the value function early in training.
We show that neural networks trained using temporal difference algorithms on dense reward tasks exhibit weaker generalization between states than randomly initialized networks and networks trained with policy gradient methods.
arXiv Detail & Related papers (2022-06-05T08:49:16Z)
- Understanding and Preventing Capacity Loss in Reinforcement Learning [28.52122927103544]
We identify a mechanism by which non-stationary prediction targets can prevent learning progress in deep RL agents.
Capacity loss occurs in a range of RL agents and environments, and is particularly damaging to performance in sparse-reward tasks.
arXiv Detail & Related papers (2022-04-20T15:55:15Z)
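Capacity loss is commonly tracked with rank-style diagnostics of the learned features (the main paper above likewise inspects weight matrix rank). Below is a hedged sketch of one such diagnostic, the effective rank of a batch of features via a cumulative singular-value threshold; the threshold delta = 0.01 is an illustrative assumption.

```python
import torch

def effective_rank(features: torch.Tensor, delta: float = 0.01) -> int:
    # features: (batch, dim) matrix of penultimate-layer activations.
    singular_values = torch.linalg.svdvals(features)
    cumulative = torch.cumsum(singular_values, dim=0) / singular_values.sum()
    # Smallest k whose top-k singular values capture a (1 - delta) fraction
    # of the spectrum.
    return int(torch.searchsorted(cumulative, 1.0 - delta).item()) + 1
```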
- Improve Generalization and Robustness of Neural Networks via Weight Scale Shifting Invariant Regularizations [52.493315075385325]
We show that a family of regularizers, including weight decay, is ineffective at penalizing the intrinsic norms of weights for networks with homogeneous activation functions.
We propose an improved regularizer that is invariant to weight scale shifting and thus effectively constrains the intrinsic norm of a neural network.
arXiv Detail & Related papers (2020-08-07T02:55:28Z)
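To make the invariance concrete: with ReLU-style (homogeneous) activations, scaling one layer's weights by c and the next layer's by 1/c leaves the network function unchanged but changes the weight-decay penalty. One quantity that is invariant under such rescalings is the product of layer norms; the sketch below penalizes it. This is an illustrative stand-in, not necessarily the paper's exact regularizer.

```python
import torch

def product_norm_penalty(model: torch.nn.Module,
                         coeff: float = 1e-4) -> torch.Tensor:
    # Product of Frobenius norms across layers: unchanged when one layer is
    # scaled by c and a neighboring layer by 1/c, unlike plain weight decay.
    penalty = torch.tensor(1.0)
    for module in model.modules():
        if isinstance(module, torch.nn.Linear):
            penalty = penalty * module.weight.norm()
    return coeff * penalty
```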
- Dynamics Generalization via Information Bottleneck in Deep Reinforcement Learning [90.93035276307239]
We propose an information theoretic regularization objective and an annealing-based optimization method to achieve better generalization ability in RL agents.
We demonstrate the extreme generalization benefits of our approach in different domains ranging from maze navigation to robotic tasks.
This work provides a principled way to improve generalization in RL by gradually removing information that is redundant for task-solving.
arXiv Detail & Related papers (2020-08-03T02:24:20Z)
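A hedged sketch of the standard variational realization of such an objective: the encoder emits a stochastic embedding, and a KL term toward a unit-Gaussian prior, weighted by an annealed coefficient beta, squeezes out task-irrelevant information. The architecture and schedule here are illustrative assumptions.

```python
import torch

class StochasticEncoder(torch.nn.Module):
    def __init__(self, in_dim: int, z_dim: int):
        super().__init__()
        self.mu = torch.nn.Linear(in_dim, z_dim)
        self.log_var = torch.nn.Linear(in_dim, z_dim)

    def forward(self, x: torch.Tensor):
        mu, log_var = self.mu(x), self.log_var(x)
        z = mu + torch.randn_like(mu) * (0.5 * log_var).exp()  # reparameterization
        # KL( N(mu, sigma^2) || N(0, I) ): the information bottleneck term.
        kl = 0.5 * (mu.pow(2) + log_var.exp() - log_var - 1.0).sum(dim=-1).mean()
        return z, kl

# Annealing: grow beta from ~0 so the policy learns before information is squeezed,
# e.g. total_loss = rl_loss + beta_schedule(step) * kl
```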
- Data-driven Regularization via Racecar Training for Generalizing Neural Networks [28.08782668165276]
We propose a novel training approach for improving generalization in neural networks.
We show how our formulation is easy to realize in practical network architectures via a reverse pass.
Networks trained with our approach show more balanced mutual information between input and output throughout all layers, yield improved explainability, and exhibit improved performance for a variety of tasks and task transfers.
arXiv Detail & Related papers (2020-06-30T18:00:41Z)
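One plausible reading of the reverse pass, as a hedged sketch: reuse each layer's transposed weights to map activations back toward the input and add a reconstruction term to the loss. The module name and the exact reverse mapping are illustrative assumptions; consult the paper for the actual construction.

```python
import torch

class ReversibleLinear(torch.nn.Module):
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.linear = torch.nn.Linear(in_dim, out_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.linear(x))

    def reverse(self, y: torch.Tensor) -> torch.Tensor:
        # Reverse pass reusing the transposed forward weights (assumed form).
        return torch.relu((y - self.linear.bias) @ self.linear.weight)

layer = ReversibleLinear(16, 32)
x = torch.randn(4, 16)
recon_loss = (layer.reverse(layer(x)) - x).pow(2).mean()  # auxiliary term
```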
This list is automatically generated from the titles and abstracts of the papers on this site.