Related papers: Learning Continually by Spectral Regularization

Learning Continually by Spectral Regularization

URL: http://arxiv.org/abs/2406.06811v2
Date: Sun, 27 Oct 2024 23:45:38 GMT
Title: Learning Continually by Spectral Regularization
Authors: Alex Lewandowski, Michał Bortkiewicz, Saurabh Kumar, András György, Dale Schuurmans, Mateusz Ostaszewski, Marlos C. Machado,
Abstract summary: Continual learning algorithms seek to mitigate loss of plasticity by sustaining good performance while maintaining network trainability. We develop a new technique for improving continual learning inspired by the observation that the singular values of the neural network parameters at initialization are an important factor for trainability during early phases of learning. We present an experimental analysis that shows how the proposed spectral regularizer can sustain trainability and performance across a range of model architectures in continual supervised and reinforcement learning settings.
Score: 45.55508032009977
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Loss of plasticity is a phenomenon where neural networks can become more difficult to train over the course of learning. Continual learning algorithms seek to mitigate this effect by sustaining good performance while maintaining network trainability. We develop a new technique for improving continual learning inspired by the observation that the singular values of the neural network parameters at initialization are an important factor for trainability during early phases of learning. From this perspective, we derive a new spectral regularizer for continual learning that better sustains these beneficial initialization properties throughout training. In particular, the regularizer keeps the maximum singular value of each layer close to one. Spectral regularization directly ensures that gradient diversity is maintained throughout training, which promotes continual trainability, while minimally interfering with performance in a single task. We present an experimental analysis that shows how the proposed spectral regularizer can sustain trainability and performance across a range of model architectures in continual supervised and reinforcement learning settings. Spectral regularization is less sensitive to hyperparameters while demonstrating better training in individual tasks, sustaining trainability as new tasks arrive, and achieving better generalization performance.

Related papers

Trust Region Continual Learning as an Implicit Meta-Learner [3.705371747297478]
We study a hybrid perspective: emphtrust region continual learning that combines generative replay with a Fisher-metric trust region constraint.<n>We show that, under local approximations, the resulting update admits a MAML-style interpretation with a single implicit inner step.<n>This yields an emergent meta-learning property in continual learning.
arXiv Detail & Related papers (2026-02-02T18:19:16Z)
Spectral Imbalance Causes Forgetting in Low-Rank Continual Adaptation [58.3773038915023]
Continual learning aims to adapt pre-trained models to sequential tasks without forgetting previously acquired knowledge.<n>Most existing approaches treat continual learning as avoiding interference with past updates, rather than considering what properties make the current task-specific update naturally preserve previously acquired knowledge.<n>We address this problem using a projected first-order method compatible with standard deep-dots used in vision-language models.
arXiv Detail & Related papers (2026-01-31T13:27:02Z)
Forget Less, Retain More: A Lightweight Regularizer for Rehearsal-Based Continual Learning [51.07663354001582]
Deep neural networks suffer from catastrophic forgetting, where performance on previous tasks degrades after training on a new task.<n>We present a novel approach to address this challenge, focusing on the intersection of memory-based methods and regularization approaches.<n>We formulate a regularization strategy, termed Information Maximization (IM) regularizer, for memory-based continual learning methods.
arXiv Detail & Related papers (2025-12-01T15:56:00Z)
Spectral Collapse Drives Loss of Plasticity in Deep Continual Learning [14.196969540084929]
We show that deep neural networks suffer from loss of plasticity in deep continual learning.<n>We introduce the notion of $tau$-trainability and show that current plasticity preserving algorithms can be unified under this framework.<n> Experiments on continual supervised and reinforcement learning tasks confirm that combining these two regularizers effectively preserves plasticity.
arXiv Detail & Related papers (2025-09-26T13:28:53Z)
Parseval Regularization for Continual Reinforcement Learning [46.92116826808205]
Loss of plasticity, trainability loss, and primacy bias have been identified as issues arising when training deep neural networks on sequences of tasks. We propose to use Parseval regularization to preserve useful optimization properties and improve training in a continual reinforcement learning setting.
arXiv Detail & Related papers (2024-12-10T06:19:21Z)
Point-Calibrated Spectral Neural Operators [54.13671100638092]
We introduce Point-Calibrated Spectral Transform, which learns operator mappings by approximating functions with the point-level adaptive spectral basis. Point-Calibrated Spectral Neural Operators learn operator mappings by approximating functions with the point-level adaptive spectral basis.
arXiv Detail & Related papers (2024-10-15T08:19:39Z)
Super Level Sets and Exponential Decay: A Synergistic Approach to Stable Neural Network Training [0.0]
We develop a dynamic learning rate algorithm that integrates exponential decay and advanced anti-overfitting strategies. We prove that the superlevel sets of the loss function, as influenced by our adaptive learning rate, are always connected.
arXiv Detail & Related papers (2024-09-25T09:27:17Z)
Continual Diffuser (CoD): Mastering Continual Offline Reinforcement Learning with Experience Rehearsal [54.93261535899478]
In real-world applications, such as robotic control of reinforcement learning, the tasks are changing, and new tasks arise in a sequential order. This situation poses the new challenge of plasticity-stability trade-off for training an agent who can adapt to task changes and retain acquired knowledge. We propose a rehearsal-based continual diffusion model, called Continual diffuser (CoD), to endow the diffuser with the capabilities of quick adaptation (plasticity) and lasting retention (stability)
arXiv Detail & Related papers (2024-09-04T08:21:47Z)
Normalization and effective learning rates in reinforcement learning [52.59508428613934]
Normalization layers have recently experienced a renaissance in the deep reinforcement learning and continual learning literature. We show that normalization brings with it a subtle but important side effect: an equivalence between growth in the norm of the network parameters and decay in the effective learning rate. We propose to make the learning rate schedule explicit with a simple re- parameterization which we call Normalize-and-Project.
arXiv Detail & Related papers (2024-07-01T20:58:01Z)
Continual Learning via Sequential Function-Space Variational Inference [65.96686740015902]
We propose an objective derived by formulating continual learning as sequential function-space variational inference. Compared to objectives that directly regularize neural network predictions, the proposed objective allows for more flexible variational distributions. We demonstrate that, across a range of task sequences, neural networks trained via sequential function-space variational inference achieve better predictive accuracy than networks trained with related methods.
arXiv Detail & Related papers (2023-12-28T18:44:32Z)
On discretisation drift and smoothness regularisation in neural network training [0.0]
We aim to make steps towards an improved understanding of deep learning with a focus on optimisation and model regularisation. We start by investigating gradient descent (GD), a discrete-time algorithm at the basis of most popular deep learning optimisation algorithms. We derive novel continuous-time flows that account for discretisation drift. Unlike the NGF, these new flows can be used to describe learning rate specific behaviours of GD, such as training instabilities observed in supervised learning and two-player games. We then translate insights from continuous time into mitigation strategies for unstable GD dynamics, by constructing novel learning rate schedules and regulariser
arXiv Detail & Related papers (2023-10-21T15:21:36Z)
Continual Learning with Pretrained Backbones by Tuning in the Input Space [44.97953547553997]
The intrinsic difficulty in adapting deep learning models to non-stationary environments limits the applicability of neural networks to real-world tasks. We propose a novel strategy to make the fine-tuning procedure more effective, by avoiding to update the pre-trained part of the network and learning not only the usual classification head, but also a set of newly-introduced learnable parameters.
arXiv Detail & Related papers (2023-06-05T15:11:59Z)
Critical Learning Periods for Multisensory Integration in Deep Networks [112.40005682521638]
We show that the ability of a neural network to integrate information from diverse sources hinges critically on being exposed to properly correlated signals during the early phases of training. We show that critical periods arise from the complex and unstable early transient dynamics, which are decisive of final performance of the trained system and their learned representations.
arXiv Detail & Related papers (2022-10-06T23:50:38Z)
Learning Dynamics and Generalization in Reinforcement Learning [59.530058000689884]
We show theoretically that temporal difference learning encourages agents to fit non-smooth components of the value function early in training. We show that neural networks trained using temporal difference algorithms on dense reward tasks exhibit weaker generalization between states than randomly networks and gradient networks trained with policy methods.
arXiv Detail & Related papers (2022-06-05T08:49:16Z)
Spectral Normalisation for Deep Reinforcement Learning: an Optimisation Perspective [22.625456135981292]
We show we can recover the performance of developments not by changing the objective, but by regularising the value-function estimator. We conduct ablation studies to disentangle the various effects normalisation has on the learning dynamics. These findings hint towards the need to also focus on the neural component and its learning dynamics to tackle the peculiarities of Deep Reinforcement Learning.
arXiv Detail & Related papers (2021-05-11T17:59:46Z)
Optimization and Generalization of Regularization-Based Continual Learning: a Loss Approximation Viewpoint [35.5156045701898]
We provide a novel viewpoint of regularization-based continual learning by formulating it as a second-order Taylor approximation of the loss function of each task. Based on this viewpoint, we study the optimization aspects (i.e., convergence) as well as generalization properties (i.e., finite-sample guarantees) of regularization-based continual learning.
arXiv Detail & Related papers (2020-06-19T06:08:40Z)

This list is automatically generated from the titles and abstracts of the papers in this site.