Related papers: CODE-CL: Conceptor-Based Gradient Projection for Deep Continual Learning

CODE-CL: Conceptor-Based Gradient Projection for Deep Continual Learning

URL: http://arxiv.org/abs/2411.15235v2
Date: Fri, 07 Mar 2025 22:46:12 GMT
Title: CODE-CL: Conceptor-Based Gradient Projection for Deep Continual Learning
Authors: Marco Paul E. Apolinario, Sakshi Choudhary, Kaushik Roy,
Abstract summary: Deep neural networks struggle with catastrophic forgetting when learning tasks sequentially.<n>Recent approaches constrain updates to subspaces using gradient projection.<n>We propose Conceptor-based gradient projection for Deep Continual Learning (CODE-CL)
Score: 6.738409533239947
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Continual learning (CL) - the ability to progressively acquire and integrate new concepts - is essential to intelligent systems to adapt to dynamic environments. However, deep neural networks struggle with catastrophic forgetting (CF) when learning tasks sequentially, as training for new tasks often overwrites previously learned knowledge. To address this, recent approaches constrain updates to orthogonal subspaces using gradient projection, effectively preserving important gradient directions for previous tasks. While effective in reducing forgetting, these approaches inadvertently hinder forward knowledge transfer (FWT), particularly when tasks are highly correlated. In this work, we propose Conceptor-based gradient projection for Deep Continual Learning (CODE-CL), a novel method that leverages conceptor matrix representations, a form of regularized reconstruction, to adaptively handle highly correlated tasks. CODE-CL mitigates CF by projecting gradients onto pseudo-orthogonal subspaces of previous task feature spaces while simultaneously promoting FWT. It achieves this by learning a linear combination of shared basis directions, allowing efficient balance between stability and plasticity and transfer of knowledge between overlapping input feature representations. Extensive experiments on continual learning benchmarks validate CODE-CL's efficacy, demonstrating superior performance, reduced forgetting, and improved FWT as compared to state-of-the-art methods.

Related papers

Gradient-free Continual Learning [0.0]
Continual learning (CL) presents a fundamental challenge in training neural networks on sequential tasks without experiencing catastrophic forgetting. Traditionally, the dominant approach in CL has been gradient-based optimization, where updates to the network parameters are performed using gradient descent (SGD) or its variants. In such cases, there is no gradient information available for past data, leading to uncontrolled parameter changes and consequently severe forgetting of previously learned tasks. We explore the hypothesis that gradient-free optimization methods can provide a robust alternative to conventional gradient-based continual learning approaches.
arXiv Detail & Related papers (2025-04-01T22:18:59Z)
Continual Task Learning through Adaptive Policy Self-Composition [54.95680427960524]
CompoFormer is a structure-based continual transformer model that adaptively composes previous policies via a meta-policy network. Our experiments reveal that CompoFormer outperforms conventional continual learning (CL) methods, particularly in longer task sequences.
arXiv Detail & Related papers (2024-11-18T08:20:21Z)
Temporal-Difference Variational Continual Learning [89.32940051152782]
A crucial capability of Machine Learning models in real-world applications is the ability to continuously learn new tasks. In Continual Learning settings, models often struggle to balance learning new tasks with retaining previous knowledge. We propose new learning objectives that integrate the regularization effects of multiple previous posterior estimations.
arXiv Detail & Related papers (2024-10-10T10:58:41Z)
Visual Prompt Tuning in Null Space for Continual Learning [51.96411454304625]
Existing prompt-tuning methods have demonstrated impressive performances in continual learning (CL) This paper aims to learn each task by tuning the prompts in the direction orthogonal to the subspace spanned by previous tasks' features. In practice, an effective null-space-based approximation solution has been proposed to implement the prompt gradient projection.
arXiv Detail & Related papers (2024-06-09T05:57:40Z)
Overcoming Domain Drift in Online Continual Learning [24.86094018430407]
Online Continual Learning (OCL) empowers machine learning models to acquire new knowledge online across a sequence of tasks. OCL faces a significant challenge: catastrophic forgetting, wherein the model learned in previous tasks is substantially overwritten upon encountering new tasks. We propose a novel rehearsal strategy, Drift-Reducing Rehearsal (DRR), to anchor the domain of old tasks and reduce the negative transfer effects.
arXiv Detail & Related papers (2024-05-15T06:57:18Z)
Towards Continual Learning Desiderata via HSIC-Bottleneck Orthogonalization and Equiangular Embedding [55.107555305760954]
We propose a conceptually simple yet effective method that attributes forgetting to layer-wise parameter overwriting and the resulting decision boundary distortion. Our method achieves competitive accuracy performance, even with absolute superiority of zero exemplar buffer and 1.02x the base model.
arXiv Detail & Related papers (2024-01-17T09:01:29Z)
Towards Robust Continual Learning with Bayesian Adaptive Moment Regularization [51.34904967046097]
Continual learning seeks to overcome the challenge of catastrophic forgetting, where a model forgets previously learnt information. We introduce a novel prior-based method that better constrains parameter growth, reducing catastrophic forgetting. Results show that BAdam achieves state-of-the-art performance for prior-based methods on challenging single-headed class-incremental experiments.
arXiv Detail & Related papers (2023-09-15T17:10:51Z)
Dense Network Expansion for Class Incremental Learning [61.00081795200547]
State-of-the-art approaches use a dynamic architecture based on network expansion (NE), in which a task expert is added per task. A new NE method, dense network expansion (DNE), is proposed to achieve a better trade-off between accuracy and model complexity. It outperforms the previous SOTA methods by a margin of 4% in terms of accuracy, with similar or even smaller model scale.
arXiv Detail & Related papers (2023-03-22T16:42:26Z)
Continual Learning with Scaled Gradient Projection [8.847574864259391]
In neural networks, continual learning results in gradient interference among sequential tasks, leading to forgetting of old tasks while learning new ones. We propose a Scaled Gradient Projection (SGP) method to improve new learning while minimizing forgetting. We conduct experiments ranging from continual image classification to reinforcement learning tasks and report better performance with less training overhead than the state-of-the-art approaches.
arXiv Detail & Related papers (2023-02-02T19:46:39Z)
Beyond Not-Forgetting: Continual Learning with Backward Knowledge Transfer [39.99577526417276]
In continual learning (CL) an agent can improve the learning performance of both a new task and old' tasks. Most existing CL methods focus on addressing catastrophic forgetting in neural networks by minimizing the modification of the learnt model for old tasks. We propose a new CL method with Backward knowlEdge tRansfer (CUBER) for a fixed capacity neural network without data replay.
arXiv Detail & Related papers (2022-11-01T23:55:51Z)
Learning Bayesian Sparse Networks with Full Experience Replay for Continual Learning [54.7584721943286]
Continual Learning (CL) methods aim to enable machine learning models to learn new tasks without catastrophic forgetting of those that have been previously mastered. Existing CL approaches often keep a buffer of previously-seen samples, perform knowledge distillation, or use regularization techniques towards this goal. We propose to only activate and select sparse neurons for learning current and past tasks at any stage.
arXiv Detail & Related papers (2022-02-21T13:25:03Z)
Center Loss Regularization for Continual Learning [0.0]
In general, neural networks lack the ability to learn different tasks sequentially. Our approach remembers old tasks by projecting the representations of new tasks close to that of old tasks. We demonstrate that our approach is scalable, effective, and gives competitive performance compared to state-of-the-art continual learning methods.
arXiv Detail & Related papers (2021-10-21T17:46:44Z)
Natural continual learning: success is a journey, not (just) a destination [9.462808515258464]
Natural Continual Learning (NCL) is a new method that unifies weight regularization and projected gradient descent. Our method outperforms both standard weight regularization techniques and projection based approaches when applied to continual learning problems in RNNs. The trained networks evolve task-specific dynamics that are strongly preserved as new tasks are learned, similar to experimental findings in biological circuits.
arXiv Detail & Related papers (2021-06-15T12:24:53Z)
Gradient Projection Memory for Continual Learning [5.43185002439223]
The ability to learn continually without forgetting the past tasks is a desired attribute for artificial learning systems. We propose a novel approach where a neural network learns new tasks by taking gradient steps in the orthogonal direction to the gradient subspaces deemed important for the past tasks.
arXiv Detail & Related papers (2021-03-17T16:31:29Z)
Incremental Embedding Learning via Zero-Shot Translation [65.94349068508863]
Current state-of-the-art incremental learning methods tackle catastrophic forgetting problem in traditional classification networks. We propose a novel class-incremental method for embedding network, named as zero-shot translation class-incremental method (ZSTCI) In addition, ZSTCI can easily be combined with existing regularization-based incremental learning methods to further improve performance of embedding networks.
arXiv Detail & Related papers (2020-12-31T08:21:37Z)

This list is automatically generated from the titles and abstracts of the papers in this site.