TRGP: Trust Region Gradient Projection for Continual Learning
- URL: http://arxiv.org/abs/2202.02931v1
- Date: Mon, 7 Feb 2022 04:21:54 GMT
- Title: TRGP: Trust Region Gradient Projection for Continual Learning
- Authors: Sen Lin, Li Yang, Deliang Fan, Junshan Zhang
- Abstract summary: Catastrophic forgetting is one of the major challenges in continual learning.
We propose Trust Region Gradient Projection to facilitate the forward knowledge transfer.
Our approach achieves significant improvement over related state-of-the-art methods.
- Score: 39.99577526417276
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Catastrophic forgetting is one of the major challenges in continual learning.
To address this issue, some existing methods put restrictive constraints on the
optimization space of the new task for minimizing the interference to old
tasks. However, this may lead to unsatisfactory performance for the new task,
especially when the new task is strongly correlated with old tasks. To tackle
this challenge, we propose Trust Region Gradient Projection (TRGP) for
continual learning to facilitate the forward knowledge transfer based on an
efficient characterization of task correlation. Particularly, we introduce a
notion of 'trust region' to select the most related old tasks for the new task
in a layer-wise and single-shot manner, using the norm of gradient projection
onto the subspace spanned by task inputs. Then, a scaled weight projection is
proposed to cleverly reuse the frozen weights of the selected old tasks in the
trust region through a layer-wise scaling matrix. By jointly optimizing the
scaling matrices and the model, where the model is updated along the directions
orthogonal to the subspaces of old tasks, TRGP can effectively prompt knowledge
transfer without forgetting. Extensive experiments show that our approach
achieves significant improvement over related state-of-the-art methods.
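The selection and projection steps described in the abstract can be illustrated with a short sketch. This is a hedged reconstruction, not the authors' implementation: the SVD-based construction of each old task's input subspace, the 0.95 energy threshold, and the top-k trust-region size are assumptions made for illustration, and the learnable scaling matrices used in the scaled weight projection are not shown.

```python
# Minimal sketch (not the authors' code) of trust-region selection by projected-
# gradient norm and of gradient updates orthogonal to old tasks' input subspaces.
import torch


def input_subspace(activations: torch.Tensor, energy: float = 0.95) -> torch.Tensor:
    """Orthonormal basis of the subspace spanned by one old task's layer inputs.

    activations: (n_samples, in_dim) representative inputs to the layer.
    Returns a basis of shape (in_dim, k).
    """
    U, S, _ = torch.linalg.svd(activations.T, full_matrices=False)
    captured = torch.cumsum(S ** 2, dim=0) / (S ** 2).sum()
    k = int((captured < energy).sum()) + 1   # smallest k capturing the energy threshold
    return U[:, :k]


def correlation_score(grad: torch.Tensor, basis: torch.Tensor) -> float:
    """Relative norm of the layer gradient projected onto an old task's input
    subspace; used as the task-correlation measure."""
    proj = grad @ basis @ basis.T            # grad: (out_dim, in_dim)
    return float(torch.linalg.norm(proj) / torch.linalg.norm(grad))


def select_trust_region(grad: torch.Tensor, bases: list, top_k: int = 2) -> list:
    """Layer-wise, single-shot selection of the most related old tasks."""
    scores = [correlation_score(grad, B) for B in bases]
    return sorted(range(len(bases)), key=lambda t: -scores[t])[:top_k]


def orthogonal_gradient(grad: torch.Tensor, bases: list) -> torch.Tensor:
    """Strip the gradient components lying in old tasks' subspaces so the
    new-task update does not interfere with frozen knowledge."""
    for B in bases:
        grad = grad - grad @ B @ B.T
    return grad
```

In a full training loop, `orthogonal_gradient` would be applied layer by layer to the new task's gradients, while the tasks chosen by `select_trust_region` would contribute their frozen weights through the jointly optimized layer-wise scaling matrices described in the abstract.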
Related papers
- Fine-Grained Knowledge Selection and Restoration for Non-Exemplar Class Incremental Learning [64.14254712331116]
Non-exemplar class incremental learning aims to learn both the new and old tasks without accessing any training data from the past.
We propose a novel framework of fine-grained knowledge selection and restoration.
arXiv Detail & Related papers (2023-12-20T02:34:11Z)
- Dense Network Expansion for Class Incremental Learning [61.00081795200547]
State-of-the-art approaches use a dynamic architecture based on network expansion (NE), in which a task expert is added per task.
A new NE method, dense network expansion (DNE), is proposed to achieve a better trade-off between accuracy and model complexity.
It outperforms the previous SOTA methods by a margin of 4% in terms of accuracy, with similar or even smaller model scale.
arXiv Detail & Related papers (2023-03-22T16:42:26Z)
- Continual Learning with Scaled Gradient Projection [8.847574864259391]
In neural networks, continual learning results in gradient interference among sequential tasks, leading to forgetting of old tasks while learning new ones.
We propose a Scaled Gradient Projection (SGP) method to improve new learning while minimizing forgetting.
We conduct experiments ranging from continual image classification to reinforcement learning tasks and report better performance with less training overhead than the state-of-the-art approaches.
arXiv Detail & Related papers (2023-02-02T19:46:39Z)
- ForkMerge: Mitigating Negative Transfer in Auxiliary-Task Learning [59.08197876733052]
Auxiliary-Task Learning (ATL) aims to improve the performance of the target task by leveraging the knowledge obtained from related tasks.
Sometimes, learning multiple tasks simultaneously results in lower accuracy than learning only the target task, a phenomenon known as negative transfer.
ForkMerge is a novel approach that periodically forks the model into multiple branches and automatically searches for the varying task weights.
arXiv Detail & Related papers (2023-01-30T02:27:02Z)
- Beyond Not-Forgetting: Continual Learning with Backward Knowledge Transfer [39.99577526417276]
In continual learning (CL), an agent can improve the learning performance of both a new task and old tasks.
Most existing CL methods focus on addressing catastrophic forgetting in neural networks by minimizing the modification of the learnt model for old tasks.
We propose a new CL method with Backward knowlEdge tRansfer (CUBER) for a fixed capacity neural network without data replay.
arXiv Detail & Related papers (2022-11-01T23:55:51Z)
- Contextual Squeeze-and-Excitation for Efficient Few-Shot Image Classification [57.36281142038042]
We present a new adaptive block called Contextual Squeeze-and-Excitation (CaSE) that adjusts a pretrained neural network on a new task to significantly improve performance.
We also present a new training protocol based on Coordinate-Descent called UpperCaSE that exploits meta-trained CaSE blocks and fine-tuning routines for efficient adaptation.
arXiv Detail & Related papers (2022-06-20T15:25:08Z)
- Natural continual learning: success is a journey, not (just) a destination [9.462808515258464]
Natural Continual Learning (NCL) is a new method that unifies weight regularization and projected gradient descent.
Our method outperforms both standard weight regularization techniques and projection based approaches when applied to continual learning problems in RNNs.
The trained networks evolve task-specific dynamics that are strongly preserved as new tasks are learned, similar to experimental findings in biological circuits.
arXiv Detail & Related papers (2021-06-15T12:24:53Z)
- Layerwise Optimization by Gradient Decomposition for Continual Learning [78.58714373218118]
Deep neural networks achieve state-of-the-art and sometimes super-human performance across various domains.
When learning tasks sequentially, the networks easily forget the knowledge of previous tasks, a problem known as "catastrophic forgetting".
arXiv Detail & Related papers (2021-05-17T01:15:57Z)
- Multi-Domain Multi-Task Rehearsal for Lifelong Learning [16.02037222114105]
We propose Multi-Domain Multi-Task (MDMT) rehearsal to train the old tasks and the new task in parallel and equally, so as to break the isolation among tasks.
Specifically, a two-level angular margin loss is proposed to encourage the intra-class/task compactness and inter-class/task discrepancy.
In addition, to further address domain shift of the old tasks, we propose an optional episodic distillation loss on the memory to anchor the knowledge for each old task.
arXiv Detail & Related papers (2020-12-14T03:36:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.