Layerwise Optimization by Gradient Decomposition for Continual Learning
- URL: http://arxiv.org/abs/2105.07561v1
- Date: Mon, 17 May 2021 01:15:57 GMT
- Title: Layerwise Optimization by Gradient Decomposition for Continual Learning
- Authors: Shixiang Tang, Dapeng Chen, Jinguo Zhu, Shijie Yu and Wanli Ouyang
- Abstract summary: Deep neural networks achieve state-of-the-art and sometimes super-human performance across various domains.
When learning tasks sequentially, the networks easily forget the knowledge of previous tasks, known as "catastrophic forgetting".
- Score: 78.58714373218118
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks achieve state-of-the-art and sometimes super-human
performance across various domains. However, when learning tasks sequentially,
the networks easily forget the knowledge of previous tasks, known as
"catastrophic forgetting". To achieve the consistencies between the old tasks
and the new task, one effective solution is to modify the gradient for update.
Previous methods enforce independent gradient constraints for different tasks,
while we consider these gradients contain complex information, and propose to
leverage inter-task information by gradient decomposition. In particular, the
gradient of an old task is decomposed into a part shared by all old tasks and a
part specific to that task. The gradient for update should be close to the
gradient of the new task, consistent with the gradients shared by all old
tasks, and orthogonal to the space spanned by the gradients specific to the old
tasks. In this way, our approach encourages common knowledge consolidation
without impairing the task-specific knowledge. Furthermore, the optimization is
performed for the gradients of each layer separately rather than the
concatenation of all gradients as in previous works. This effectively avoids
the influence of the magnitude variation of the gradients in different layers.
Extensive experiments validate the effectiveness of both gradient-decomposed
optimization and layer-wise updates. Our proposed method achieves
state-of-the-art results on various benchmarks of continual learning.
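The per-layer rule described above can be illustrated with a small, self-contained sketch. This is not the authors' implementation (the paper solves a constrained optimization per layer); the sketch only approximates the three conditions with simple projections, and the helper names (`decompose_old_gradients`, `modified_layer_gradient`) are hypothetical.

```python
# Illustrative sketch only: approximates the per-layer gradient rule from the
# abstract with simple projections (the paper's exact constrained optimization
# is not reproduced here).
import numpy as np

def decompose_old_gradients(old_grads):
    """Split the flattened per-task gradients of one layer into a component
    shared by all old tasks and residuals specific to each task."""
    G = np.stack(old_grads)            # (num_old_tasks, dim)
    g_shared = G.mean(axis=0)          # part shared by all old tasks
    G_specific = G - g_shared          # part specific to each old task
    return g_shared, G_specific

def modified_layer_gradient(g_new, old_grads, eps=1e-12):
    """Return an update direction that stays close to the new-task gradient,
    is orthogonal to the span of the task-specific gradients, and does not
    conflict with the shared old-task gradient."""
    g_shared, G_specific = decompose_old_gradients(old_grads)

    # Orthonormal basis of the task-specific subspace.
    U, s, _ = np.linalg.svd(G_specific.T, full_matrices=False)
    basis = U[:, s > 1e-8]

    # Remove the component of the new-task gradient inside that subspace.
    g = g_new - basis @ (basis.T @ g_new)

    # If the result conflicts with the shared gradient, project the
    # conflicting part away so that <g, g_shared> >= 0.
    dot = float(g @ g_shared)
    if dot < 0:
        g = g - dot / (float(g_shared @ g_shared) + eps) * g_shared
    return g

# Layer-wise use: apply the rule to each layer's flattened gradient separately,
# so gradient-magnitude differences across layers do not interfere.
```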
Related papers
- How to guess a gradient [68.98681202222664]
We show that gradients are more structured than previously thought.
Exploiting this structure can significantly improve gradient-free optimization schemes.
We highlight new challenges in overcoming the large gap between optimizing with exact gradients and guessing the gradients.
arXiv Detail & Related papers (2023-12-07T21:40:44Z)
- Class Gradient Projection For Continual Learning [99.105266615448]
Catastrophic forgetting is one of the most critical challenges in Continual Learning (CL).
We propose Class Gradient Projection (CGP), which calculates the gradient subspace from individual classes rather than tasks.
arXiv Detail & Related papers (2023-11-25T02:45:56Z)
- Gradient Coordination for Quantifying and Maximizing Knowledge Transference in Multi-Task Learning [11.998475119120531]
Multi-task learning (MTL) has been widely applied in online advertising and recommender systems.
We propose a transference-driven approach CoGrad that adaptively maximizes knowledge transference.
arXiv Detail & Related papers (2023-03-10T10:42:21Z)
- Continual Learning with Scaled Gradient Projection [8.847574864259391]
In neural networks, continual learning results in gradient interference among sequential tasks, leading to forgetting of old tasks while learning new ones.
We propose a Scaled Gradient Projection (SGP) method to improve new learning while minimizing forgetting.
We conduct experiments ranging from continual image classification to reinforcement learning tasks and report better performance with less training overhead than the state-of-the-art approaches.
arXiv Detail & Related papers (2023-02-02T19:46:39Z)
- Delving into Effective Gradient Matching for Dataset Condensation [13.75957901381024]
The gradient matching method directly targets the training dynamics by matching the gradients obtained when training on the original and synthetic datasets.
We propose to match the multi-level gradients to involve both intra-class and inter-class gradient information.
An overfitting-aware adaptive learning step strategy is also proposed to trim unnecessary optimization steps for algorithmic efficiency improvement.
arXiv Detail & Related papers (2022-07-30T21:31:10Z)
- Continuous-Time Meta-Learning with Forward Mode Differentiation [65.26189016950343]
We introduce Continuous-Time Meta-Learning (COMLN), a meta-learning algorithm where adaptation follows the dynamics of a gradient vector field.
Treating the learning process as an ODE offers the notable advantage that the length of the trajectory is now continuous.
We show empirically its efficiency in terms of runtime and memory usage, and we illustrate its effectiveness on a range of few-shot image classification problems.
arXiv Detail & Related papers (2022-03-02T22:35:58Z)
- Conflict-Averse Gradient Descent for Multi-task Learning [56.379937772617]
A major challenge in optimizing a multi-task model is the conflicting gradients.
We introduce Conflict-Averse Gradient Descent (CAGrad), which minimizes the average loss function.
CAGrad balances the objectives automatically and still provably converges to a minimum over the average loss.
arXiv Detail & Related papers (2021-10-26T22:03:51Z)
- TAG: Task-based Accumulated Gradients for Lifelong learning [21.779858050277475]
We propose a task-aware system that adapts the learning rate based on the relatedness among tasks.
We empirically show that our proposed adaptive learning rate not only accounts for catastrophic forgetting but also allows positive backward transfer.
arXiv Detail & Related papers (2021-05-11T16:10:32Z)
- Gradient Projection Memory for Continual Learning [5.43185002439223]
The ability to learn continually without forgetting the past tasks is a desired attribute for artificial learning systems.
We propose a novel approach where a neural network learns new tasks by taking gradient steps in the orthogonal direction to the gradient subspaces deemed important for the past tasks (see the sketch after this list).
arXiv Detail & Related papers (2021-03-17T16:31:29Z)
- Regularizing Meta-Learning via Gradient Dropout [102.29924160341572]
Meta-learning models are prone to overfitting when there are insufficient training tasks for the meta-learners to generalize.
We introduce a simple yet effective method to alleviate the risk of overfitting for gradient-based meta-learning.
arXiv Detail & Related papers (2020-04-13T10:47:02Z)
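The gradient-projection family listed above (Gradient Projection Memory, Class Gradient Projection, Scaled Gradient Projection) shares one core step: projecting the new-task gradient onto the orthogonal complement of a subspace judged important for past tasks. The sketch below illustrates only that step under assumed shapes; the memory construction, class-wise grouping, and scaling used by those papers are omitted, and all names are hypothetical.

```python
# Minimal sketch (assumed shapes, hypothetical names; not the papers' code) of
# the projection step shared by GPM-style methods: step orthogonally to the
# subspace deemed important for past tasks.
import numpy as np

def project_orthogonal(grad, past_basis):
    """Remove from `grad` its component inside the past-task subspace.

    grad:       flattened gradient of one layer, shape (dim,)
    past_basis: orthonormal columns spanning important past-task directions,
                shape (dim, r)
    """
    return grad - past_basis @ (past_basis.T @ grad)

# Toy usage: in practice the basis is built from stored representations or
# gradients of past tasks (e.g. via SVD), not from random vectors.
dim, r = 512, 40
memory = np.random.randn(dim, r)          # stand-in for stored past-task directions
basis, _ = np.linalg.qr(memory)           # orthonormalise the memory
g_new = np.random.randn(dim)              # new-task gradient for this layer
g_update = project_orthogonal(g_new, basis)
```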