Layerwise Optimization by Gradient Decomposition for Continual Learning
- URL: http://arxiv.org/abs/2105.07561v1
- Date: Mon, 17 May 2021 01:15:57 GMT
- Title: Layerwise Optimization by Gradient Decomposition for Continual Learning
- Authors: Shixiang Tang, Dapeng Chen, Jinguo Zhu, Shijie Yu and Wanli Ouyang
- Abstract summary: Deep neural networks achieve state-of-the-art and sometimes super-human performance across various domains.
When learning tasks sequentially, the networks easily forget the knowledge of previous tasks, known as "catastrophic forgetting".
- Score: 78.58714373218118
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks achieve state-of-the-art and sometimes super-human
performance across various domains. However, when learning tasks sequentially,
the networks easily forget the knowledge of previous tasks, known as
"catastrophic forgetting". To achieve the consistencies between the old tasks
and the new task, one effective solution is to modify the gradient for update.
Previous methods enforce independent gradient constraints for different tasks,
while we consider these gradients contain complex information, and propose to
leverage inter-task information by gradient decomposition. In particular, the
gradient of an old task is decomposed into a part shared by all old tasks and a
part specific to that task. The gradient for update should be close to the
gradient of the new task, consistent with the gradients shared by all old
tasks, and orthogonal to the space spanned by the gradients specific to the old
tasks. In this way, our approach encourages common knowledge consolidation
without impairing the task-specific knowledge. Furthermore, the optimization is
performed for the gradients of each layer separately rather than the
concatenation of all gradients as in previous works. This effectively avoids
the influence of the magnitude variation of the gradients in different layers.
Extensive experiments validate the effectiveness of both gradient-decomposed
optimization and layer-wise updates. Our proposed method achieves
state-of-the-art results on various benchmarks of continual learning.
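The per-layer rule described above can be illustrated with a small, self-contained sketch. This is not the authors' implementation (the paper solves a constrained optimization per layer); the sketch only approximates the three conditions with simple projections, and the helper names (`decompose_old_gradients`, `modified_layer_gradient`) are hypothetical.

```python
# Illustrative sketch only: approximates the per-layer gradient rule from the
# abstract with simple projections (the paper's exact constrained optimization
# is not reproduced here).
import numpy as np

def decompose_old_gradients(old_grads):
    """Split the flattened per-task gradients of one layer into a component
    shared by all old tasks and residuals specific to each task."""
    G = np.stack(old_grads)            # (num_old_tasks, dim)
    g_shared = G.mean(axis=0)          # part shared by all old tasks
    G_specific = G - g_shared          # part specific to each old task
    return g_shared, G_specific

def modified_layer_gradient(g_new, old_grads, eps=1e-12):
    """Return an update direction that stays close to the new-task gradient,
    is orthogonal to the span of the task-specific gradients, and does not
    conflict with the shared old-task gradient."""
    g_shared, G_specific = decompose_old_gradients(old_grads)

    # Orthonormal basis of the task-specific subspace.
    U, s, _ = np.linalg.svd(G_specific.T, full_matrices=False)
    basis = U[:, s > 1e-8]

    # Remove the component of the new-task gradient inside that subspace.
    g = g_new - basis @ (basis.T @ g_new)

    # If the result conflicts with the shared gradient, project the
    # conflicting part away so that <g, g_shared> >= 0.
    dot = float(g @ g_shared)
    if dot < 0:
        g = g - dot / (float(g_shared @ g_shared) + eps) * g_shared
    return g

# Layer-wise use: apply the rule to each layer's flattened gradient separately,
# so gradient-magnitude differences across layers do not interfere.
```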
Related papers
- How to guess a gradient [68.98681202222664]
We show that gradients are more structured than previously thought.
Exploiting this structure can significantly improve gradient-free optimization schemes.
We highlight new challenges in overcoming the large gap between optimizing with exact gradients and guessing the gradients.
arXiv Detail & Related papers (2023-12-07T21:40:44Z)
- Class Gradient Projection For Continual Learning [99.105266615448]
Catastrophic forgetting is one of the most critical challenges in Continual Learning (CL).
We propose Class Gradient Projection (CGP), which calculates the gradient subspace from individual classes rather than tasks.
arXiv Detail & Related papers (2023-11-25T02:45:56Z)
- Gradient Coordination for Quantifying and Maximizing Knowledge Transference in Multi-Task Learning [11.998475119120531]
Multi-task learning (MTL) has been widely applied in online advertising and recommender systems.
We propose a transference-driven approach CoGrad that adaptively maximizes knowledge transference.
arXiv Detail & Related papers (2023-03-10T10:42:21Z)
- Continual Learning with Scaled Gradient Projection [8.847574864259391]
In neural networks, continual learning results in gradient interference among sequential tasks, leading to forgetting of old tasks while learning new ones.
We propose a Scaled Gradient Projection (SGP) method to improve new learning while minimizing forgetting.
We conduct experiments ranging from continual image classification to reinforcement learning tasks and report better performance with less training overhead than the state-of-the-art approaches.
arXiv Detail & Related papers (2023-02-02T19:46:39Z)
- Delving into Effective Gradient Matching for Dataset Condensation [13.75957901381024]
The gradient matching method directly targets the training dynamics by matching the gradients obtained when training on the original and synthetic datasets.
We propose to match the multi-level gradients to involve both intra-class and inter-class gradient information.
An overfitting-aware adaptive learning step strategy is also proposed to trim unnecessary optimization steps for algorithmic efficiency improvement.
arXiv Detail & Related papers (2022-07-30T21:31:10Z)
- Continuous-Time Meta-Learning with Forward Mode Differentiation [65.26189016950343]
We introduce Continuous-Time Meta-Learning (COMLN), a meta-learning algorithm where adaptation follows the dynamics of a gradient vector field.
Treating the learning process as an ODE offers the notable advantage that the length of the trajectory is now continuous.
We show empirically its efficiency in terms of runtime and memory usage, and we illustrate its effectiveness on a range of few-shot image classification problems.
arXiv Detail & Related papers (2022-03-02T22:35:58Z)
- Conflict-Averse Gradient Descent for Multi-task Learning [56.379937772617]
A major challenge in optimizing a multi-task model is the conflicting gradients.
We introduce Conflict-Averse Gradient Descent (CAGrad), which minimizes the average loss function.
CAGrad balances the objectives automatically and still provably converges to a minimum over the average loss.
arXiv Detail & Related papers (2021-10-26T22:03:51Z)
- TAG: Task-based Accumulated Gradients for Lifelong learning [21.779858050277475]
We propose a task-aware system that adapts the learning rate based on the relatedness among tasks.
We empirically show that our proposed adaptive learning rate not only accounts for catastrophic forgetting but also allows positive backward transfer.
arXiv Detail & Related papers (2021-05-11T16:10:32Z)
- Gradient Projection Memory for Continual Learning [5.43185002439223]
The ability to learn continually without forgetting the past tasks is a desired attribute for artificial learning systems.
We propose a novel approach where a neural network learns new tasks by taking gradient steps in the orthogonal direction to the gradient subspaces deemed important for the past tasks (see the sketch after this list).
arXiv Detail & Related papers (2021-03-17T16:31:29Z)
- Regularizing Meta-Learning via Gradient Dropout [102.29924160341572]
Meta-learning models are prone to overfitting when there are insufficient training tasks for the meta-learners to generalize.
We introduce a simple yet effective method to alleviate the risk of overfitting for gradient-based meta-learning.
arXiv Detail & Related papers (2020-04-13T10:47:02Z)
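The gradient-projection family listed above (Gradient Projection Memory, Class Gradient Projection, Scaled Gradient Projection) shares one core step: projecting the new-task gradient onto the orthogonal complement of a subspace judged important for past tasks. The sketch below illustrates only that step under assumed shapes; the memory construction, class-wise grouping, and scaling used by those papers are omitted, and all names are hypothetical.

```python
# Minimal sketch (assumed shapes, hypothetical names; not the papers' code) of
# the projection step shared by GPM-style methods: step orthogonally to the
# subspace deemed important for past tasks.
import numpy as np

def project_orthogonal(grad, past_basis):
    """Remove from `grad` its component inside the past-task subspace.

    grad:       flattened gradient of one layer, shape (dim,)
    past_basis: orthonormal columns spanning important past-task directions,
                shape (dim, r)
    """
    return grad - past_basis @ (past_basis.T @ grad)

# Toy usage: in practice the basis is built from stored representations or
# gradients of past tasks (e.g. via SVD), not from random vectors.
dim, r = 512, 40
memory = np.random.randn(dim, r)          # stand-in for stored past-task directions
basis, _ = np.linalg.qr(memory)           # orthonormalise the memory
g_new = np.random.randn(dim)              # new-task gradient for this layer
g_update = project_orthogonal(g_new, basis)
```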