TAG: Task-based Accumulated Gradients for Lifelong learning
- URL: http://arxiv.org/abs/2105.05155v1
- Date: Tue, 11 May 2021 16:10:32 GMT
- Title: TAG: Task-based Accumulated Gradients for Lifelong learning
- Authors: Pranshu Malviya, Balaraman Ravindran, Sarath Chandar
- Abstract summary: We propose a task-aware system that adapts the learning rate based on the relatedness among tasks.
We empirically show that our proposed adaptive learning rate not only accounts for catastrophic forgetting but also allows positive backward transfer.
- Score: 21.779858050277475
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When an agent encounters a continual stream of new tasks in the lifelong
learning setting, it leverages the knowledge it gained from the earlier tasks
to help learn the new tasks better. In such a scenario, identifying an
efficient knowledge representation becomes a challenging problem. Most research
works propose to either store a subset of examples from the past tasks in a
replay buffer, dedicate a separate set of parameters to each task or penalize
excessive updates over parameters by introducing a regularization term. While
existing methods employ the general task-agnostic stochastic gradient descent
update rule, we propose a task-aware optimizer that adapts the learning rate
based on the relatedness among tasks. We utilize the directions taken by the
parameters during the updates by accumulating the gradients specific to each
task. These task-based accumulated gradients act as a knowledge base that is
maintained and updated throughout the stream. We empirically show that our
proposed adaptive learning rate not only accounts for catastrophic forgetting
but also allows positive backward transfer. We also show that our method
performs better than several state-of-the-art methods in lifelong learning on
complex datasets with a large number of tasks.
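
As a rough illustration of the idea described in the abstract, the sketch below keeps a per-task accumulated gradient as a small "knowledge base" and scales the step size by the relatedness (here, mean cosine similarity) between the current gradient and the accumulated gradients of the other tasks. This is a minimal sketch under stated assumptions, not the paper's actual TAG update rule: the class name `TaskAwareSGD`, the exponential scaling of the learning rate, and the toy quadratic tasks are illustrative choices.

```python
# Sketch of a task-aware update rule: per-task accumulated gradients act as a
# knowledge base, and the learning rate is scaled by the relatedness between
# the current gradient and earlier tasks' accumulated gradients.
# NOTE: illustrative only; the scaling rule and hyper-parameters are assumptions.
import numpy as np

class TaskAwareSGD:
    def __init__(self, base_lr=0.1, beta=0.9, alpha=1.0):
        self.base_lr = base_lr   # task-agnostic step size
        self.beta = beta         # decay for the per-task gradient accumulator
        self.alpha = alpha       # strength of the relatedness-based scaling
        self.task_grads = {}     # task id -> accumulated gradient ("knowledge base")

    def _relatedness(self, task_id, grad):
        # Mean cosine similarity between the current gradient and the
        # accumulated gradients of all *other* tasks seen so far.
        sims = []
        for tid, g in self.task_grads.items():
            if tid == task_id:
                continue
            denom = np.linalg.norm(g) * np.linalg.norm(grad) + 1e-12
            sims.append(float(g @ grad) / denom)
        return float(np.mean(sims)) if sims else 0.0

    def step(self, params, grad, task_id):
        # Update the accumulated gradient for the current task.
        acc = self.task_grads.get(task_id, np.zeros_like(grad))
        self.task_grads[task_id] = self.beta * acc + (1.0 - self.beta) * grad
        # Shrink the step when the update conflicts with earlier tasks
        # (negative relatedness); enlarge it when the tasks agree.
        lr = self.base_lr * np.exp(self.alpha * self._relatedness(task_id, grad))
        return params - lr * grad

# Toy usage: two "tasks" defined by quadratic losses with different optima.
opt = TaskAwareSGD()
params = np.zeros(3)
targets = {0: np.array([1.0, 0.0, 0.0]), 1: np.array([0.0, 1.0, 0.0])}
for task_id in (0, 1):
    for _ in range(100):
        grad = 2.0 * (params - targets[task_id])  # gradient of ||params - target||^2
        params = opt.step(params, grad, task_id)
print(params)
```

The design choice to scale the learning rate multiplicatively keeps the update task-aware without storing past examples or adding per-task parameters, which is the contrast the abstract draws with replay- and regularization-based methods.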
Related papers
- Adaptive Retention & Correction for Continual Learning [114.5656325514408]
A common problem in continual learning is the classification layer's bias towards the most recent task.
We name our approach Adaptive Retention & Correction (ARC)
ARC achieves an average performance increase of 2.7% and 2.6% on the CIFAR-100 and ImageNet-R datasets, respectively.
arXiv Detail & Related papers (2024-05-23T08:43:09Z) - Subspace Adaptation Prior for Few-Shot Learning [5.2997197698288945]
Subspace Adaptation Prior is a novel gradient-based meta-learning algorithm.
We show that SAP yields superior or competitive performance in few-shot image classification settings.
arXiv Detail & Related papers (2023-10-13T11:40:18Z) - Clustering-based Domain-Incremental Learning [4.835091081509403]
A key challenge in continual learning is the so-called "catastrophic forgetting" problem.
We propose an online clustering-based approach operating on a dynamically updated finite pool of samples or gradients.
We demonstrate the effectiveness of the proposed strategy and its promising performance compared to state-of-the-art methods.
arXiv Detail & Related papers (2023-09-21T13:49:05Z) - Task Difficulty Aware Parameter Allocation & Regularization for Lifelong
Learning [20.177260510548535]
We propose Parameter Allocation & Regularization (PAR), which adaptively selects an appropriate strategy for each task, parameter allocation or regularization, based on its learning difficulty.
Our method is scalable and significantly reduces the model's redundancy while improving the model's performance.
arXiv Detail & Related papers (2023-04-11T15:38:21Z) - ForkMerge: Mitigating Negative Transfer in Auxiliary-Task Learning [59.08197876733052]
Auxiliary-Task Learning (ATL) aims to improve the performance of the target task by leveraging the knowledge obtained from related tasks.
Sometimes, learning multiple tasks simultaneously results in lower accuracy than learning only the target task, a phenomenon known as negative transfer.
ForkMerge is a novel approach that periodically forks the model into multiple branches and automatically searches over the varying task weights.
arXiv Detail & Related papers (2023-01-30T02:27:02Z) - On Steering Multi-Annotations per Sample for Multi-Task Learning [79.98259057711044]
The study of multi-task learning has drawn great attention from the community.
Despite the remarkable progress, the challenge of optimally learning different tasks simultaneously remains to be explored.
Previous works attempt to modify the gradients from different tasks, yet these methods rely on subjective assumptions about the relationships between tasks, and the modified gradients may be less accurate.
In this paper, we introduce Stochastic Task Allocation (STA), a mechanism that addresses this issue by randomly allocating each sample a subset of tasks.
For further progress, we propose Interleaved Stochastic Task Allocation (ISTA) to iteratively allocate all tasks to each sample.
arXiv Detail & Related papers (2022-03-06T11:57:18Z) - Relational Experience Replay: Continual Learning by Adaptively Tuning
Task-wise Relationship [54.73817402934303]
We propose Relational Experience Replay, a bi-level learning framework that adaptively tunes task-wise relationships to achieve a better "stability-plasticity" trade-off.
It can consistently improve the performance of all baselines and surpass current state-of-the-art methods.
arXiv Detail & Related papers (2021-12-31T12:05:22Z) - Auxiliary Task Update Decomposition: The Good, The Bad and The Neutral [18.387162887917164]
We formulate a model-agnostic framework that performs fine-grained manipulation of the auxiliary task gradients.
We propose to decompose auxiliary updates into directions which help, damage, or leave the primary task loss unchanged (a generic sketch of this decomposition appears after this list).
Our approach consistently outperforms strong and widely used baselines when leveraging out-of-distribution data for Text and Image classification tasks.
arXiv Detail & Related papers (2021-08-25T17:09:48Z) - Instance-Level Task Parameters: A Robust Multi-task Weighting Framework [17.639472693362926]
Recent works have shown that deep neural networks benefit from multi-task learning by learning a shared representation across several related tasks.
We let the training process dictate the optimal weighting of tasks for every instance in the dataset.
We conduct extensive experiments on the SURREAL and Cityscapes datasets, for human shape and pose estimation, depth estimation, and semantic segmentation tasks.
arXiv Detail & Related papers (2021-06-11T02:35:42Z) - Continual Learning via Bit-Level Information Preserving [88.32450740325005]
We study the continual learning process through the lens of information theory.
We propose Bit-Level Information Preserving (BLIP) that preserves the information gain on model parameters.
BLIP achieves close to zero forgetting while only requiring constant memory overheads throughout continual learning.
arXiv Detail & Related papers (2021-05-10T15:09:01Z) - Efficient Continual Adaptation for Generative Adversarial Networks [97.20244383723853]
We present a continual learning approach for generative adversarial networks (GANs)
Our approach is based on learning a set of global and task-specific parameters.
We show that the feature-map transformation based approach outperforms state-of-the-art continual GAN methods.
arXiv Detail & Related papers (2021-03-06T05:09:37Z)
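
For the Auxiliary Task Update Decomposition entry above, the following is a minimal, generic sketch of splitting an auxiliary-task gradient into a component along the primary-task gradient (helpful when the alignment is positive, harmful when negative) and an orthogonal component that leaves the primary loss unchanged to first order. The function names, the keep/drop policy, and the toy gradients are illustrative assumptions, not the cited paper's implementation.

```python
# Generic sketch of the "help / damage / neutral" decomposition of an
# auxiliary-task gradient relative to the primary-task gradient.
import numpy as np

def decompose_aux_grad(g_primary, g_aux):
    """Split g_aux into a signed projection onto g_primary and an orthogonal part."""
    denom = g_primary @ g_primary + 1e-12
    parallel = (g_aux @ g_primary) / denom * g_primary  # signed projection
    orthogonal = g_aux - parallel                        # primary-loss-neutral direction
    return parallel, orthogonal

def filtered_update(params, g_primary, g_aux, lr=0.1, keep_neutral=True):
    parallel, orthogonal = decompose_aux_grad(g_primary, g_aux)
    update = g_primary
    if g_aux @ g_primary > 0:
        update = update + parallel       # aligned component helps the primary task
    # a negatively aligned component would increase the primary loss, so it is dropped
    if keep_neutral:
        update = update + orthogonal     # orthogonal part leaves the primary loss unchanged
    return params - lr * update

# Toy usage with made-up gradients.
params = np.zeros(4)
g_p = np.array([1.0, 0.0, 0.5, 0.0])
g_a = np.array([0.5, 1.0, -0.2, 0.3])
print(filtered_update(params, g_p, g_a))
```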
This list is automatically generated from the titles and abstracts of the papers in this site.