Disentangling and Mitigating the Impact of Task Similarity for Continual Learning
- URL: http://arxiv.org/abs/2405.20236v1
- Date: Thu, 30 May 2024 16:40:07 GMT
- Title: Disentangling and Mitigating the Impact of Task Similarity for Continual Learning
- Authors: Naoki Hiratani
- Abstract summary: Continual learning of partially similar tasks poses a challenge for artificial neural networks.
High input feature similarity coupled with low readout similarity is catastrophic for both knowledge transfer and retention.
Weight regularization based on the Fisher information metric significantly improves retention, regardless of task similarity.
- Score: 1.3597551064547502
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Continual learning of partially similar tasks poses a challenge for artificial neural networks, as task similarity presents both an opportunity for knowledge transfer and a risk of interference and catastrophic forgetting. However, it remains unclear how task similarity in input features and readout patterns influences knowledge transfer and forgetting, as well as how they interact with common algorithms for continual learning. Here, we develop a linear teacher-student model with latent structure and show analytically that high input feature similarity coupled with low readout similarity is catastrophic for both knowledge transfer and retention. Conversely, the opposite scenario is relatively benign. Our analysis further reveals that task-dependent activity gating improves knowledge retention at the expense of transfer, while task-dependent plasticity gating does not affect either retention or transfer performance at the over-parameterized limit. In contrast, weight regularization based on the Fisher information metric significantly improves retention, regardless of task similarity, without compromising transfer performance. Nevertheless, its diagonal approximation and regularization in the Euclidean space are much less robust against task similarity. We demonstrate consistent results in a permuted MNIST task with latent variables. Overall, this work provides insights into when continual learning is difficult and how to mitigate it.
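The paper's results are analytical, but the setting is small enough to simulate directly. The following is a minimal, self-contained sketch, not the authors' code: the closed-form ridge-style solver, the noise scale, and all variable names are illustrative assumptions. It builds a linear teacher-student pair with shared, latently structured inputs and low readout similarity, then compares retention and transfer under no regularization, a Fisher-metric penalty, its diagonal approximation, and a Euclidean penalty:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 50, 5, 5000   # input dim, latent dim, samples (n > d keeps solves well-posed)

# Inputs with low-dimensional latent structure, shared across both tasks
# (i.e. high input-feature similarity).
S = rng.standard_normal((n, k))                # latent variables
M = rng.standard_normal((k, d)) / np.sqrt(k)   # mixing matrix
X = S @ M + 0.1 * rng.standard_normal((n, d))

def unit(v):
    return v / np.linalg.norm(v)

# Teacher readouts with controllable cosine similarity rho.
rho = 0.1                                      # low readout similarity
w_a = unit(rng.standard_normal(d))
orth = rng.standard_normal(d)
orth = unit(orth - (orth @ w_a) * w_a)         # direction orthogonal to w_a
w_b = rho * w_a + np.sqrt(1.0 - rho**2) * orth

y_a, y_b = X @ w_a, X @ w_b

def fit(X, y, anchor=None, metric=None, lam=1.0):
    """Minimize ||X w - y||^2 / n, plus lam * (w - anchor)^T metric (w - anchor)
    when an anchor (the previous task's weights) is given."""
    if anchor is None:
        return np.linalg.lstsq(X, y, rcond=None)[0]
    A = X.T @ X / len(X)
    b = X.T @ y / len(X)
    return np.linalg.solve(A + lam * metric, b + lam * metric @ anchor)

w1 = fit(X, y_a)          # learn task A first
F = X.T @ X / n           # proportional to the Fisher information of a linear-Gaussian model

variants = {
    "no regularization": fit(X, y_b),
    "Fisher metric":     fit(X, y_b, anchor=w1, metric=F),
    "diagonal Fisher":   fit(X, y_b, anchor=w1, metric=np.diag(np.diag(F))),
    "Euclidean":         fit(X, y_b, anchor=w1, metric=np.eye(d)),
}
for name, w2 in variants.items():
    retention = np.mean((X @ w2 - y_a) ** 2)   # task-A error after learning task B
    transfer = np.mean((X @ w2 - y_b) ** 2)    # task-B error
    print(f"{name:18s}  task-A error {retention:.3f}   task-B error {transfer:.3f}")
```

With rho close to 1, the unregularized run barely forgets task A; with rho near 0 and the shared inputs above, forgetting is large, matching the high-input-similarity, low-readout-similarity regime the abstract flags as catastrophic. The comparison between penalty geometries here is only a rough proxy; the paper's precise statements concern gradient-descent training in the over-parameterized limit.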
Related papers
- Enabling Asymmetric Knowledge Transfer in Multi-Task Learning with Self-Auxiliaries [4.031100721019478]
We investigate asymmetric task relationships, where knowledge transfer aids the learning of certain tasks while hindering the learning of others.
We propose an optimisation strategy that includes additional cloned tasks named self-auxiliaries into the learning process to flexibly transfer knowledge between tasks asymmetrically.
We demonstrate that asymmetric knowledge transfer provides substantial improvements in performance compared to existing multi-task optimisation strategies on benchmark computer vision problems.
arXiv Detail & Related papers (2024-10-21T10:57:25Z)
- Mitigate Negative Transfer with Similarity Heuristic Lifelong Prompt Tuning [26.949872705635084]
We present the Similarity Heuristic Lifelong Prompt Tuning (SHLPT) framework.
SHLPT partitions tasks into two distinct subsets by harnessing a learnable similarity metric (a rough sketch of such a partition appears after this list).
Our experiments show that SHLPT outperforms state-of-the-art techniques on lifelong learning benchmarks.
arXiv Detail & Related papers (2024-06-18T03:57:49Z)
- Similarity-based Knowledge Transfer for Cross-Domain Reinforcement Learning [3.3148826359547523]
We develop a semi-supervised alignment loss to match different spaces with a set of encoder-decoders.
In comparison to prior works, our method does not require data to be aligned, paired or collected by expert policies.
arXiv Detail & Related papers (2023-12-05T19:26:01Z)
- Evaluating the structure of cognitive tasks with transfer learning [67.22168759751541]
This study investigates the transferability of deep learning representations between different EEG decoding tasks.
We conduct extensive experiments using state-of-the-art decoding models on two recently released EEG datasets.
arXiv Detail & Related papers (2023-07-28T14:51:09Z)
- On Neural Consolidation for Transfer in Reinforcement Learning [4.129225533930966]
We explore the use of network distillation as a feature extraction method to better understand the context in which transfer can occur.
We show that distillation does not prevent knowledge transfer, including when transferring from multiple tasks to a new one, and we compare these results with transfer without prior distillation.
arXiv Detail & Related papers (2022-10-05T13:18:47Z)
- On Generalizing Beyond Domains in Cross-Domain Continual Learning [91.56748415975683]
Deep neural networks often suffer from catastrophic forgetting of previously learned knowledge after learning a new task.
Our proposed approach learns new tasks under domain shift with accuracy boosts of up to 10% on challenging datasets such as DomainNet and OfficeHome.
arXiv Detail & Related papers (2022-03-08T09:57:48Z)
- Understanding Contrastive Learning Requires Incorporating Inductive Biases [64.56006519908213]
Recent attempts to theoretically explain the success of contrastive learning on downstream tasks prove guarantees depending on properties of augmentations and the value of the contrastive loss of representations.
We demonstrate that such analyses ignore the inductive biases of the function class and training algorithm, even provably leading to vacuous guarantees in some settings.
arXiv Detail & Related papers (2022-02-28T18:59:20Z)
- Why Do Self-Supervised Models Transfer? Investigating the Impact of Invariance on Downstream Tasks [79.13089902898848]
Self-supervised learning is a powerful paradigm for representation learning on unlabelled images.
We show that different tasks in computer vision require features to encode different (in)variances.
arXiv Detail & Related papers (2021-11-22T18:16:35Z)
- Continual Learning in the Teacher-Student Setup: Impact of Task Similarity [5.1135133995376085]
We study catastrophic forgetting in two-layer networks in the teacher-student setup.
We find that when tasks depend on similar features, intermediate task similarity leads to the greatest forgetting.
We find a complex interplay between both types of similarity, initial transfer/forgetting rates, maximum transfer/forgetting, and long-term transfer/forgetting.
arXiv Detail & Related papers (2021-07-09T12:30:39Z)
- Bridging the Imitation Gap by Adaptive Insubordination [88.35564081175642]
We show that when the teaching agent makes decisions with access to privileged information, this information is marginalized during imitation learning.
We propose 'Adaptive Insubordination' (ADVISOR) to address this gap.
ADVISOR dynamically weights imitation and reward-based reinforcement learning losses during training, enabling on-the-fly switching between imitation and exploration (a hedged sketch of this weighting appears after this list).
arXiv Detail & Related papers (2020-07-23T17:59:57Z)
- Task-Feature Collaborative Learning with Application to Personalized Attribute Prediction [166.87111665908333]
We propose a novel multi-task learning method called Task-Feature Collaborative Learning (TFCL).
Specifically, we first propose a base model with a heterogeneous block-diagonal structure regularizer to leverage the collaborative grouping of features and tasks.
As a practical extension, we extend the base model by allowing overlapping features and differentiating the hard tasks.
arXiv Detail & Related papers (2020-04-29T02:32:04Z)
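As forward-referenced in the Similarity Heuristic Lifelong Prompt Tuning entry above, here is a hedged sketch of partitioning past tasks by similarity to a new task. The cosine similarity over task embeddings and the fixed threshold are illustrative stand-ins for that paper's learnable similarity metric over prompts:

```python
import torch
import torch.nn.functional as F

def partition_tasks(new_task_emb, prev_task_embs, threshold=0.5):
    # Split previously seen tasks into "similar" and "dissimilar" subsets
    # by cosine similarity to the new task's embedding.
    sims = F.cosine_similarity(prev_task_embs, new_task_emb.unsqueeze(0), dim=1)
    similar = torch.nonzero(sims >= threshold).flatten().tolist()
    dissimilar = torch.nonzero(sims < threshold).flatten().tolist()
    return similar, dissimilar

# Example: four previous tasks with 8-dimensional embeddings.
prev = torch.randn(4, 8)
new = torch.randn(8)
similar, dissimilar = partition_tasks(new, prev)
print("transfer from:", similar, " insulate against:", dissimilar)
```

The similar subset would then drive positive transfer (e.g., informing the new task's prompt), while the dissimilar subset is kept insulated to mitigate negative transfer.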
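Similarly, the Adaptive Insubordination entry above describes per-state weighting between imitation and RL losses. The sketch below is not the authors' implementation: the exponential weighting, the temperature `scale`, and the per-state `rl_loss` tensor are illustrative assumptions; only the idea of dynamically gating the two losses comes from the summary:

```python
import torch
import torch.nn.functional as F

def gating_weights(aux_logits, expert_actions, scale=10.0):
    # Where an auxiliary policy trained purely by imitation already matches
    # the expert, imitation is trusted (w near 1); where it cannot match,
    # an imitation gap is assumed and weight shifts toward RL (w near 0).
    ce = F.cross_entropy(aux_logits, expert_actions, reduction="none")
    return torch.exp(-scale * ce)

def combined_loss(policy_logits, expert_actions, rl_loss, aux_logits):
    # Convex, per-state combination of imitation and RL objectives.
    w = gating_weights(aux_logits, expert_actions).detach()
    il = F.cross_entropy(policy_logits, expert_actions, reduction="none")
    return (w * il + (1.0 - w) * rl_loss).mean()
```

States where privileged information makes the expert unimitable thus fall back to reward-driven exploration.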
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.