The Joint Effect of Task Similarity and Overparameterization on
Catastrophic Forgetting -- An Analytical Model
- URL: http://arxiv.org/abs/2401.12617v2
- Date: Wed, 24 Jan 2024 12:49:24 GMT
- Title: The Joint Effect of Task Similarity and Overparameterization on
Catastrophic Forgetting -- An Analytical Model
- Authors: Daniel Goldfarb, Itay Evron, Nir Weinberger, Daniel Soudry, Paul Hand
- Abstract summary: In continual learning, catastrophic forgetting is affected by multiple aspects of the tasks.
Previous works have analyzed separately how forgetting is affected by either task similarity or overparameterization.
This paper examines how task similarity and overparameterization jointly affect forgetting in an analyzable model.
- Score: 36.766748277141744
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In continual learning, catastrophic forgetting is affected by multiple
aspects of the tasks. Previous works have analyzed separately how forgetting is
affected by either task similarity or overparameterization. In contrast, our
paper examines how task similarity and overparameterization jointly affect
forgetting in an analyzable model. Specifically, we focus on two-task continual
linear regression, where the second task is a random orthogonal transformation
of an arbitrary first task (an abstraction of random permutation tasks). We
derive an exact analytical expression for the expected forgetting - and uncover
a nuanced pattern. In highly overparameterized models, intermediate task
similarity causes the most forgetting. However, near the interpolation
threshold, forgetting decreases monotonically with the expected task
similarity. We validate our findings with linear regression on synthetic data,
and with neural networks on established permutation task benchmarks.
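To make the setup concrete, here is a minimal sketch (not the authors' code; all dimensions illustrative) of the experiment the abstract describes: task 2 rotates task 1's inputs by a random orthogonal matrix, each task is fit with the minimum-norm interpolating update that gradient descent converges to, and forgetting is the increase in task-1 loss after training on task 2.

```python
# Two-task continual linear regression with a random orthogonal second task.
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 200                          # overparameterized regime: d > n

# Task 1: realizable linear regression with a planted teacher.
X1 = rng.standard_normal((n, d))
w_star = rng.standard_normal(d)
y1 = X1 @ w_star

# Task 2: same labels, inputs rotated by a random orthogonal matrix Q
# (the paper's abstraction of random permutation tasks).
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
X2, y2 = X1 @ Q, y1

def fit_min_change(X, y, w0):
    """Minimum-norm update from w0 that interpolates (X, y) -- the
    solution gradient descent converges to when started at w0."""
    return w0 + np.linalg.pinv(X) @ (y - X @ w0)

w_after_t1 = fit_min_change(X1, y1, np.zeros(d))
w_after_t2 = fit_min_change(X2, y2, w_after_t1)

task1_loss = lambda w: float(np.mean((X1 @ w - y1) ** 2))
print("forgetting on task 1:", task1_loss(w_after_t2) - task1_loss(w_after_t1))
```

Sweeping d relative to n probes the overparameterization axis, and averaging over many draws of Q estimates the expected forgetting for which the paper derives an exact expression.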
Related papers
- Interpretable Target-Feature Aggregation for Multi-Task Learning based on Bias-Variance Analysis [53.38518232934096]
Multi-task learning (MTL) is a powerful machine learning paradigm designed to leverage shared knowledge across tasks to improve generalization and performance.
We propose an MTL approach at the intersection between task clustering and feature transformation based on a two-phase iterative aggregation of targets and features.
In both phases, a key aspect is to preserve the interpretability of the reduced targets and features through the aggregation with the mean, which is motivated by applications to Earth science.
arXiv Detail & Related papers (2024-06-12T08:30:16Z)
- Exposing and Addressing Cross-Task Inconsistency in Unified Vision-Language Models [80.23791222509644]
Inconsistent AI models are considered brittle and untrustworthy by human users.
We find that state-of-the-art vision-language models suffer from a surprisingly high degree of inconsistent behavior across tasks.
We propose a rank correlation-based auxiliary training objective, computed over large automatically created cross-task contrast sets.
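As a rough illustration of the rank-correlation idea (the paper's actual objective and contrast-set construction differ; all scores below are hypothetical), consistency between two task heads can be measured as the Spearman correlation of the scores they assign to the same contrast set:

```python
# Spearman rank correlation between two task heads' scores on a shared
# contrast set; 1.0 means the tasks rank the candidates identically.
import numpy as np
from scipy.stats import spearmanr

def cross_task_consistency(scores_a, scores_b):
    rho, _ = spearmanr(scores_a, scores_b)
    return rho

# Hypothetical scores from two task heads over five contrast candidates.
scores_task_a = np.array([0.9, 0.2, 0.7, 0.1, 0.5])
scores_task_b = np.array([0.8, 0.3, 0.6, 0.2, 0.4])
print(cross_task_consistency(scores_task_a, scores_task_b))  # 1.0 here
```

Turning this into a training objective requires a differentiable surrogate for the ranks; the snippet only shows the quantity being encouraged.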
arXiv Detail & Related papers (2023-03-28T16:57:12Z)
- Analysis of Catastrophic Forgetting for Random Orthogonal Transformation Tasks in the Overparameterized Regime [9.184987303791292]
We show that in permuted MNIST image classification tasks, the performance of multilayer perceptrons trained by vanilla gradient descent can be improved by overparameterization.
We provide a theoretical explanation of this effect by studying a qualitatively similar two-task linear regression problem.
We show that when a model is trained on the two tasks in sequence without any additional regularization, the risk gain on the first task is small.
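A minimal sketch of how permuted-image tasks of this kind are typically constructed (illustrative; mnist_like stands in for real flattened MNIST images):

```python
# Each task applies one fixed random pixel permutation to every image;
# labels are unchanged, so tasks are equally hard but embedded differently.
import numpy as np

def make_permuted_task(images, rng):
    """images: (n_samples, n_pixels) array. Returns the permuted copy and
    the permutation itself so the task can be reproduced at test time."""
    perm = rng.permutation(images.shape[1])
    return images[:, perm], perm

rng = np.random.default_rng(0)
mnist_like = rng.random((100, 784))     # stand-in for flattened MNIST
task2_images, perm = make_permuted_task(mnist_like, rng)
```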
arXiv Detail & Related papers (2022-06-01T18:04:33Z)
- How catastrophic can catastrophic forgetting be in linear regression? [30.702863017223457]
We analyze how much the model forgets the true labels of earlier tasks after training on subsequent tasks.
We establish connections between continual learning in the linear setting and two other research areas: alternating projections and the Kaczmarz method.
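The connection can be sketched directly: fitting a linear task to convergence with gradient descent orthogonally projects the weights onto that task's solution set, so training on tasks in sequence is alternating projections (block Kaczmarz). A toy version under that interpretation (not the paper's code):

```python
# Continual linear regression as alternating projections onto each
# task's solution set {w : X_t w = y_t}.
import numpy as np

def project_onto_task(w, X, y):
    """Orthogonal projection of w onto the affine set {v : X v = y}."""
    return w + np.linalg.pinv(X) @ (y - X @ w)

rng = np.random.default_rng(0)
d, n = 100, 20
w_star = rng.standard_normal(d)
tasks = []
for _ in range(3):
    X = rng.standard_normal((n, d))
    tasks.append((X, X @ w_star))       # realizable: shared teacher

w = np.zeros(d)
for _ in range(50):                     # cycle through the tasks
    for X, y in tasks:
        w = project_onto_task(w, X, y)
print("worst task residual:", max(np.linalg.norm(X @ w - y) for X, y in tasks))
```

After each projection, the residuals left on the other tasks are exactly the forgetting this line of work analyzes.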
arXiv Detail & Related papers (2022-05-19T14:28:40Z)
- Instance-Level Relative Saliency Ranking with Graph Reasoning [126.09138829920627]
We present a novel unified model to segment salient instances and infer relative saliency rank order.
A novel loss function is also proposed to effectively train the saliency ranking branch.
Experimental results demonstrate that our proposed model is more effective than previous methods.
arXiv Detail & Related papers (2021-07-08T13:10:42Z)
- An Information-Theoretic Analysis of the Impact of Task Similarity on Meta-Learning [44.320945743871285]
We present novel information-theoretic bounds on the average absolute value of the meta-generalization gap.
Our bounds explicitly capture the impact of task relatedness, the number of tasks, and the number of data samples per task on the meta-generalization gap.
arXiv Detail & Related papers (2021-01-21T01:38:16Z)
- Multi-task Supervised Learning via Cross-learning [102.64082402388192]
We consider a problem known as multi-task learning, consisting of fitting a set of regression functions intended for solving different tasks.
In our novel formulation, we couple the parameters of these functions, so that they learn in their task specific domains while staying close to each other.
This facilitates cross-fertilization, in which data collected across different domains help improve the learning performance on the other tasks.
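A hedged sketch of the coupling idea, with a simple pull-toward-the-mean penalty standing in for the paper's formulation (all symbols illustrative):

```python
# Each task fits its own data while a penalty keeps its weights near the
# across-task mean: min_w ||X_t w - y_t||^2 + lam * ||w - w_bar||^2.
import numpy as np

rng = np.random.default_rng(0)
d, n, T, lam = 20, 30, 3, 1.0
base = rng.standard_normal(d)
tasks = []
for _ in range(T):
    X = rng.standard_normal((n, d))
    w_t = base + 0.1 * rng.standard_normal(d)   # related task teachers
    tasks.append((X, X @ w_t))

W = np.zeros((T, d))
for _ in range(100):                    # alternate: update mean, then tasks
    w_bar = W.mean(axis=0)
    for t, (X, y) in enumerate(tasks):
        # Closed-form solution of the per-task penalized least squares.
        W[t] = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y + lam * w_bar)
```

Larger lam forces the task solutions together (approaching one shared regressor), while lam -> 0 recovers independent per-task fits.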
arXiv Detail & Related papers (2020-10-24T21:35:57Z)
- Precise High-Dimensional Asymptotics for Quantifying Heterogeneous Transfers [34.40702005466919]
When is combining data from two tasks better than learning one task alone?
This paper uses random matrix theory to tackle this challenge in a linear regression setting with two tasks.
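The question is easy to probe on synthetic data; a minimal sketch (the paper's contribution is the precise high-dimensional asymptotics, not this simulation):

```python
# Compare the test error of fitting task A alone against pooling data
# from a related task B; whether pooling helps depends on task similarity.
import numpy as np

rng = np.random.default_rng(0)
d, n = 50, 40
w_a = rng.standard_normal(d)
w_b = w_a + 0.3 * rng.standard_normal(d)        # related second task

Xa, Xb = rng.standard_normal((n, d)), rng.standard_normal((n, d))
ya = Xa @ w_a + 0.5 * rng.standard_normal(n)
yb = Xb @ w_b + 0.5 * rng.standard_normal(n)

lstsq = lambda X, y: np.linalg.lstsq(X, y, rcond=None)[0]
w_single = lstsq(Xa, ya)
w_pooled = lstsq(np.vstack([Xa, Xb]), np.concatenate([ya, yb]))

X_test = rng.standard_normal((2000, d))
excess = lambda w: float(np.mean((X_test @ (w - w_a)) ** 2))
print("single:", excess(w_single), "pooled:", excess(w_pooled))
```

Widening the 0.3 gap between the two teachers eventually makes pooling hurt, which is the kind of heterogeneous-transfer tradeoff the paper quantifies.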
arXiv Detail & Related papers (2020-10-22T14:14:20Z)
- Few-shot Visual Reasoning with Meta-analogical Contrastive Learning [141.2562447971]
We propose to solve a few-shot (or low-shot) visual reasoning problem by resorting to analogical reasoning.
We extract structural relationships between elements in both domains, and enforce them to be as similar as possible with analogical learning.
We validate our method on the RAVEN dataset, on which it outperforms state-of-the-art methods, with larger gains when the training data is scarce.
arXiv Detail & Related papers (2020-07-23T14:00:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.