Continual Learning in the Teacher-Student Setup: Impact of Task Similarity
- URL: http://arxiv.org/abs/2107.04384v1
- Date: Fri, 9 Jul 2021 12:30:39 GMT
- Title: Continual Learning in the Teacher-Student Setup: Impact of Task Similarity
- Authors: Sebastian Lee and Sebastian Goldt and Andrew Saxe
- Abstract summary: We study catastrophic forgetting in two-layer networks in the teacher-student setup.
We find that when tasks depend on similar features, intermediate task similarity leads to the greatest forgetting.
We find a complex interplay between both types of similarity, initial transfer/forgetting rates, maximum transfer/forgetting, and long-term transfer/forgetting.
- Score: 5.1135133995376085
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Continual learning, the ability to learn many tasks in sequence, is critical
for artificial learning systems. Yet standard training methods for deep
networks often suffer from catastrophic forgetting, where learning new tasks
erases knowledge of earlier tasks. While catastrophic forgetting labels the
problem, the theoretical reasons for interference between tasks remain unclear.
Here, we attempt to narrow this gap between theory and practice by studying
continual learning in the teacher-student setup. We extend previous analytical
work on two-layer networks in the teacher-student setup to multiple teachers.
Using each teacher to represent a different task, we investigate how the
relationship between teachers affects the amount of forgetting and transfer
exhibited by the student when the task switches. In line with recent work, we
find that when tasks depend on similar features, intermediate task similarity
leads to the greatest forgetting. However, feature similarity is only one way in
which tasks may be related. The teacher-student approach allows us to
disentangle task similarity at the level of readouts (hidden-to-output weights)
and features (input-to-hidden weights). We find a complex interplay between
both types of similarity, initial transfer/forgetting rates, maximum
transfer/forgetting, and long-term transfer/forgetting. Together, these results
help illuminate the diverse factors contributing to catastrophic forgetting.
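The abstract describes extending the analysis of two-layer networks in the teacher-student setup to a student that learns from two teachers in sequence, with task similarity controlled separately at the feature (input-to-hidden) and readout (hidden-to-output) level. The sketch below is a rough illustration of such an experiment, not the authors' code: the erf activation, network sizes, learning rate, and the interpolation used to set the similarity knobs gamma_feat and gamma_read are all assumptions made for illustration.

```python
# Minimal sketch of a two-teacher continual-learning experiment (not the
# authors' code): activation, sizes, learning rate, and the similarity
# interpolation below are illustrative assumptions.
import numpy as np
from scipy.special import erf

rng = np.random.default_rng(0)

N, M, K = 500, 2, 4                  # input dim, teacher hidden units, student hidden units
gamma_feat, gamma_read = 0.5, 1.0    # feature / readout similarity in [0, 1]

def g(z):                            # erf activation, common in teacher-student analyses
    return erf(z / np.sqrt(2.0))

# Teacher 1, and teacher 2 built as a correlated copy of teacher 1
W1, v1 = rng.normal(size=(M, N)), rng.normal(size=M)
W_ind, v_ind = rng.normal(size=(M, N)), rng.normal(size=M)
W2 = gamma_feat * W1 + np.sqrt(1.0 - gamma_feat**2) * W_ind      # feature similarity
v2 = gamma_read * v1 + np.sqrt(1.0 - gamma_read**2) * v_ind      # readout similarity

def teacher(W, v, X):
    return g(X @ W.T / np.sqrt(N)) @ v

# Student: same two-layer form, both layers trained by online SGD on squared error
Ws, vs = rng.normal(size=(K, N)), 0.1 * rng.normal(size=K)
lr = 0.2

def sgd_step(Ws, vs, x, y):
    z = Ws @ x / np.sqrt(N)
    h = g(z)
    err = vs @ h - y
    dz = np.sqrt(2.0 / np.pi) * np.exp(-z**2 / 2.0)   # derivative of erf(z / sqrt(2))
    Ws = Ws - lr * err * np.outer(vs * dz, x) / np.sqrt(N)
    vs = vs - lr * err * h
    return Ws, vs

def gen_error(W_t, v_t, n_test=2000):
    X = rng.normal(size=(n_test, N))
    pred = g(X @ Ws.T / np.sqrt(N)) @ vs
    return 0.5 * np.mean((pred - teacher(W_t, v_t, X)) ** 2)

# Train on task 1, then switch to task 2, and measure forgetting on task 1
for _ in range(20_000):
    x = rng.normal(size=N)
    Ws, vs = sgd_step(Ws, vs, x, teacher(W1, v1, x[None, :])[0])
err_task1_before_switch = gen_error(W1, v1)

for _ in range(20_000):
    x = rng.normal(size=N)
    Ws, vs = sgd_step(Ws, vs, x, teacher(W2, v2, x[None, :])[0])
print("forgetting on task 1:", gen_error(W1, v1) - err_task1_before_switch)
```

Sweeping gamma_feat and gamma_read over [0, 1] and tracking gen_error on both tasks throughout training would produce the kind of transfer/forgetting curves the abstract refers to.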
Related papers
- Disentangling and Mitigating the Impact of Task Similarity for Continual Learning [1.3597551064547502]
Continual learning of partially similar tasks poses a challenge for artificial neural networks.
High input feature similarity coupled with low readout similarity is catastrophic for both knowledge transfer and retention.
Weight regularization based on the Fisher information metric significantly improves retention, regardless of task similarity (a rough sketch of this kind of penalty appears after this list).
arXiv Detail & Related papers (2024-05-30T16:40:07Z)
- Multitask Learning with No Regret: from Improved Confidence Bounds to Active Learning [79.07658065326592]
Quantifying uncertainty in the estimated tasks is of pivotal importance for many downstream applications, such as online or active learning.
We provide novel multitask confidence intervals in the challenging setting when neither the similarity between tasks nor the tasks' features are available to the learner.
We propose a novel online learning algorithm that achieves such improved regret without knowing the degree of task similarity in advance.
arXiv Detail & Related papers (2023-08-03T13:08:09Z)
- Task Compass: Scaling Multi-task Pre-training with Task Prefix [122.49242976184617]
Existing studies show that multi-task learning with large-scale supervised tasks suffers from negative effects across tasks.
We propose a task prefix guided multi-task pre-training framework to explore the relationships among tasks.
Our model can not only serve as the strong foundation backbone for a wide range of tasks but also be feasible as a probing tool for analyzing task relationships.
arXiv Detail & Related papers (2022-10-12T15:02:04Z)
- Saliency-Regularized Deep Multi-Task Learning [7.3810864598379755]
Multitask learning enforces multiple learning tasks to share knowledge to improve their generalization abilities.
Modern deep multitask learning methods can jointly learn latent features and task sharing, but the task relationships they learn remain opaque.
This paper proposes a new multitask learning framework that jointly learns latent features and explicit task relations.
arXiv Detail & Related papers (2022-07-03T20:26:44Z)
- Generalized Knowledge Distillation via Relationship Matching [53.69235109551099]
Knowledge of a well-trained deep neural network (a.k.a. the "teacher") is valuable for learning similar tasks.
Knowledge distillation extracts knowledge from the teacher and integrates it with the target model.
Instead of requiring the teacher to work on the same task as the student, we borrow knowledge from a teacher trained on a general label space.
arXiv Detail & Related papers (2022-05-04T06:49:47Z)
- Distilling Knowledge via Knowledge Review [69.15050871776552]
We study connection paths across levels between teacher and student networks and reveal their great importance.
For the first time in knowledge distillation, cross-stage connection paths are proposed.
Our final nested and compact framework requires negligible overhead and outperforms other methods on a variety of tasks.
arXiv Detail & Related papers (2021-04-19T04:36:24Z)
- Continual Learning in Low-rank Orthogonal Subspaces [86.36417214618575]
In continual learning (CL), a learner is faced with a sequence of tasks, arriving one after the other, and the goal is to remember all the tasks once the learning experience is finished.
The prior art in CL uses episodic memory, parameter regularization or network structures to reduce interference among tasks, but in the end, all the approaches learn different tasks in a joint vector space.
We propose to learn tasks in different (low-rank) vector subspaces that are kept orthogonal to each other in order to minimize interference.
arXiv Detail & Related papers (2020-10-22T12:07:43Z)
- Neural Multi-Task Learning for Teacher Question Detection in Online Classrooms [50.19997675066203]
We build an end-to-end neural framework that automatically detects questions from teachers' audio recordings.
By incorporating multi-task learning techniques, we are able to strengthen the understanding of semantic relations among different types of questions.
arXiv Detail & Related papers (2020-05-16T02:17:04Z)
- Multitask learning over graphs: An Approach for Distributed, Streaming Machine Learning [46.613346075513206]
Multitask learning is an approach to inductive transfer learning.
Recent years have witnessed an increasing ability to collect data in a distributed and streaming manner.
This requires the design of new strategies for learning jointly multiple tasks from streaming data over distributed (or networked) systems.
arXiv Detail & Related papers (2020-01-07T15:32:57Z)
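The first related paper above reports that weight regularization based on the Fisher information metric improves retention (see the forward reference in that entry). Below is a rough, self-contained illustration of such a penalty, in the spirit of elastic weight consolidation, applied to a hypothetical pair of logistic-regression tasks; the diagonal empirical-Fisher approximation, the penalty strength lam, and all variable names are illustrative assumptions rather than that paper's exact procedure.

```python
# Rough illustration of Fisher-based weight regularization (EWC-style) on a
# toy pair of logistic-regression tasks. The diagonal empirical Fisher,
# penalty strength lam, and all names are illustrative assumptions, not the
# cited paper's exact procedure.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_nll(w, X, y):
    # Gradient of the mean negative log-likelihood of logistic regression
    return X.T @ (sigmoid(X @ w) - y) / len(y)

def diag_fisher(w, X, y):
    # Diagonal empirical Fisher: mean of squared per-example gradients
    per_example = X * (sigmoid(X @ w) - y)[:, None]
    return np.mean(per_example ** 2, axis=0)

def train(w, X, y, steps=2000, lr=0.5, fisher=None, w_anchor=None, lam=10.0):
    for _ in range(steps):
        grad = grad_nll(w, X, y)
        if fisher is not None:
            # EWC-style penalty: lam/2 * sum_i F_i * (w_i - w_anchor_i)^2
            grad = grad + lam * fisher * (w - w_anchor)
        w = w - lr * grad
    return w

d = 20
# Two hypothetical tasks: logistic regression with different true weight vectors
w_true1, w_true2 = rng.normal(size=d), rng.normal(size=d)
X1, X2 = rng.normal(size=(500, d)), rng.normal(size=(500, d))
y1 = (rng.random(500) < sigmoid(X1 @ w_true1)).astype(float)
y2 = (rng.random(500) < sigmoid(X2 @ w_true2)).astype(float)

w_task1 = train(np.zeros(d), X1, y1)                         # learn task 1
F = diag_fisher(w_task1, X1, y1)                              # weight importances for task 1
w_final = train(w_task1.copy(), X2, y2, fisher=F, w_anchor=w_task1)

acc1 = np.mean((sigmoid(X1 @ w_final) > 0.5) == y1.astype(bool))
print("task-1 accuracy after learning task 2 with the Fisher penalty:", acc1)
```

Comparing the task-1 accuracy with and without passing fisher/w_anchor to the second training call shows how the penalty trades off plasticity on task 2 against retention of task 1.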