Linear Mode Connectivity in Multitask and Continual Learning
- URL: http://arxiv.org/abs/2010.04495v1
- Date: Fri, 9 Oct 2020 10:53:25 GMT
- Title: Linear Mode Connectivity in Multitask and Continual Learning
- Authors: Seyed Iman Mirzadeh, Mehrdad Farajtabar, Dilan Gorur, Razvan Pascanu,
Hassan Ghasemzadeh
- Abstract summary: We investigate whether multitask and continual solutions are similarly connected.
We propose an effective algorithm that constrains the sequentially learned minima to behave as the multitask solution.
- Score: 46.98656798573886
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Continual (sequential) training and multitask (simultaneous) training
often attempt to solve the same overall objective: to find a solution that
performs well on all considered tasks. The main difference is in the training
regimes, where continual learning can only have access to one task at a time,
which for neural networks typically leads to catastrophic forgetting. That is,
the solution found for a subsequent task does not perform well on the previous
ones anymore. However, the relationship between the different minima that the
two training regimes arrive at is not well understood. What sets them apart? Is
there a local structure that could explain the difference in performance
achieved by the two different schemes? Motivated by recent work showing that
different minima of the same task are typically connected by very simple curves
of low error, we investigate whether multitask and continual solutions are
similarly connected. We empirically find that indeed such connectivity can be
reliably achieved and, more interestingly, it can be done by a linear path,
conditioned on having the same initialization for both. We thoroughly analyze
this observation and discuss its significance for the continual learning
process. Furthermore, we exploit this finding to propose an effective algorithm
that constrains the sequentially learned minima to behave as the multitask
solution. We show that our method outperforms several state-of-the-art
continual learning algorithms on various vision benchmarks.
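To make the central observation concrete, here is a minimal PyTorch sketch of a linear mode connectivity check: interpolate the parameters of two solutions (e.g., a sequentially trained model and a multitask model sharing the same initialization) and record the loss along the segment. The model, data loader, and loss function are placeholders, and this is an assumption-laden sketch rather than the authors' code.

```python
import copy
import torch

@torch.no_grad()
def loss_on_linear_path(model_a, model_b, data_loader, loss_fn, num_points=11):
    """Evaluate the loss at evenly spaced points on the segment between
    the parameters of model_a and model_b (same architecture assumed)."""
    params_a = [p.detach().clone() for p in model_a.parameters()]
    params_b = [p.detach().clone() for p in model_b.parameters()]
    probe = copy.deepcopy(model_a)  # scratch model whose weights we overwrite
    probe.eval()
    losses = []
    for alpha in torch.linspace(0.0, 1.0, num_points):
        for p, pa, pb in zip(probe.parameters(), params_a, params_b):
            p.copy_((1 - alpha) * pa + alpha * pb)
        total, count = 0.0, 0
        for x, y in data_loader:
            total += loss_fn(probe(x), y).item() * y.shape[0]
            count += y.shape[0]
        losses.append(total / count)
    return losses  # a flat, low curve indicates linear mode connectivity
```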
Related papers
- Multitask Learning with No Regret: from Improved Confidence Bounds to
Active Learning [79.07658065326592]
Quantifying uncertainty in the estimated tasks is of pivotal importance for many downstream applications, such as online or active learning.
We provide novel multitask confidence intervals in the challenging setting when neither the similarity between tasks nor the tasks' features are available to the learner.
We propose a novel online learning algorithm that achieves such improved regret without knowing the task-similarity parameter in advance.
arXiv Detail & Related papers (2023-08-03T13:08:09Z)
- Provable Multi-Task Representation Learning by Two-Layer ReLU Neural Networks [69.38572074372392]
We present the first results proving that feature learning occurs during training with a nonlinear model on multiple tasks.
Our key insight is that multi-task pretraining induces a pseudo-contrastive loss that favors representations which align points that typically share the same label across tasks.
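As an illustration only (not the paper's construction), a pseudo-contrastive alignment term of this flavor might penalize distances between same-label points pooled across tasks:

```python
import torch

def alignment_loss(features, labels):
    """features: (n, d) representations of points pooled across tasks;
    labels: (n,) integer labels. Penalizes distance between same-label pairs."""
    same = (labels[:, None] == labels[None, :]).float()
    same.fill_diagonal_(0)  # ignore self-pairs
    dists = torch.cdist(features, features) ** 2
    return (same * dists).sum() / same.sum().clamp(min=1)
```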
arXiv Detail & Related papers (2023-07-13T16:39:08Z)
- Leveraging convergence behavior to balance conflicting tasks in multi-task learning [3.6212652499950138]
Multi-Task Learning uses correlated tasks to improve generalization performance.
Tasks often conflict with each other, which makes it challenging to define how the gradients of multiple tasks should be combined.
We propose a method that takes into account the temporal behaviour of the gradients to create a dynamic bias that adjusts the importance of each task during backpropagation.
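A hedged sketch of this kind of dynamic weighting, using an assumed exponential-moving-average rule in place of the paper's exact mechanism:

```python
import torch

class DynamicTaskWeights:
    """Re-weight task losses from the recent temporal behaviour of each
    task's gradient norm (the names and update rule here are illustrative)."""
    def __init__(self, num_tasks, momentum=0.9):
        self.momentum = momentum
        self.ema = torch.ones(num_tasks)  # smoothed gradient norm per task

    def update(self, grad_norms):
        g = torch.as_tensor(grad_norms, dtype=torch.float32)
        self.ema = self.momentum * self.ema + (1 - self.momentum) * g
        w = self.ema / self.ema.sum()   # tasks still moving get more weight
        return w * len(w)               # normalize so weights sum to num_tasks
```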
arXiv Detail & Related papers (2022-04-14T01:52:34Z)
- Fast Line Search for Multi-Task Learning [0.0]
We propose a novel idea for line search algorithms in multi-task learning.
The idea is to use latent representation space instead of parameter space for finding step size.
We compare this idea with classical backtracking and gradient methods with a constant learning rate on MNIST, CIFAR-10, and Cityscapes tasks.
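A minimal sketch of a backtracking-style search in this spirit, where the step is additionally gauged by how far the latent representation moves; the function names and acceptance rule are assumptions, not the paper's algorithm:

```python
import copy
import torch

def backtrack_in_latent_space(model, latent_fn, loss_fn, batch, direction,
                              alpha=1.0, shrink=0.5, max_latent_move=1.0,
                              max_tries=20):
    """Shrink the step until the loss decreases AND the latent representation
    moves less than max_latent_move. latent_fn(model, x) returns features."""
    x, y = batch
    with torch.no_grad():
        base_loss = loss_fn(model(x), y).item()
        base_latent = latent_fn(model, x)
    for _ in range(max_tries):
        trial = copy.deepcopy(model)
        with torch.no_grad():
            for p, d in zip(trial.parameters(), direction):
                p.add_(alpha * d)
            move = (latent_fn(trial, x) - base_latent).norm().item()
            if loss_fn(trial(x), y).item() < base_loss and move <= max_latent_move:
                return alpha
        alpha *= shrink
    return alpha
```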
arXiv Detail & Related papers (2021-10-02T21:02:29Z)
- Efficiently Identifying Task Groupings for Multi-Task Learning [55.80489920205404]
Multi-task learning can leverage information learned by one task to benefit the training of other tasks.
We suggest an approach to select which tasks should train together in multi-task learning models.
Our method determines task groupings in a single training run by co-training all tasks together and quantifying the extent to which one task's gradient would affect another task's loss.
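A minimal sketch of this lookahead measurement (the names and the exact affinity normalization are illustrative assumptions):

```python
import copy
import torch

def inter_task_affinity(model, task_losses, lr=0.1):
    """task_losses[i](model) -> scalar loss tensor for task i.
    Returns affinity[i][j]: relative change in task j's loss after a
    lookahead gradient step on task i (positive = task i helps task j)."""
    n = len(task_losses)
    with torch.no_grad():
        base = [task_losses[j](model).item() for j in range(n)]
    affinity = [[0.0] * n for _ in range(n)]
    for i in range(n):
        lookahead = copy.deepcopy(model)
        grads = torch.autograd.grad(task_losses[i](lookahead),
                                    list(lookahead.parameters()))
        with torch.no_grad():
            for p, g in zip(lookahead.parameters(), grads):
                p.sub_(lr * g)  # one lookahead step on task i only
            for j in range(n):
                affinity[i][j] = 1.0 - task_losses[j](lookahead).item() / base[j]
    return affinity
```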
arXiv Detail & Related papers (2021-09-10T02:01:43Z)
- Multi-task Supervised Learning via Cross-learning [102.64082402388192]
We consider a problem known as multi-task learning, consisting of fitting a set of regression functions intended for solving different tasks.
In our novel formulation, we couple the parameters of these functions, so that they learn in their task-specific domains while staying close to each other.
This facilitates cross-fertilization, in which data collected across different domains helps improve the learning performance on the other tasks.
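A toy sketch of such coupling, assuming a simple proximity penalty toward the across-task mean parameters (the paper's coupling may differ):

```python
import torch

def coupled_objective(task_losses, task_params, coupling=0.1):
    """task_params[i]: flat parameter vector of task i's regression function.
    Adds a proximity penalty pulling all tasks toward their mean parameters."""
    center = torch.stack(task_params).mean(dim=0)
    proximity = sum(((w - center) ** 2).sum() for w in task_params)
    return sum(task_losses) + coupling * proximity
```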
arXiv Detail & Related papers (2020-10-24T21:35:57Z)
- Auxiliary Learning by Implicit Differentiation [54.92146615836611]
Training neural networks with auxiliary tasks is a common practice for improving the performance on a main task of interest.
Here, we propose a novel framework, AuxiLearn, that targets two challenges, combining known auxiliary losses and designing new auxiliary tasks, based on implicit differentiation.
First, when useful auxiliaries are known, we propose learning a network that combines all losses into a single coherent objective function.
Second, when no useful auxiliary task is known, we describe how to learn a network that generates a meaningful, novel auxiliary task.
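For the first setting, a hedged sketch of a small network that combines per-task losses into one objective; the meta-learning of its parameters on a validation set via implicit differentiation is omitted here:

```python
import torch
import torch.nn as nn

class LossCombiner(nn.Module):
    """Maps per-task losses to one scalar objective; in AuxiLearn-style
    setups its parameters would be meta-learned, which is not shown."""
    def __init__(self, num_losses, hidden=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_losses, hidden), nn.Softplus(),
            nn.Linear(hidden, 1), nn.Softplus(),  # keep the output positive
        )

    def forward(self, losses):
        # losses: list of scalar loss tensors for the main and auxiliary tasks
        return self.net(torch.stack(losses)).squeeze()
```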
arXiv Detail & Related papers (2020-06-22T19:35:07Z)
- Multitask learning over graphs: An Approach for Distributed, Streaming Machine Learning [46.613346075513206]
Multitask learning is an approach to inductive transfer learning.
Recent years have witnessed an increasing ability to collect data in a distributed and streaming manner.
This requires the design of new strategies for jointly learning multiple tasks from streaming data over distributed (or networked) systems.
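As one standard strategy in this literature (not necessarily the cited approach), a combine-then-adapt diffusion update over a network of agents can be sketched as:

```python
import numpy as np

def diffusion_step(W, A, grads, mu=0.01):
    """One combine-then-adapt update for agents on a graph.
    W: (num_agents, dim) local estimates; A: (num_agents, num_agents)
    combination matrix (rows sum to 1); grads: (num_agents, dim) streaming
    stochastic gradients evaluated at the current local estimates."""
    combined = A @ W              # combine: weighted average over neighbors
    return combined - mu * grads  # adapt: local stochastic-gradient step
```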
arXiv Detail & Related papers (2020-01-07T15:32:57Z)