TAME: Task Agnostic Continual Learning using Multiple Experts
- URL: http://arxiv.org/abs/2210.03869v2
- Date: Sun, 2 Jun 2024 15:17:58 GMT
- Title: TAME: Task Agnostic Continual Learning using Multiple Experts
- Authors: Haoran Zhu, Maryam Majzoubi, Arihant Jain, Anna Choromanska
- Abstract summary: This paper focuses on a so-called task-agnostic setting where the task identities are not known and the learning machine needs to infer them from the observations.
Our algorithm, which we call TAME (Task-Agnostic continual learning using Multiple Experts), automatically detects the shift in data distributions and switches between task expert networks in an online manner.
Our experimental results show the efficacy of our approach on benchmark continual learning data sets, outperforming the previous task-agnostic methods and even the techniques that admit task identities at both training and testing.
- Score: 9.89894348908034
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The goal of lifelong learning is to continuously learn from non-stationary distributions, where the non-stationarity is typically imposed by a sequence of distinct tasks. Prior works have mostly considered idealistic settings, where the identity of tasks is known at least at training. In this paper we focus on a fundamentally harder, so-called task-agnostic setting where the task identities are not known and the learning machine needs to infer them from the observations. Our algorithm, which we call TAME (Task-Agnostic continual learning using Multiple Experts), automatically detects the shift in data distributions and switches between task expert networks in an online manner. At training, the strategy for switching between tasks hinges on an extremely simple observation: for each newly arriving task there occurs a statistically significant deviation in the value of the loss function that marks the onset of this new task. At inference, the switching between experts is governed by the selector network that forwards the test sample to its relevant expert network. The selector network is trained on a small subset of data drawn uniformly at random. We control the growth of the task expert networks as well as the selector network by employing online pruning. Our experimental results show the efficacy of our approach on benchmark continual learning data sets, outperforming the previous task-agnostic methods and even the techniques that admit task identities at both training and testing, while at the same time using a comparable model size.
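The switching rule in the abstract rests on detecting a statistically significant jump in the training loss when a new task arrives. The sketch below only illustrates that general idea; the sliding-window z-score test, the window size, and the threshold are assumptions made for illustration and are not the paper's exact statistical test.

```python
import numpy as np

def detect_task_shift(loss_history, new_loss, window=50, z_threshold=3.0):
    """Flag a potential task boundary when the incoming batch loss deviates
    sharply from recent loss statistics (illustrative heuristic only)."""
    recent = np.asarray(loss_history[-window:], dtype=float)
    if recent.size < window:                      # not enough history to estimate stats
        return False
    mu, sigma = recent.mean(), recent.std() + 1e-8
    return (new_loss - mu) / sigma > z_threshold  # large upward jump => likely new task

# Toy usage: a sudden jump in the loss marks the onset of a new task,
# which would trigger switching to (or spawning) a new expert network.
losses = list(np.random.normal(0.3, 0.02, size=200))  # stable loss on task A
print(detect_task_shift(losses, 0.31))  # False: within normal fluctuation
print(detect_task_shift(losses, 1.20))  # True: statistically significant deviation
```

In practice the threshold would trade off false task splits against missed boundaries; the paper's selector network and online pruning are not modeled here.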
Related papers
- Negotiated Representations to Prevent Forgetting in Machine Learning Applications [0.0]
Catastrophic forgetting is a significant challenge in the field of machine learning.
We propose a novel method for preventing catastrophic forgetting in machine learning applications.
arXiv Detail & Related papers (2023-11-30T22:43:50Z)
- Active Instruction Tuning: Improving Cross-Task Generalization by Training on Prompt Sensitive Tasks [101.40633115037983]
Instruction tuning (IT) achieves impressive zero-shot generalization results by training large language models (LLMs) on a massive amount of diverse tasks with instructions.
How to select new tasks to improve the performance and generalizability of IT models remains an open question.
We propose active instruction tuning based on prompt uncertainty, a novel framework to identify informative tasks, and then actively tune the models on the selected tasks.
arXiv Detail & Related papers (2023-11-01T04:40:05Z)
- Multitask Learning with No Regret: from Improved Confidence Bounds to Active Learning [79.07658065326592]
Quantifying uncertainty in the estimated tasks is of pivotal importance for many downstream applications, such as online or active learning.
We provide novel multitask confidence intervals in the challenging setting when neither the similarity between tasks nor the tasks' features are available to the learner.
We propose a novel online learning algorithm that achieves such improved regret without knowing the task-similarity parameter in advance.
arXiv Detail & Related papers (2023-08-03T13:08:09Z)
- Provable Multi-Task Representation Learning by Two-Layer ReLU Neural Networks [69.38572074372392]
We present the first results proving that feature learning occurs during training with a nonlinear model on multiple tasks.
Our key insight is that multi-task pretraining induces a pseudo-contrastive loss that favors representations that align points that typically have the same label across tasks.
arXiv Detail & Related papers (2023-07-13T16:39:08Z)
- Pre-training Multi-task Contrastive Learning Models for Scientific Literature Understanding [52.723297744257536]
Pre-trained language models (LMs) have shown effectiveness in scientific literature understanding tasks.
We propose a multi-task contrastive learning framework, SciMult, to facilitate common knowledge sharing across different literature understanding tasks.
arXiv Detail & Related papers (2023-05-23T16:47:22Z)
- Online Continual Learning via the Knowledge Invariant and Spread-out Properties [4.109784267309124]
A key challenge in continual learning is catastrophic forgetting.
We propose a new method, named Online Continual Learning via the Knowledge Invariant and Spread-out Properties (OCLKISP).
We empirically evaluate our proposed method on four popular benchmarks for continual learning: Split CIFAR-100, Split SVHN, Split CUB200 and Split Tiny-ImageNet.
arXiv Detail & Related papers (2023-02-02T04:03:38Z)
- Selecting task with optimal transport self-supervised learning for few-shot classification [15.088213168796772]
Few-shot classification aims at solving problems in which only a few samples are available during training.
We propose a novel task selecting algorithm, named Optimal Transport Task Selecting (OTTS), to construct a training set by selecting similar tasks for Few-Shot learning.
OTTS measures the task similarity by calculating the optimal transport distance and completes the model training via a self-supervised strategy.
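The OTTS summary above mentions measuring task similarity via an optimal transport distance. As a minimal illustration of that idea only (not the paper's implementation: the feature sets, the entropic regularization, and the Sinkhorn iteration below are assumptions), the distance between two tasks' feature clouds could be approximated like this:

```python
import numpy as np

def sinkhorn_distance(X, Y, eps=0.05, n_iters=300):
    """Entropy-regularized optimal transport distance between two task
    feature sets X (n x d) and Y (m x d); a rough stand-in for an
    OT-based task-similarity measure."""
    n, m = len(X), len(Y)
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)        # uniform sample weights
    C = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1) ** 2  # pairwise squared costs
    K = np.exp(-(C / C.max()) / eps)                       # rescale costs for stability
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iters):                               # Sinkhorn fixed-point updates
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]                        # approximate transport plan
    return float((P * C).sum())                            # cost under the original costs

# Toy usage: tasks whose feature clouds overlap get a smaller distance,
# so "similar" tasks would be preferred when building the training set.
rng = np.random.default_rng(0)
task_a = rng.normal(0.0, 1.0, size=(64, 16))
task_b = rng.normal(0.2, 1.0, size=(64, 16))   # similar task
task_c = rng.normal(3.0, 1.0, size=(64, 16))   # dissimilar task
print(sinkhorn_distance(task_a, task_b) < sinkhorn_distance(task_a, task_c))  # True
```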
arXiv Detail & Related papers (2022-04-01T08:45:29Z)
- Distribution Matching for Heterogeneous Multi-Task Learning: a Large-scale Face Study [75.42182503265056]
Multi-Task Learning (MTL) has emerged as a methodology in which multiple tasks are jointly learned by a shared learning algorithm.
We deal with heterogeneous MTL, simultaneously addressing detection, classification & regression problems.
We build FaceBehaviorNet, the first framework for large-scale face analysis, by jointly learning all facial behavior tasks.
arXiv Detail & Related papers (2021-05-08T22:26:52Z)
- Self-Attention Meta-Learner for Continual Learning [5.979373021392084]
Self-Attention Meta-Learner (SAM) learns prior knowledge for continual learning that permits learning a sequence of tasks.
SAM incorporates an attention mechanism that learns to select the relevant representation for each future task.
We evaluate the proposed method on the Split CIFAR-10/100 and Split MNIST benchmarks in the task-inference setting.
arXiv Detail & Related papers (2021-01-28T17:35:04Z)
- Expert Training: Task Hardness Aware Meta-Learning for Few-Shot Classification [62.10696018098057]
We propose an easy-to-hard expert meta-training strategy to arrange the training tasks properly.
A task hardness aware module is designed and integrated into the training procedure to estimate the hardness of a task.
Experimental results on the miniImageNet and tieredImageNetSketch datasets show that the meta-learners can obtain better results with our expert training strategy.
arXiv Detail & Related papers (2020-07-13T08:49:00Z)