Learn Faster and Forget Slower via Fast and Stable Task Adaptation
- URL: http://arxiv.org/abs/2007.01388v2
- Date: Sun, 29 Nov 2020 16:01:50 GMT
- Title: Learn Faster and Forget Slower via Fast and Stable Task Adaptation
- Authors: Farshid Varno and Lucas May Petry and Lisa Di Jorio and Stan Matwin
- Abstract summary: Current fine-tuning techniques make pretrained models forget the transferred knowledge even before anything about the new task is learned.
We introduce Fast And Stable Task-adaptation (FAST), an easy-to-apply fine-tuning algorithm.
We empirically show that compared to prevailing fine-tuning practices, FAST learns the target task faster and forgets the source task slower.
- Score: 10.696114236900808
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Training Deep Neural Networks (DNNs) is still highly time-consuming and
compute-intensive. It has been shown that adapting a pretrained model may
significantly accelerate this process. With a focus on classification, we show
that current fine-tuning techniques make the pretrained models catastrophically
forget the transferred knowledge even before anything about the new task is
learned. Such rapid knowledge loss undermines the merits of transfer learning
and may result in a much slower convergence rate compared to when the maximum
amount of knowledge is exploited. We investigate the source of this problem
from different perspectives and, to alleviate it, introduce Fast And Stable
Task-adaptation (FAST), an easy-to-apply fine-tuning algorithm. The paper
provides a novel geometric perspective on how the loss landscapes of the source
and target tasks are linked under different transfer learning strategies. We
empirically show that compared to prevailing fine-tuning practices, FAST learns
the target task faster and forgets the source task slower.
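The abstract describes the forgetting effect but does not spell out the FAST algorithm, so the sketch below is not an implementation of FAST. It is a minimal PyTorch illustration of how the effect can be observed: fine-tune an ImageNet-pretrained ResNet-18 on a target task in the usual way while periodically re-attaching the original classifier head and measuring source-task accuracy. The data loaders, class counts, and hyperparameters are hypothetical placeholders.
```python
# Illustrative sketch only: it measures how quickly a pretrained model forgets
# its source task during ordinary fine-tuning; it does NOT implement FAST.
# `source_val_loader`, `target_train_loader`, and `num_target_classes` are
# hypothetical placeholders.
import copy
import torch
import torch.nn as nn
import torchvision.models as models


def accuracy(model, loader, device="cpu"):
    """Top-1 accuracy of `model` on `loader`."""
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            correct += (model(x).argmax(dim=1) == y).sum().item()
            total += y.numel()
    return correct / max(total, 1)


def finetune_and_track_forgetting(source_val_loader, target_train_loader,
                                  num_target_classes, steps=1000, device="cpu"):
    # ImageNet-pretrained backbone stands in for the "source task" model.
    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).to(device)
    source_head = copy.deepcopy(model.fc)            # keep the source classifier head

    # Standard fine-tuning: replace the head and update all weights.
    model.fc = nn.Linear(model.fc.in_features, num_target_classes).to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)
    criterion = nn.CrossEntropyLoss()

    step = 0
    while step < steps:
        for x, y in target_train_loader:
            model.train()
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            criterion(model(x), y).backward()
            optimizer.step()
            step += 1
            if step % 100 == 0:
                # Re-attach the original head to probe remaining source knowledge.
                target_head, model.fc = model.fc, source_head
                src_acc = accuracy(model, source_val_loader, device)
                model.fc = target_head
                print(f"step {step}: source-task top-1 accuracy = {src_acc:.3f}")
            if step >= steps:
                break
```
With vanilla fine-tuning, the printed source-task accuracy typically drops sharply within the first few hundred steps, which is the kind of rapid knowledge loss the paper targets.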
Related papers
- Why pre-training is beneficial for downstream classification tasks? [32.331679393303446]
We propose to quantitatively and explicitly explain effects of pre-training on the downstream task from a novel game-theoretic view.
Specifically, we extract and quantify the knowledge encoded by the pre-trained model.
We discover that only a small amount of pre-trained model's knowledge is preserved for the inference of downstream tasks.
arXiv Detail & Related papers (2024-10-11T02:13:16Z)
- Fine-Grained Knowledge Selection and Restoration for Non-Exemplar Class Incremental Learning [64.14254712331116]
Non-exemplar class incremental learning aims to learn both the new and old tasks without accessing any training data from the past.
We propose a novel framework of fine-grained knowledge selection and restoration.
arXiv Detail & Related papers (2023-12-20T02:34:11Z)
- TOAST: Transfer Learning via Attention Steering [77.83191769502763]
Current transfer learning methods often fail to focus on task-relevant features.
We introduce Top-Down Attention Steering (TOAST), a novel transfer learning algorithm that steers the attention to task-specific features.
TOAST substantially improves performance across a range of fine-grained visual classification datasets.
arXiv Detail & Related papers (2023-05-24T20:03:04Z)
- Online Continual Learning via the Knowledge Invariant and Spread-out Properties [4.109784267309124]
A key challenge in continual learning is catastrophic forgetting.
We propose a new method, named Online Continual Learning via the Knowledge Invariant and Spread-out Properties (OCLKISP).
We empirically evaluate our proposed method on four popular benchmarks for continual learning: Split CIFAR 100, Split SVHN, Split CUB200 and Split Tiny-Image-Net.
arXiv Detail & Related papers (2023-02-02T04:03:38Z)
- On effects of Knowledge Distillation on Transfer Learning [0.0]
We propose a machine learning architecture we call TL+KD that combines knowledge distillation with transfer learning.
We show that, with guidance and knowledge from a larger teacher network during fine-tuning, the student network achieves better validation performance, such as higher accuracy.
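The exact TL+KD architecture is not given in this summary; the following is a generic sketch, under the assumption of a standard distillation setup, of how a frozen teacher can guide a student during fine-tuning by adding a softened-logit KL term to the usual cross-entropy. The temperature, weighting, and function names are illustrative assumptions.
```python
# Generic distillation-during-fine-tuning sketch (not the paper's exact TL+KD
# recipe): the student is trained on the target labels plus a KL term matching
# the frozen teacher's softened logits. Temperature and alpha are assumptions.
import torch
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.5):
    """Cross-entropy on hard labels blended with a soft-target KL term."""
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * (temperature ** 2)                      # standard temperature scaling
    return alpha * hard + (1.0 - alpha) * soft


def distill_step(student, teacher, optimizer, x, y):
    teacher.eval()
    with torch.no_grad():
        teacher_logits = teacher(x)             # guidance from the larger teacher
    loss = distillation_loss(student(x), teacher_logits, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```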
arXiv Detail & Related papers (2022-10-18T08:11:52Z)
- Pre-Train Your Loss: Easy Bayesian Transfer Learning with Informative Priors [59.93972277761501]
We show that we can learn highly informative posteriors from the source task, through supervised or self-supervised approaches.
This simple modular approach enables significant performance gains and more data-efficient learning on a variety of downstream classification and segmentation tasks.
arXiv Detail & Related papers (2022-05-20T16:19:30Z)
- Efficient Few-Shot Object Detection via Knowledge Inheritance [62.36414544915032]
Few-shot object detection (FSOD) aims at learning a generic detector that can adapt to unseen tasks with scarce training samples.
We present an efficient pretrain-transfer framework (PTF) baseline that adds no computational cost.
We also propose an adaptive length re-scaling (ALR) strategy to alleviate the vector length inconsistency between the predicted novel weights and the pretrained base weights.
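The summary does not give the ALR formula, so the snippet below is only one plausible, loudly hypothetical reading of "length re-scaling": each predicted novel-class weight vector is rescaled to the mean L2 norm of the pretrained base-class weights while keeping its direction.
```python
# Loudly hypothetical reading of "adaptive length re-scaling" (ALR): rescale each
# predicted novel-class weight vector to the mean L2 norm of the pretrained
# base-class weights, keeping its direction. The actual ALR rule may differ.
import torch


def rescale_novel_weights(base_weights: torch.Tensor,
                          novel_weights: torch.Tensor) -> torch.Tensor:
    """base_weights: [num_base, dim]; novel_weights: [num_novel, dim]."""
    target_norm = base_weights.norm(dim=1).mean()                 # average base length
    novel_norms = novel_weights.norm(dim=1, keepdim=True).clamp_min(1e-8)
    return novel_weights * (target_norm / novel_norms)            # rescaled, same direction
```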
arXiv Detail & Related papers (2022-03-23T06:24:31Z)
- Center Loss Regularization for Continual Learning [0.0]
In general, neural networks lack the ability to learn different tasks sequentially.
Our approach remembers old tasks by projecting the representations of new tasks close to those of old tasks.
We demonstrate that our approach is scalable, effective, and gives competitive performance compared to state-of-the-art continual learning methods.
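As a concrete but generic illustration of pulling new-task representations toward stored ones, here is a standard center-loss regularizer with exponential-moving-average class centers; it is not the paper's exact formulation, and the class name, update rule, and hyperparameters are assumptions.
```python
# Generic center-loss regularizer (not the paper's exact continual-learning
# formulation): each class keeps a running center in feature space, and features
# are pulled toward their class center. Hyperparameters are assumptions.
import torch
import torch.nn as nn


class CenterLoss(nn.Module):
    def __init__(self, num_classes, feat_dim, momentum=0.5):
        super().__init__()
        self.momentum = momentum                         # EMA rate for center updates
        self.register_buffer("centers", torch.zeros(num_classes, feat_dim))

    def forward(self, features, labels):
        # Pull each feature toward its class center (centers treated as constants).
        loss = ((features - self.centers[labels]) ** 2).sum(dim=1).mean()

        # Exponential-moving-average update of the centers, outside the graph.
        with torch.no_grad():
            for c in labels.unique():
                mask = labels == c
                self.centers[c] = (1 - self.momentum) * self.centers[c] \
                    + self.momentum * features[mask].mean(dim=0)
        return loss


# Typical usage: total_loss = cross_entropy + lambda_center * center_loss(features, labels)
```
In a continual-learning setting one would typically freeze or reuse the centers of old classes so that features learned for new tasks stay close to them, in the spirit of the projection described above.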
arXiv Detail & Related papers (2021-10-21T17:46:44Z)
- Essentials for Class Incremental Learning [43.306374557919646]
Class-incremental learning results on CIFAR-100 and ImageNet improve over the state-of-the-art by a large margin, while keeping the approach simple.
arXiv Detail & Related papers (2021-02-18T18:01:06Z)
- Unsupervised Transfer Learning for Spatiotemporal Predictive Networks [90.67309545798224]
We study how to transfer knowledge from a zoo of unsupervisedly learned models to another network.
Our motivation is that models are expected to understand complex dynamics from different sources.
Our approach yields significant improvements on three benchmarks for spatiotemporal prediction, and benefits the target task even from less relevant sources.
arXiv Detail & Related papers (2020-09-24T15:40:55Z)
- iTAML: An Incremental Task-Agnostic Meta-learning Approach [123.10294801296926]
Humans can continuously learn new knowledge as their experience grows.
Previously learned knowledge in deep neural networks can quickly fade when they are trained on a new task.
We introduce a novel meta-learning approach that seeks to maintain an equilibrium between all encountered tasks.
arXiv Detail & Related papers (2020-03-25T21:42:48Z)