Dual-Teacher Class-Incremental Learning With Data-Free Generative Replay
- URL: http://arxiv.org/abs/2106.09835v1
- Date: Thu, 17 Jun 2021 22:13:15 GMT
- Title: Dual-Teacher Class-Incremental Learning With Data-Free Generative Replay
- Authors: Yoojin Choi, Mostafa El-Khamy, Jungwon Lee
- Abstract summary: We propose two novel knowledge transfer techniques for class-incremental learning (CIL).
First, we propose data-free generative replay (DF-GR) to mitigate catastrophic forgetting in CIL by using synthetic samples from a generative model.
Second, we introduce dual-teacher information distillation (DT-ID) for knowledge distillation from two teachers to one student.
- Score: 49.691610143011566
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper proposes two novel knowledge transfer techniques for
class-incremental learning (CIL). First, we propose data-free generative replay
(DF-GR) to mitigate catastrophic forgetting in CIL by using synthetic samples
from a generative model. In the conventional generative replay, the generative
model is pre-trained for old data and shared in extra memory for later
incremental learning. In our proposed DF-GR, we train a generative model from
scratch without using any training data, based on the pre-trained
classification model from the past, so we curtail the cost of sharing
pre-trained generative models. Second, we introduce dual-teacher information
distillation (DT-ID) for knowledge distillation from two teachers to one
student. In CIL, we use DT-ID to learn new classes incrementally based on the
pre-trained model for old classes and another model (pre-)trained on the new
data for new classes. We implemented the proposed schemes on top of one of the
state-of-the-art CIL methods and showed the performance improvement on
CIFAR-100 and ImageNet datasets.
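The abstract gives no implementation details here, but the shape of DT-ID can be pictured as a student whose old-class outputs imitate the teacher pre-trained on old classes and whose new-class outputs imitate the teacher (pre-)trained on new classes. The sketch below is a minimal PyTorch illustration under that reading; the output slicing, shared temperature, and equal loss weights are assumptions, not the authors' implementation.

```python
# Minimal sketch of dual-teacher information distillation (DT-ID).
# Assumptions (not taken from the paper): one shared temperature, equal loss
# weights, and each teacher supervising only its own slice of the student output.
import torch
import torch.nn.functional as F

def dt_id_loss(student_logits, old_teacher_logits, new_teacher_logits,
               num_old, temperature=2.0):
    """Distill old-class and new-class knowledge from two teachers into one student."""
    T = temperature
    s_old = student_logits[:, :num_old]      # student's old-class slice
    s_new = student_logits[:, num_old:]      # student's new-class slice
    kl_old = F.kl_div(F.log_softmax(s_old / T, dim=1),
                      F.softmax(old_teacher_logits / T, dim=1),
                      reduction="batchmean") * T * T
    kl_new = F.kl_div(F.log_softmax(s_new / T, dim=1),
                      F.softmax(new_teacher_logits / T, dim=1),
                      reduction="batchmean") * T * T
    return kl_old + kl_new

# Toy usage: 10 old classes, 5 new classes, batch of 8 inputs.
student_out = torch.randn(8, 15, requires_grad=True)
old_teacher_out, new_teacher_out = torch.randn(8, 10), torch.randn(8, 5)
loss = dt_id_loss(student_out, old_teacher_out, new_teacher_out, num_old=10)
loss.backward()
```

In the full method, part of the inputs scored by the old-class teacher would come from the DF-GR generator, which is itself trained only against the frozen old classifier rather than on stored data.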
Related papers
- Adapt & Align: Continual Learning with Generative Models Latent Space Alignment [15.729732755625474]
We introduce Adapt & Align, a method for continual learning of neural networks by aligning latent representations in generative models.
Neural networks suffer from an abrupt loss of performance when retrained with additional data.
We propose a new method that mitigates this problem by employing generative models and splitting their update process into two parts.
arXiv Detail & Related papers (2023-12-21T10:02:17Z)
- RanPAC: Random Projections and Pre-trained Models for Continual Learning [59.07316955610658]
Continual learning (CL) aims to learn different tasks (such as classification) in a non-stationary data stream without forgetting old ones.
We propose a concise and effective approach for CL with pre-trained models.
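As a rough illustration of the random-projection idea, the sketch below freezes a pretrained feature extractor, expands its features with a fixed (never-trained) random projection and a nonlinearity, and classifies with class prototypes accumulated across tasks. The dimensions, the cosine nearest-prototype rule, and the stand-in features are assumptions, not RanPAC's exact recipe.

```python
# Hedged sketch: continual learning on top of a frozen pretrained backbone,
# using a fixed random projection and class prototypes.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
feat_dim, proj_dim, num_classes = 512, 2048, 100

W_rand = torch.randn(feat_dim, proj_dim) / feat_dim ** 0.5   # fixed, never trained
prototypes = torch.zeros(num_classes, proj_dim)
counts = torch.zeros(num_classes)

def project(features):
    # Nonlinear random expansion of the frozen backbone's features.
    return F.relu(features @ W_rand)

def update_prototypes(features, labels):
    # Accumulate running class means; no gradient-based training happens here,
    # so the classifier itself has nothing to forget.
    z = project(features)
    for c in labels.unique():
        mask = labels == c
        prototypes[c] = (prototypes[c] * counts[c] + z[mask].sum(0)) / (counts[c] + mask.sum())
        counts[c] += mask.sum()

def predict(features):
    z = F.normalize(project(features), dim=1)
    p = F.normalize(prototypes, dim=1)
    return (z @ p.t()).argmax(dim=1)

# Toy usage: random vectors stand in for features from a real pretrained model.
feats, labels = torch.randn(32, feat_dim), torch.randint(0, 10, (32,))
update_prototypes(feats, labels)
print(predict(feats[:4]))
```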
arXiv Detail & Related papers (2023-07-05T12:49:02Z)
- A Memory Transformer Network for Incremental Learning [64.0410375349852]
We study class-incremental learning, a training setup in which new classes of data are observed over time for the model to learn from.
Despite the straightforward problem formulation, the naive application of classification models to class-incremental learning results in the "catastrophic forgetting" of previously seen classes.
One of the most successful existing approaches is an exemplar memory, which overcomes catastrophic forgetting by saving a subset of past data into a memory bank and replaying it when training on future tasks.
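A minimal version of that exemplar-memory idea is sketched below; the fixed per-class budget and random selection policy are simplifying assumptions (many methods pick exemplars with a herding-style rule instead).

```python
# Minimal exemplar-memory (rehearsal) sketch: keep a small, class-balanced
# subset of past data and mix it into every new task's batches.
# Random per-class selection is an assumption, not a specific method's rule.
import random
from collections import defaultdict

class ExemplarMemory:
    def __init__(self, budget_per_class=20):
        self.budget = budget_per_class
        self.store = defaultdict(list)          # class id -> list of samples

    def add_task(self, samples, labels):
        by_class = defaultdict(list)
        for x, y in zip(samples, labels):
            by_class[y].append(x)
        for y, xs in by_class.items():
            self.store[y] = random.sample(xs, min(self.budget, len(xs)))

    def replay_batch(self, k=32):
        pool = [(x, y) for y, xs in self.store.items() for x in xs]
        return random.sample(pool, min(k, len(pool)))

# Toy usage: after task 1 (classes 0-4), replay old exemplars while training task 2.
memory = ExemplarMemory(budget_per_class=5)
memory.add_task(samples=list(range(100)), labels=[i % 5 for i in range(100)])
old_batch = memory.replay_batch(k=8)   # mixed into the new task's mini-batches
```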
arXiv Detail & Related papers (2022-10-10T08:27:28Z)
- Revisiting the Updates of a Pre-trained Model for Few-shot Learning [11.871523410051527]
We compare the two popular updating methods, fine-tuning and linear probing.
We find that fine-tuning is better than linear probing as the number of samples increases.
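The two updating methods being compared differ only in whether the pretrained backbone's weights are touched. The sketch below shows both modes on a toy backbone; the architecture, optimizer, and learning rate are illustrative assumptions.

```python
# Sketch of the two update modes being compared: linear probing (frozen
# backbone, train only a new head) vs. fine-tuning (train everything).
import torch
import torch.nn as nn
import torch.nn.functional as F

backbone = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 128))
head = nn.Linear(128, 5)                      # head for the few-shot classes

def make_optimizer(mode):
    if mode == "linear_probe":
        for p in backbone.parameters():
            p.requires_grad = False           # backbone stays frozen
        params = list(head.parameters())
    else:                                     # "fine_tune"
        for p in backbone.parameters():
            p.requires_grad = True            # every weight is updated
        params = list(backbone.parameters()) + list(head.parameters())
    return torch.optim.SGD(params, lr=1e-2)

# One toy gradient step in each mode.
x, y = torch.randn(16, 64), torch.randint(0, 5, (16,))
for mode in ("linear_probe", "fine_tune"):
    opt = make_optimizer(mode)
    loss = F.cross_entropy(head(backbone(x)), y)
    opt.zero_grad(); loss.backward(); opt.step()
```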
arXiv Detail & Related papers (2022-05-13T08:47:06Z)
- Reproducible, incremental representation learning with Rosetta VAE [0.0]
Variational autoencoders are among the most popular methods for distilling low-dimensional structure from high-dimensional data.
We introduce the Rosetta VAE, a method of distilling previously learned representations and retraining new models to reproduce and build on prior results.
We demonstrate that the R-VAE reconstructs data as well as the VAE and $\beta$-VAE, and outperforms both methods in recovering a target latent space in a sequential training setting.
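One way to picture "reproducing and building on prior results" is to add an anchoring term that pins the latents of a few stored reference inputs to the codes an earlier model assigned them, on top of the usual VAE objective. The sketch below does that; the anchoring term, its weight, and the tiny architecture are assumptions rather than the exact R-VAE procedure.

```python
# Hedged sketch: retrain a VAE while anchoring a few stored inputs to the
# latent codes a previously trained model gave them.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyVAE(nn.Module):
    def __init__(self, d_in=32, d_z=2):
        super().__init__()
        self.enc = nn.Linear(d_in, 2 * d_z)   # outputs mean and log-variance
        self.dec = nn.Linear(d_z, d_in)

    def forward(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        return self.dec(z), mu, logvar

def vae_loss(x, recon, mu, logvar):
    rec = F.mse_loss(recon, x, reduction="mean")
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kl

model = TinyVAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.randn(64, 32)
anchor_x = torch.randn(8, 32)          # inputs kept from the earlier run
anchor_z = torch.randn(8, 2)           # latent codes the earlier model assigned them

recon, mu, logvar = model(x)
mu_anchor, _ = model.enc(anchor_x).chunk(2, dim=1)
loss = vae_loss(x, recon, mu, logvar) + 1.0 * F.mse_loss(mu_anchor, anchor_z)
opt.zero_grad(); loss.backward(); opt.step()
```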
arXiv Detail & Related papers (2022-01-13T20:45:35Z)
- bert2BERT: Towards Reusable Pretrained Language Models [51.078081486422896]
We propose bert2BERT, which can effectively transfer the knowledge of an existing smaller pre-trained model to a large model.
bert2BERT saves about 45% and 47% of the computational cost of pre-training BERT_BASE and GPT_BASE, respectively, by reusing models of almost half their size.
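The reuse relies on initializing a larger model from a smaller pretrained one. The sketch below shows a Net2Net-style function-preserving width expansion of a single linear pair, which conveys the flavor of such parameter reuse; bert2BERT's actual initialization of full Transformer blocks is considerably more elaborate.

```python
# Hedged sketch of function-preserving width expansion (Net2Net-style):
# widen a layer by duplicating units, then split the duplicated units'
# outgoing weights so the composed function is unchanged.
import torch
import torch.nn as nn

def widen_pair(fc1, fc2, new_width):
    """Expand fc1's output width (and fc2's input width) without changing
    fc2(act(fc1(x))) for any element-wise activation act."""
    old_width = fc1.out_features
    # Keep all old units, then duplicate randomly chosen ones to fill new slots.
    idx = torch.cat([torch.arange(old_width),
                     torch.randint(0, old_width, (new_width - old_width,))])
    counts = torch.bincount(idx, minlength=old_width).float()

    new_fc1 = nn.Linear(fc1.in_features, new_width)
    new_fc2 = nn.Linear(new_width, fc2.out_features)
    with torch.no_grad():
        new_fc1.weight.copy_(fc1.weight[idx])
        new_fc1.bias.copy_(fc1.bias[idx])
        # Divide each copied unit's outgoing weights by its copy count.
        new_fc2.weight.copy_(fc2.weight[:, idx] / counts[idx])
        new_fc2.bias.copy_(fc2.bias)
    return new_fc1, new_fc2

# Check: the widened pair computes the same function as the original pair.
fc1, fc2 = nn.Linear(16, 32), nn.Linear(32, 8)
wide1, wide2 = widen_pair(fc1, fc2, new_width=48)
x = torch.randn(4, 16)
print(torch.allclose(fc2(fc1(x)), wide2(wide1(x)), atol=1e-5))
```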
arXiv Detail & Related papers (2021-10-14T04:05:25Z)
- Dual Discriminator Adversarial Distillation for Data-free Model Compression [36.49964835173507]
We propose Dual Discriminator Adversarial Distillation (DDAD) to distill a neural network without any training data or meta-data.
Specifically, a generator is trained through dual discriminator adversarial distillation to create samples that mimic the original training data.
The proposed method obtains an efficient student network which closely approximates its teacher network, despite using no original training data.
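The general mechanics can be sketched as an adversarial game between a generator that seeks inputs where teacher and student disagree and a student that imitates the teacher on those inputs. The sketch below is that generic data-free adversarial distillation loop, not DDAD's specific dual-discriminator formulation; the toy MLPs and the L1 discrepancy are assumptions.

```python
# Simplified data-free adversarial distillation loop.
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Sequential(nn.Linear(100, 64), nn.ReLU(), nn.Linear(64, 10))
student = nn.Sequential(nn.Linear(100, 32), nn.ReLU(), nn.Linear(32, 10))
generator = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 100))
for p in teacher.parameters():
    p.requires_grad_(False)                 # pretrained teacher stays fixed

opt_s = torch.optim.Adam(student.parameters(), lr=1e-3)
opt_g = torch.optim.Adam(generator.parameters(), lr=1e-3)

for step in range(5):                       # toy loop; real training runs far longer
    # Generator step: synthesize inputs that maximize teacher-student mismatch.
    fake = generator(torch.randn(32, 16))
    g_loss = -F.l1_loss(student(fake), teacher(fake))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

    # Student step: imitate the teacher on frozen synthetic inputs.
    fake = generator(torch.randn(32, 16)).detach()
    s_loss = F.l1_loss(student(fake), teacher(fake))
    opt_s.zero_grad(); s_loss.backward(); opt_s.step()
```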
arXiv Detail & Related papers (2021-04-12T12:01:45Z)
- Learning Adaptive Embedding Considering Incremental Class [55.21855842960139]
Class-Incremental Learning (CIL) aims to train a reliable model from streaming data in which unknown classes emerge sequentially.
Different from traditional closed-set learning, CIL has two main challenges: 1) detecting the novel classes, and 2) updating the model afterwards.
After the novel classes are detected, the model needs to be updated without re-training on the entire previous data.
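The paper's own detector is more involved, but the basic shape of the first challenge can be sketched as a distance test against known-class prototypes in an embedding space; the cosine rule and threshold below are placeholder assumptions.

```python
# Hedged sketch of novel class detection: flag a sample as "unknown" when its
# embedding is far from every known-class prototype.
import torch
import torch.nn.functional as F

def detect_novel(embeddings, prototypes, threshold=0.5):
    """Return True where a sample matches no known class closely enough."""
    e = F.normalize(embeddings, dim=1)
    p = F.normalize(prototypes, dim=1)
    best_similarity = (e @ p.t()).max(dim=1).values   # cosine to nearest class
    return best_similarity < threshold

# Toy usage: 3 known-class prototypes in a 16-d embedding space.
prototypes = torch.randn(3, 16)
known = prototypes + 0.05 * torch.randn(3, 16)     # near-prototype samples
novel = torch.randn(4, 16)                         # unrelated points
flags = detect_novel(torch.cat([known, novel]), prototypes)
print(flags)   # samples flagged True would trigger the model update step
```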
arXiv Detail & Related papers (2020-08-31T04:11:24Z)
- Two-Level Residual Distillation based Triple Network for Incremental Object Detection [21.725878050355824]
We propose a novel incremental object detector based on Faster R-CNN to continuously learn from new object classes without using old data.
It is a triple network in which an old model and a residual model serve as assistants, helping the incremental model learn new classes without forgetting previously learned knowledge.
arXiv Detail & Related papers (2020-07-27T11:04:57Z)
- Data-Free Knowledge Amalgamation via Group-Stack Dual-GAN [80.17705319689139]
We propose a data-free knowledge amalgamation strategy to craft a well-behaved multi-task student network from multiple single/multi-task teachers.
Without any training data, the proposed method achieves surprisingly competitive results, even compared with some fully supervised methods.
arXiv Detail & Related papers (2020-03-20T03:20:52Z)