Distilling Causal Effect of Data in Class-Incremental Learning
- URL: http://arxiv.org/abs/2103.01737v2
- Date: Thu, 4 Mar 2021 08:37:50 GMT
- Title: Distilling Causal Effect of Data in Class-Incremental Learning
- Authors: Xinting Hu, Kaihua Tang, Chunyan Miao, Xian-Sheng Hua, Hanwang Zhang
- Abstract summary: We propose a causal framework to explain the catastrophic forgetting in Class-Incremental Learning (CIL).
We derive a novel distillation method that is orthogonal to the existing anti-forgetting techniques, such as data replay and feature/label distillation.
- Score: 109.680987556265
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a causal framework to explain the catastrophic forgetting in
Class-Incremental Learning (CIL) and then derive a novel distillation method
that is orthogonal to the existing anti-forgetting techniques, such as data
replay and feature/label distillation. We first 1) place CIL into the
framework, 2) answer why the forgetting happens: the causal effect of the old
data is lost in new training, and then 3) explain how the existing techniques
mitigate it: they bring the causal effect back. Based on the framework, we find
that although the feature/label distillation is storage-efficient, its causal
effect is not coherent with the end-to-end feature learning merit, which is
however preserved by data replay. To this end, we propose to distill the
Colliding Effect between the old and the new data, which is fundamentally
equivalent to the causal effect of data replay, but without any cost of replay
storage. Thanks to the causal effect analysis, we can further capture the
Incremental Momentum Effect of the data stream, removing which can help to
retain the old effect overwhelmed by the new data effect, and thus alleviate
the forgetting of the old class in testing. Extensive experiments on three CIL
benchmarks (CIFAR-100, ImageNet-Sub, and ImageNet-Full) show that the proposed causal effect
distillation can improve various state-of-the-art CIL methods by a large margin
(0.72%--9.06%).
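The paper's core idea, distilling the Colliding Effect, can be illustrated with a minimal numpy sketch: each new sample's distillation target is a similarity-weighted mixture of the old-model logits of its nearest neighbors, with neighbors found in the old model's feature space. The function names and the top-k/softmax weighting below are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

def colliding_effect_weights(old_feats, k=3):
    """For each sample, find its k nearest neighbors (cosine similarity
    in the old model's feature space) and softmax-normalize the
    similarities into mixing weights."""
    f = old_feats / np.linalg.norm(old_feats, axis=1, keepdims=True)
    sim = f @ f.T
    np.fill_diagonal(sim, -np.inf)          # exclude self-matches
    idx = np.argsort(-sim, axis=1)[:, :k]   # top-k neighbor indices
    w = np.take_along_axis(sim, idx, axis=1)
    w = np.exp(w) / np.exp(w).sum(axis=1, keepdims=True)
    return idx, w

def distill_targets(old_logits, idx, w):
    """Each sample's distillation target is the weighted mixture of its
    neighbors' old-model logits -- no stored replay data required."""
    return (old_logits[idx] * w[:, :, None]).sum(axis=1)
```

In a training loop, the returned targets would replace stored old-data logits as the distillation signal, which is the sense in which the colliding effect substitutes for replay storage.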
Related papers
- Knowledge Distillation with Refined Logits [31.205248790623703]
We introduce Refined Logit Distillation (RLD) to address the limitations of current logit distillation methods.
Our approach is motivated by the observation that even high-performing teacher models can make incorrect predictions.
Our method can effectively eliminate misleading information from the teacher while preserving crucial class correlations.
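As a rough illustration of the refinement idea (a hypothetical simplification, not RLD's actual procedure): when the teacher's top-1 prediction is wrong, swapping the logits of the predicted and true classes makes the target correct while leaving every other class correlation intact.

```python
import numpy as np

def refine_teacher_logits(t_logits, labels):
    """Where the teacher's argmax disagrees with the ground truth, swap
    the logits of the predicted and true classes; all remaining logits
    (the 'dark knowledge' class correlations) are untouched."""
    refined = t_logits.copy()
    pred = refined.argmax(axis=1)
    rows = np.where(pred != labels)[0]
    refined[rows, pred[rows]], refined[rows, labels[rows]] = \
        t_logits[rows, labels[rows]], t_logits[rows, pred[rows]]
    return refined
```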
arXiv Detail & Related papers (2024-08-14T17:59:32Z)
- Enhancing Consistency and Mitigating Bias: A Data Replay Approach for Incremental Learning [100.7407460674153]
Deep learning systems are prone to catastrophic forgetting when learning from a sequence of tasks.
To mitigate the problem, a line of methods propose to replay the data of experienced tasks when learning new tasks.
However, storing raw data is often impractical given memory constraints or data privacy issues.
As a replacement, data-free data replay methods are proposed by inverting samples from the classification model.
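Inverting samples from a classifier can be sketched in its simplest form: gradient ascent on the input to maximize a frozen classifier's logit for a target class. The linear classifier and hyperparameters below are illustrative assumptions; actual data-free replay methods invert deep networks with additional image priors.

```python
import numpy as np

def invert_class(W, target, steps=200, lr=0.1, reg=0.1):
    """Synthesize a pseudo-sample for class `target` from a frozen
    linear classifier W (n_classes x dim) by gradient ascent on its
    logit, with L2 regularization keeping the input bounded."""
    x = np.zeros(W.shape[1])
    for _ in range(steps):
        # d/dx (W[target] @ x - reg/2 * ||x||^2) = W[target] - reg * x
        x += lr * (W[target] - reg * x)
    return x
```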
arXiv Detail & Related papers (2024-01-12T12:51:12Z)
- The Staged Knowledge Distillation in Video Classification: Harmonizing Student Progress by a Complementary Weakly Supervised Framework [21.494759678807686]
We propose a new weakly supervised learning framework for knowledge distillation in video classification.
Our approach leverages the concept of substage-based learning to distill knowledge based on the combination of student substages and the correlation of corresponding substages.
Our proposed substage-based distillation approach has the potential to inform future research on label-efficient learning for video data.
arXiv Detail & Related papers (2023-07-11T12:10:42Z)
- Distill Gold from Massive Ores: Bi-level Data Pruning towards Efficient Dataset Distillation [96.92250565207017]
We study the data efficiency and selection for the dataset distillation task.
By re-formulating the dynamics of distillation, we provide insight into the inherent redundancy in the real dataset.
We find the most contributing samples based on their causal effects on the distillation.
arXiv Detail & Related papers (2023-05-28T06:53:41Z)
- Explicit and Implicit Knowledge Distillation via Unlabeled Data [5.702176304876537]
We propose an efficient unlabeled sample selection method to replace high computational generators.
We also propose a class-dropping mechanism to suppress the label noise caused by the data domain shifts.
Experimental results show that our method can quickly converge and obtain higher accuracy than other state-of-the-art methods.
arXiv Detail & Related papers (2023-02-17T09:10:41Z)
- Few-Shot Class-Incremental Learning via Entropy-Regularized Data-Free Replay [52.251188477192336]
Few-shot class-incremental learning (FSCIL) has been proposed aiming to enable a deep learning system to incrementally learn new classes with limited data.
We show through empirical results that adopting the data replay is surprisingly favorable.
We propose using data-free replay that can synthesize data by a generator without accessing real data.
arXiv Detail & Related papers (2022-07-22T17:30:51Z)
- Overcoming Catastrophic Forgetting in Incremental Object Detection via Elastic Response Distillation [4.846235640334886]
Traditional object detectors are ill-equipped for incremental learning.
Fine-tuning directly on a well-trained detection model with only new data will lead to catastrophic forgetting.
We propose a response-based incremental distillation method, dubbed Elastic Response Distillation (ERD).
arXiv Detail & Related papers (2022-04-05T11:57:43Z)
- Contrastive Model Inversion for Data-Free Knowledge Distillation [60.08025054715192]
We propose Contrastive Model Inversion, where the data diversity is explicitly modeled as an optimizable objective.
Our main observation is that, under the constraint of the same amount of data, higher data diversity usually indicates stronger instance discrimination.
Experiments on CIFAR-10, CIFAR-100, and Tiny-ImageNet demonstrate that CMI achieves significantly superior performance when the generated data are used for knowledge distillation.
arXiv Detail & Related papers (2021-05-18T15:13:00Z)
- Continual Learning for Fake Audio Detection [62.54860236190694]
This paper proposes detecting fake without forgetting, a continual-learning-based method that lets the model learn new spoofing attacks incrementally.
Experiments are conducted on the ASVspoof 2019 dataset.
arXiv Detail & Related papers (2021-04-15T07:57:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.