Effective Decision Boundary Learning for Class Incremental Learning
- URL: http://arxiv.org/abs/2301.05180v4
- Date: Thu, 26 Sep 2024 02:16:07 GMT
- Title: Effective Decision Boundary Learning for Class Incremental Learning
- Authors: Kunchi Li, Jun Wan, Shan Yu
- Abstract summary: Rehearsal approaches in class incremental learning (CIL) suffer from decision boundary overfitting to new classes.
We present a simple but effective approach to tackle these two factors.
Experiments show that the proposed EDBL achieves state-of-the-art performances on several CIL benchmarks.
- Score: 17.716035569936384
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Rehearsal approaches in class incremental learning (CIL) suffer from decision boundary overfitting to new classes, which is mainly caused by two factors: insufficient data of the old classes for knowledge distillation and imbalanced data learning between the learned and new classes because of the limited storage memory. In this work, we present a simple but effective approach to tackle these two factors. First, we employ a re-sampling strategy and Mixup Knowledge Distillation (Re-MKD) to improve the performance of KD, which greatly alleviates the overfitting problem. Specifically, we combine mixup and re-sampling strategies to synthesize adequate data used in KD training that are more consistent with the latent distribution between the learned and new classes. Second, we propose a novel incremental influence balance (IIB) method for CIL to tackle the classification of imbalanced data by extending the influence balance method into the CIL setting, which re-weights samples by their influences to create a proper decision boundary. With these two improvements, we present the effective decision boundary learning algorithm (EDBL), which improves the performance of KD and deals with imbalanced data learning simultaneously. Experiments show that the proposed EDBL achieves state-of-the-art performances on several CIL benchmarks.
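For a concrete picture of the two components described in the abstract, below is a minimal PyTorch-style sketch, not the authors' released implementation. The mixup ratio, the gradient-norm proxy used for the influence weights, and helper names such as `mixup_kd_batch`, `iib_weights`, `edbl_step`, `teacher`, and `student` are illustrative assumptions.

```python
# Minimal sketch of the two ideas in the abstract (Re-MKD and IIB), NOT the
# authors' code. Shapes, hyper-parameters, and helper names are assumptions.
import torch
import torch.nn.functional as F

def mixup_kd_batch(x_old, x_new, alpha=1.0):
    """Re-MKD idea: synthesize extra KD inputs by mixing rehearsal (old-class)
    and new-class images, so distillation sees data closer to the latent
    distribution between the learned and new classes."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    n = min(x_old.size(0), x_new.size(0))
    return lam * x_old[:n] + (1.0 - lam) * x_new[:n]

def kd_loss(student_logits, teacher_logits, T=2.0):
    """Standard temperature-scaled knowledge-distillation loss."""
    p_t = F.softmax(teacher_logits / T, dim=1)
    log_p_s = F.log_softmax(student_logits / T, dim=1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * T * T

def iib_weights(logits, labels, eps=1e-8):
    """Influence-balance style re-weighting (sketch): down-weight samples with
    large (approximate) influence on the decision boundary, proxied here by the
    gradient magnitude of the per-sample cross-entropy w.r.t. the logits."""
    probs = F.softmax(logits, dim=1)
    one_hot = F.one_hot(labels, num_classes=logits.size(1)).float()
    grad_norm = (probs - one_hot).norm(dim=1)   # per-sample influence proxy
    w = 1.0 / (grad_norm + eps)
    return w / w.sum() * len(w)                 # normalize to mean 1

def edbl_step(student, teacher, x_old, y_old, x_new, y_new, opt, lam_kd=1.0):
    """One hypothetical training step combining the two components."""
    x = torch.cat([x_old, x_new]); y = torch.cat([y_old, y_new])
    logits = student(x)
    w = iib_weights(logits.detach(), y)
    ce = (w * F.cross_entropy(logits, y, reduction="none")).mean()

    x_mix = mixup_kd_batch(x_old, x_new)
    with torch.no_grad():
        t_logits = teacher(x_mix)
    loss = ce + lam_kd * kd_loss(student(x_mix), t_logits)

    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```

In an actual CIL pipeline, `x_old` would come from the rehearsal memory via the paper's re-sampling strategy and `teacher` would be the model frozen after the previous task; those details are omitted in this sketch.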
Related papers
- Speculative Knowledge Distillation: Bridging the Teacher-Student Gap Through Interleaved Sampling [81.00825302340984]
We introduce Speculative Knowledge Distillation (SKD) to generate high-quality training data on-the-fly.
In SKD, the student proposes tokens, and the teacher replaces poorly ranked ones based on its own distribution.
We evaluate SKD on various text generation tasks, including translation, summarization, math, and instruction following.
arXiv Detail & Related papers (2024-10-15T06:51:25Z) - Controllable Relation Disentanglement for Few-Shot Class-Incremental Learning [82.79371269942146]
We propose to tackle Few-Shot Class-Incremental Learning (FSCIL) from a new perspective, i.e., relation disentanglement.
The challenge of disentangling spurious correlations lies in the poor controllability of FSCIL.
We propose a new simple-yet-effective method, called ConTrollable Relation-disentangLed Few-Shot Class-Incremental Learning (CTRL-FSCIL).
arXiv Detail & Related papers (2024-03-17T03:16:59Z) - Gradient Reweighting: Towards Imbalanced Class-Incremental Learning [8.438092346233054]
Class-Incremental Learning (CIL) trains a model to continually recognize new classes from non-stationary data.
A major challenge of CIL arises when applying to real-world data characterized by non-uniform distribution.
We show that this dual imbalance issue causes skewed gradient updates with biased weights in FC layers, thus inducing over/under-fitting and catastrophic forgetting in CIL.
arXiv Detail & Related papers (2024-02-28T18:08:03Z) - Relaxed Contrastive Learning for Federated Learning [48.96253206661268]
We propose a novel contrastive learning framework to address the challenges of data heterogeneity in federated learning.
Our framework outperforms all existing federated learning approaches by huge margins on the standard benchmarks.
arXiv Detail & Related papers (2024-01-10T04:55:24Z) - Comparative Knowledge Distillation [102.35425896967791]
Traditional Knowledge Distillation (KD) assumes readily available access to teacher models for frequent inference.
We propose Comparative Knowledge Distillation (CKD), which encourages student models to understand the nuanced differences in a teacher model's interpretations of samples.
CKD consistently outperforms state of the art data augmentation and KD techniques.
arXiv Detail & Related papers (2023-11-03T21:55:33Z) - DualMix: Unleashing the Potential of Data Augmentation for Online Class-Incremental Learning [14.194817677415065]
We show that augmented samples with lower correlation to the original data are more effective in preventing forgetting.
We propose the Enhanced Mixup (EnMix) method that mixes the augmented samples and their labels simultaneously.
To solve the class imbalance problem, we design an Adaptive Mixup (AdpMix) method to calibrate the decision boundaries.
arXiv Detail & Related papers (2023-03-14T12:55:42Z) - Weighted Ensemble Self-Supervised Learning [67.24482854208783]
Ensembling has proven to be a powerful technique for boosting model performance.
We develop a framework that permits data-dependent weighted cross-entropy losses.
Our method outperforms both baselines in multiple evaluation metrics on ImageNet-1K.
arXiv Detail & Related papers (2022-11-18T02:00:17Z) - Exploring Example Influence in Continual Learning [26.85320841575249]
Continual Learning (CL) sequentially learns new tasks like human beings, with the goal of achieving better Stability (S) and Plasticity (P).
It is valuable to explore how the influence on S and P differs among training examples, which may improve the learning pattern towards a better S-P trade-off.
We propose a simple yet effective MetaSP algorithm to simulate the two key steps in the perturbation of influence functions (IF) and obtain the S- and P-aware example influence.
arXiv Detail & Related papers (2022-09-25T15:17:37Z) - Influence-Balanced Loss for Imbalanced Visual Classification [9.958715010698157]
We derive a new loss used in the balancing training phase that alleviates the influence of samples that cause an overfitted decision boundary.
In experiments on multiple benchmark data sets, we demonstrate the validity of our method and reveal that the proposed loss outperforms the state-of-the-art cost-sensitive loss methods.
arXiv Detail & Related papers (2021-10-06T01:12:40Z) - MixKD: Towards Efficient Distillation of Large-scale Language Models [129.73786264834894]
We propose MixKD, a data-agnostic distillation framework, to endow the resulting model with stronger generalization ability.
We prove from a theoretical perspective that under reasonable conditions MixKD gives rise to a smaller gap between the generalization error and the empirical error.
Experiments under a limited-data setting and ablation studies further demonstrate the advantages of the proposed approach.
arXiv Detail & Related papers (2020-11-01T18:47:51Z) - Continual Learning with Node-Importance based Adaptive Group Sparse Regularization [30.23319528662881]
We propose a novel regularization-based continual learning method, dubbed Adaptive Group Sparsity based Continual Learning (AGS-CL).
Our method selectively employs the two penalties when learning each node based on its importance, which is adaptively updated after learning each new task.
arXiv Detail & Related papers (2020-03-30T18:21:04Z)