Tripartite Weight-Space Ensemble for Few-Shot Class-Incremental Learning
- URL: http://arxiv.org/abs/2506.15720v1
- Date: Wed, 04 Jun 2025 01:41:42 GMT
- Title: Tripartite Weight-Space Ensemble for Few-Shot Class-Incremental Learning
- Authors: Juntae Lee, Munawar Hayat, Sungrack Yun
- Abstract summary: Few-shot class incremental learning (FSCIL) enables the continual learning of new concepts with only a few training examples. The recent trend is to disentangle the learning of the representation from the model's classification head. We introduce a novel FSCIL method that effectively addresses catastrophic forgetting and overfitting.
- Score: 30.111708221388813
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Few-shot class incremental learning (FSCIL) enables the continual learning of new concepts with only a few training examples. In FSCIL, the model undergoes substantial updates, making it prone to forgetting previous concepts and overfitting to the limited new examples. The recent trend is to disentangle the learning of the representation from the model's classification head: a well-generalized feature extractor is learned on the base classes (many examples and many classes) and then kept fixed during incremental learning. Arguing that a fixed feature extractor restricts the model's adaptability to new classes, we introduce a novel FSCIL method that effectively addresses catastrophic forgetting and overfitting. Our method allows the entire model to be seamlessly updated with only a few examples. At its core is a tripartite weight-space ensemble (Tri-WE), which interpolates the base, immediately previous, and current models in weight space, especially their classification heads, so that knowledge from the base and previous models is collaboratively maintained. In addition, recognizing the difficulty of distilling generalized representations from the previous model with scarce data, we propose a regularization loss term based on amplified data knowledge distillation: by simply intermixing the few-shot data, we produce richer data that enables the distillation of critical knowledge from the previous model. Consequently, we attain state-of-the-art results on the miniImageNet, CUB200, and CIFAR100 datasets.
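Read literally, the abstract describes two concrete mechanisms: interpolating the base, immediately previous, and current models in weight space (particularly their classification heads), and distilling from the previous model on richer data obtained by intermixing the few-shot samples. The sketch below is only an illustration of that description, assuming PyTorch; the interpolation coefficients, the mixup-style intermixing, and the distillation temperature are placeholder assumptions rather than the paper's actual formulation.

```python
# Illustrative sketch only; hyper-parameters and the exact mixing rule are assumptions.
import torch
import torch.nn.functional as F


def tri_we(base_sd, prev_sd, curr_sd, alpha=0.4, beta=0.3):
    """Tripartite weight-space ensemble (Tri-WE) of three model state dicts.

    Parameters shared by all three models are interpolated; parameters whose
    shapes differ across sessions (e.g. newly added classifier rows) are taken
    from the current model unchanged.
    """
    gamma = 1.0 - alpha - beta
    merged = {}
    for name, w_curr in curr_sd.items():
        if (name in base_sd and name in prev_sd
                and base_sd[name].shape == w_curr.shape == prev_sd[name].shape):
            merged[name] = alpha * base_sd[name] + beta * prev_sd[name] + gamma * w_curr
        else:
            merged[name] = w_curr
    return merged


def amplified_kd_loss(student, teacher, x, lam=0.5, temperature=2.0):
    """Knowledge distillation on 'amplified' data built by intermixing few-shot inputs.

    A mixup-style convex combination of shuffled samples is assumed here as the
    intermixing operation; the teacher is the (frozen) previous-session model.
    """
    idx = torch.randperm(x.size(0), device=x.device)
    x_mix = lam * x + (1.0 - lam) * x[idx]
    with torch.no_grad():
        t_logits = teacher(x_mix)
    s_logits = student(x_mix)
    return F.kl_div(
        F.log_softmax(s_logits / temperature, dim=1),
        F.softmax(t_logits / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2
```

Interpolating only shape-matched parameters sidesteps the fact that the classification head grows as new classes arrive; how the paper actually handles the newly added rows is not specified in the abstract.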
Related papers
- Intention-Conditioned Flow Occupancy Models [69.79049994662591]
Large-scale pre-training has fundamentally changed how machine learning research is done today. Applying this same framework to reinforcement learning is appealing because it offers compelling avenues for addressing core challenges in RL. Recent advances in generative AI have provided new tools for modeling highly complex distributions.
arXiv Detail & Related papers (2025-06-10T15:27:46Z) - Class-Incremental Learning with CLIP: Adaptive Representation Adjustment and Parameter Fusion [10.322832012497722]
Class-incremental learning is a challenging problem, where the goal is to train a model that can classify data from an increasing number of classes over time.
Vision-language pre-trained models such as CLIP demonstrate good generalization ability.
However, further adaptation to downstream tasks by simply fine-tuning the model leads to severe forgetting.
Most existing works with pre-trained models assume that the forgetting of old classes is uniform when the model acquires new knowledge.
arXiv Detail & Related papers (2024-07-19T09:20:33Z) - Calibrating Higher-Order Statistics for Few-Shot Class-Incremental Learning with Pre-trained Vision Transformers [12.590571371294729]
Few-shot class-incremental learning (FSCIL) aims to adapt the model to new classes from very few data points (5 samples) without forgetting the previously learned classes.
Recent works in many-shot CIL (MSCIL) exploited pre-trained models to reduce forgetting and achieve better plasticity.
We use ViT models pre-trained on large-scale datasets for few-shot settings, which face the critical issue of low plasticity.
arXiv Detail & Related papers (2024-04-09T21:12:31Z) - Class incremental learning with probability dampening and cascaded gated classifier [4.285597067389559]
We propose a novel incremental regularisation approach called Margin Dampening and Cascaded Scaling.
The first combines a soft constraint and a knowledge distillation approach to preserve past knowledge while still allowing new patterns to be learned.
We empirically show that our approach performs well on multiple benchmarks against well-established baselines.
arXiv Detail & Related papers (2024-02-02T09:33:07Z) - Enhancing Consistency and Mitigating Bias: A Data Replay Approach for Incremental Learning [93.90047628101155]
Deep learning systems are prone to catastrophic forgetting when learning from a sequence of tasks. To address this, some methods propose replaying data from previous tasks during new task learning. However, this is often not feasible in practice due to memory constraints and data privacy issues.
arXiv Detail & Related papers (2024-01-12T12:51:12Z) - Enhancing Continual Relation Extraction via Classifier Decomposition [30.88081408988638]
Continual relation extraction models aim at handling emerging new relations while avoiding forgetting old ones in the streaming data.
Most models adopt only a vanilla strategy when first learning representations of new relations.
We propose a simple yet effective classifier decomposition framework that splits the last FFN layer into separated previous and current classifiers.
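A minimal sketch of the decomposition idea as described above, assuming PyTorch; the hidden size, the freezing policy, and the head sizes are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn


class DecomposedClassifier(nn.Module):
    """Last FFN layer split into separate previous- and current-relation heads."""

    def __init__(self, hidden_dim: int, n_prev: int, n_new: int):
        super().__init__()
        self.prev_head = nn.Linear(hidden_dim, n_prev)  # relations seen so far
        self.curr_head = nn.Linear(hidden_dim, n_new)   # newly arriving relations
        for p in self.prev_head.parameters():           # keep the old head stable
            p.requires_grad = False

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # Concatenate logits so downstream losses see a single classifier.
        return torch.cat([self.prev_head(h), self.curr_head(h)], dim=-1)
```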
arXiv Detail & Related papers (2023-05-08T11:29:33Z) - CILIATE: Towards Fairer Class-based Incremental Learning by Dataset and Training Refinement [20.591583747291892]
We show that CIL suffers from both dataset and algorithm bias problems.
We propose a novel framework, CILIATE, that fixes both dataset and algorithm bias in CIL.
CILIATE improves the fairness of CIL by 17.03%, 22.46%, and 31.79% compared to state-of-the-art methods.
arXiv Detail & Related papers (2023-04-09T12:10:39Z) - Class-Incremental Learning with Strong Pre-trained Models [97.84755144148535]
Class-incremental learning (CIL) has been widely studied under the setting of starting from a small number of classes (base classes).
We explore an understudied real-world setting of CIL that starts with a strong model pre-trained on a large number of base classes.
Our proposed method is robust and generalizes to all analyzed CIL settings.
arXiv Detail & Related papers (2022-04-07T17:58:07Z) - Exploring Strategies for Generalizable Commonsense Reasoning with Pre-trained Models [62.28551903638434]
We measure the impact of three different adaptation methods on the generalization and accuracy of models.
Experiments with two models show that fine-tuning performs best, by learning both the content and the structure of the task, but suffers from overfitting and limited generalization to novel answers.
We observe that alternative adaptation methods like prefix-tuning have comparable accuracy, but generalize better to unseen answers and are more robust to adversarial splits.
arXiv Detail & Related papers (2021-09-07T03:13:06Z) - Contrastive Model Inversion for Data-Free Knowledge Distillation [60.08025054715192]
We propose Contrastive Model Inversion, where the data diversity is explicitly modeled as an optimizable objective.
Our main observation is that, under the constraint of the same amount of data, higher data diversity usually indicates stronger instance discrimination.
Experiments on CIFAR-10, CIFAR-100, and Tiny-ImageNet demonstrate that CMI achieves significantly superior performance when the generated data are used for knowledge distillation.
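The summary above ties data diversity to instance discrimination. Below is a minimal sketch of such an instance-discrimination objective, assuming PyTorch; the two-view formulation, temperature, and normalization are illustrative assumptions rather than CMI's exact loss.

```python
import torch
import torch.nn.functional as F


def instance_discrimination_loss(emb_a: torch.Tensor, emb_b: torch.Tensor, tau: float = 0.1):
    """InfoNCE-style loss over two augmented views of each synthesized image.

    Pulling the two views of the same instance together while pushing different
    instances apart rewards batches of generated data that are mutually
    distinct, i.e. more diverse.
    """
    za = F.normalize(emb_a, dim=1)
    zb = F.normalize(emb_b, dim=1)
    logits = za @ zb.t() / tau                            # (N, N) similarities
    targets = torch.arange(za.size(0), device=za.device)  # positives on the diagonal
    return F.cross_entropy(logits, targets)
```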
arXiv Detail & Related papers (2021-05-18T15:13:00Z) - Partial Is Better Than All: Revisiting Fine-tuning Strategy for Few-shot Learning [76.98364915566292]
A common practice is to train a model on the base set first and then transfer to novel classes through fine-tuning.
We propose to transfer partial knowledge by freezing or fine-tuning particular layer(s) in the base model.
We conduct extensive experiments on CUB and mini-ImageNet to demonstrate the effectiveness of our proposed method.
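A minimal sketch of this partial-transfer recipe, assuming PyTorch and torchvision; which layers to freeze (and the 64 base / 5 novel class counts) are illustrative choices, not the paper's prescribed configuration.

```python
import torch.nn as nn
from torchvision.models import resnet18

# Backbone assumed to have been trained on the base classes beforehand.
model = resnet18(num_classes=64)

# Freeze a subset of layers; fine-tune the rest (the split is a per-task choice).
frozen_prefixes = ("conv1", "bn1", "layer1", "layer2")
for name, param in model.named_parameters():
    param.requires_grad = not name.startswith(frozen_prefixes)

# Replace the head for the novel classes; only unfrozen parameters are optimized.
model.fc = nn.Linear(model.fc.in_features, 5)
trainable_params = [p for p in model.parameters() if p.requires_grad]
```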
arXiv Detail & Related papers (2021-02-08T03:27:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.