Representation Calibration and Uncertainty Guidance for Class-Incremental Learning based on Vision Language Model
- URL: http://arxiv.org/abs/2512.09441v1
- Date: Wed, 10 Dec 2025 09:09:23 GMT
- Title: Representation Calibration and Uncertainty Guidance for Class-Incremental Learning based on Vision Language Model
- Authors: Jiantao Tan, Peixian Ma, Tong Yu, Wentao Zhang, Ruixuan Wang,
- Abstract summary: Class-incremental learning requires a learning system to continually learn knowledge of new classes. Current methods based on Vision-Language Models (VLMs) still suffer from the issue of differentiating classes across learning tasks. Here, a novel VLM-based continual learning framework for image classification is proposed.
- Score: 22.04660341150286
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Class-incremental learning requires a learning system to continually learn knowledge of new classes while trying to preserve previously learned knowledge of old classes. Current state-of-the-art methods based on Vision-Language Models (VLMs) still suffer from the issue of differentiating classes across learning tasks. Here, a novel VLM-based continual learning framework for image classification is proposed. In this framework, task-specific adapters are added to the pre-trained and frozen image encoder to learn new knowledge, and a novel cross-task representation calibration strategy based on a mixture of light-weight projectors is used to better separate all learned classes in a unified feature space, alleviating class confusion across tasks. In addition, a novel inference strategy guided by prediction uncertainty is developed to more accurately select the most appropriate image feature for class prediction. Extensive experiments on multiple datasets under various settings demonstrate the superior performance of our method compared to existing ones.
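The abstract describes the framework only at a high level. The sketch below is one plausible PyTorch reading of it, assuming a frozen CLIP-style image encoder that returns pooled features, residual bottleneck adapters as the task-specific adapters, per-task linear projectors as the "mixture of light-weight projectors", and softmax entropy as the uncertainty signal. All of these design choices are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only; module shapes and the entropy criterion are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Adapter(nn.Module):
    """Light-weight residual bottleneck adapter on top of the frozen encoder."""
    def __init__(self, dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))

    def forward(self, x):
        return x + self.net(x)

class IncrementalVLM(nn.Module):
    def __init__(self, frozen_encoder: nn.Module, dim: int):
        super().__init__()
        self.encoder = frozen_encoder.eval()      # pre-trained image encoder, kept frozen
        for p in self.encoder.parameters():
            p.requires_grad_(False)
        self.adapters = nn.ModuleList()           # one task-specific adapter per task
        self.projectors = nn.ModuleList()         # light-weight projectors into a unified space
        self.dim = dim

    def add_task(self):
        """Called at the start of each new learning task."""
        self.adapters.append(Adapter(self.dim))
        self.projectors.append(nn.Linear(self.dim, self.dim))

    @torch.no_grad()
    def predict(self, images: torch.Tensor, text_features: torch.Tensor) -> torch.Tensor:
        """Uncertainty-guided inference: run every task branch and keep the one
        whose prediction is least uncertain (lowest mean softmax entropy).
        text_features: (num_classes, dim), L2-normalized class-name embeddings."""
        base = self.encoder(images)               # assumed to return (batch, dim) features
        best_logits, best_entropy = None, float("inf")
        for adapter, projector in zip(self.adapters, self.projectors):
            feat = F.normalize(projector(adapter(base)), dim=-1)
            logits = 100.0 * feat @ text_features.t()   # CLIP-style cosine similarity
            probs = logits.softmax(dim=-1)
            entropy = -(probs * probs.clamp_min(1e-8).log()).sum(-1).mean().item()
            if entropy < best_entropy:
                best_entropy, best_logits = entropy, logits
        return best_logits.argmax(dim=-1)
```

Under this reading, training would fit only the newest adapter/projector pair on the current task's data while earlier pairs stay frozen; at test time the branch with the least uncertain prediction supplies the final label.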
Related papers
- Zero-Shot Fine-Grained Image Classification Using Large Vision-Language Models [4.499940819352075]
Large Vision-Language Models (LVLMs) have demonstrated impressive performance on vision-language reasoning tasks. We present a novel method that transforms zero-shot fine-grained image classification into a visual question-answering framework. Our proposed method consistently outperforms the current state-of-the-art (SOTA) approach.
arXiv Detail & Related papers (2025-10-04T18:56:41Z) - InPK: Infusing Prior Knowledge into Prompt for Vision-Language Models [24.170351966913557]
We propose the InPK model, which infuses class-specific prior knowledge into the learnable tokens. We also introduce a learnable text-to-vision projection layer to accommodate the text adjustments. In experiments, InPK significantly outperforms state-of-the-art methods in multiple zero/few-shot image classification tasks.
arXiv Detail & Related papers (2025-02-27T05:33:18Z) - Towards Generative Class Prompt Learning for Fine-grained Visual Recognition [5.633314115420456]
Generative Class Prompt Learning (GCPL) and Contrastive Multi-class Prompt Learning (CoMPLe) are presented.
Generative Class Prompt Learning improves visio-linguistic synergy in class embeddings by conditioning on few-shot exemplars with learnable class prompts.
CoMPLe builds on this foundation by introducing a contrastive learning component that encourages inter-class separation.
arXiv Detail & Related papers (2024-09-03T12:34:21Z) - Learning Prompt with Distribution-Based Feature Replay for Few-Shot Class-Incremental Learning [56.29097276129473]
We propose a simple yet effective framework, named Learning Prompt with Distribution-based Feature Replay (LP-DiF). To prevent the learnable prompt from forgetting old knowledge in a new session, we propose a pseudo-feature replay approach. When progressing to a new session, pseudo-features are sampled from old-class distributions and combined with training images of the current session to optimize the prompt (a minimal sketch of this replay idea appears after this list).
arXiv Detail & Related papers (2024-01-03T07:59:17Z) - Class Incremental Learning with Pre-trained Vision-Language Models [59.15538370859431]
We propose an approach that exploits pre-trained vision-language models (e.g., CLIP) and enables further adaptation.
Experiments on several conventional benchmarks consistently show a significant margin of improvement over the current state-of-the-art.
arXiv Detail & Related papers (2023-10-31T10:45:03Z) - Incremental Object Detection with CLIP [36.478530086163744]
We use a vision-language model such as CLIP to generate text feature embeddings for different class sets (see the text-embedding sketch after this list).
We then employ super-classes to replace the unavailable novel classes in the early learning stage to simulate the incremental scenario.
We incorporate the finely recognized detection boxes as pseudo-annotations into the training process, thereby further improving the detection performance.
arXiv Detail & Related papers (2023-10-13T01:59:39Z) - Class-Incremental Learning: A Survey [84.30083092434938]
Class-Incremental Learning (CIL) enables the learner to incorporate the knowledge of new classes incrementally.
When trained on new classes, CIL tends to catastrophically forget the characteristics of former ones, and its performance drastically degrades.
We provide a rigorous and unified evaluation of 17 methods in benchmark image classification tasks to find out the characteristics of different algorithms.
arXiv Detail & Related papers (2023-02-07T17:59:05Z) - SgVA-CLIP: Semantic-guided Visual Adapting of Vision-Language Models for Few-shot Image Classification [84.05253637260743]
We propose a new framework, named Semantic-guided Visual Adapting (SgVA), to extend vision-language pre-trained models.
SgVA produces discriminative task-specific visual features by comprehensively using a vision-specific contrastive loss, a cross-modal contrastive loss (sketched after this list), and an implicit knowledge distillation.
State-of-the-art results on 13 datasets demonstrate that the adapted visual features can well complement the cross-modal features to improve few-shot image classification.
arXiv Detail & Related papers (2022-11-28T14:58:15Z) - Class-Balanced Distillation for Long-Tailed Visual Recognition [100.10293372607222]
Real-world imagery is often characterized by a significant imbalance of the number of images per class, leading to long-tailed distributions.
In this work, we introduce a new framework based on the key observation that a feature representation learned with instance sampling is far from optimal in a long-tailed setting.
Our main contribution is a new training method that leverages knowledge distillation to enhance feature representations.
arXiv Detail & Related papers (2021-04-12T08:21:03Z) - Few-Shot Class-Incremental Learning [68.75462849428196]
We focus on a challenging but practical few-shot class-incremental learning (FSCIL) problem.
FSCIL requires CNN models to incrementally learn new classes from very few labelled samples, without forgetting the previously learned ones.
We represent the knowledge using a neural gas (NG) network, which can learn and preserve the topology of the feature manifold formed by different classes.
arXiv Detail & Related papers (2020-04-23T03:38:33Z)
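The LP-DiF entry above refers to sampling pseudo-features from old-class distributions to keep the learnable prompt from forgetting. The sketch below illustrates that replay idea under the assumption of class-conditional diagonal Gaussians fitted to stored feature statistics; the paper's actual distribution estimate and replay schedule may differ.

```python
import torch

class FeatureReplayBuffer:
    """Stores per-class feature statistics and replays pseudo-features
    instead of raw images (a stand-in for LP-DiF's distribution-based replay)."""
    def __init__(self):
        self.stats = {}  # class_id -> (mean, std), each of shape (feature_dim,)

    def register_class(self, class_id: int, feats: torch.Tensor) -> None:
        # feats: (N, D) features of one old class, extracted once; images can be discarded
        self.stats[class_id] = (feats.mean(dim=0), feats.std(dim=0) + 1e-6)

    def sample(self, n_per_class: int):
        """Draw pseudo-features from a diagonal Gaussian per old class."""
        xs, ys = [], []
        for class_id, (mean, std) in self.stats.items():
            xs.append(mean + std * torch.randn(n_per_class, mean.numel()))
            ys.append(torch.full((n_per_class,), class_id, dtype=torch.long))
        return torch.cat(xs), torch.cat(ys)
```

In a new session, the sampled (pseudo-feature, label) pairs would be mixed with features of the current session's training images so the prompt is optimized on old and new classes jointly.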
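Several CLIP-based entries above, notably the incremental detection paper, build classifiers from text-feature embeddings of the class names. The sketch below shows that step with OpenAI's open-source clip package; the prompt template and class names are placeholders.

```python
# Requires: pip install git+https://github.com/openai/CLIP.git
import clip
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

class_names = ["goldfish", "tabby cat", "sports car"]   # placeholder class set
prompts = clip.tokenize([f"a photo of a {name}" for name in class_names]).to(device)

with torch.no_grad():
    text_features = model.encode_text(prompts)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)

# text_features now acts as a zero-shot classifier head: image features are
# scored against it by cosine similarity, and an incremental learner can extend
# the class set by simply appending rows for new class names.
```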
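The SgVA-CLIP entry mentions a cross-modal contrastive loss among its training objectives. The sketch below gives a generic symmetric InfoNCE formulation as one common instantiation of such a loss; SgVA-CLIP's exact objective may differ.

```python
import torch
import torch.nn.functional as F

def cross_modal_contrastive_loss(img_feats: torch.Tensor,
                                 txt_feats: torch.Tensor,
                                 temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE between paired image and text features.
    img_feats, txt_feats: (B, D); row i of each side forms a positive pair."""
    img = F.normalize(img_feats, dim=-1)
    txt = F.normalize(txt_feats, dim=-1)
    logits = img @ txt.t() / temperature              # (B, B) similarity matrix
    targets = torch.arange(img.size(0), device=img.device)
    # pull matched pairs together and push mismatched pairs apart, in both directions
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```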