VLM-NCD: Novel Class Discovery with Vision-Based Large Language Models
- URL: http://arxiv.org/abs/2512.10262v1
- Date: Thu, 11 Dec 2025 03:53:50 GMT
- Title: VLM-NCD: Novel Class Discovery with Vision-Based Large Language Models
- Authors: Yuetong Su, Baoguo Wei, Xinyu Wang, Xu Li, Lixin Li
- Abstract summary: Novel Class Discovery aims to utilise prior knowledge of known classes to classify and discover unknown classes from unlabelled data. We propose a multimodal framework that breaks this bottleneck by fusing visual-textual semantics and prototype-guided clustering. Our method shows unique resilience to long-tail distributions, a first in NCD literature.
- Score: 8.280120179892885
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Novel Class Discovery aims to utilise prior knowledge of known classes to classify and discover unknown classes from unlabelled data. Existing NCD methods for images primarily rely on visual features, which suffer from limitations such as insufficient feature discriminability and the long-tail distribution of data. We propose LLM-NCD, a multimodal framework that breaks this bottleneck by fusing visual-textual semantics and prototype-guided clustering. Our key innovation lies in modelling cluster centres and semantic prototypes of known classes by jointly optimising known-class image and text features, and in a dual-phase discovery mechanism that dynamically separates known and novel samples via semantic affinity thresholds and adaptive clustering. Experiments on the CIFAR-100 dataset show that our method achieves up to a 25.3% improvement in accuracy for unknown classes over current methods. Notably, our method shows unique resilience to long-tail distributions, a first in NCD literature.
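The dual-phase idea in the abstract (threshold on semantic affinity to known-class prototypes, then cluster what is left) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of cosine similarity as the affinity, the fixed `threshold`, and plain k-means with a given `num_novel` are all assumptions made for illustration, and the joint image-text optimisation of the prototypes is not modelled.

```python
import numpy as np

def l2_normalize(x):
    # Scale each row to unit length so dot products are cosine similarities.
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def split_known_novel(features, prototypes, threshold):
    # Phase 1 (assumed form): assign a sample to its closest known-class
    # prototype if the cosine similarity reaches the threshold, else mark -1.
    sims = l2_normalize(features) @ l2_normalize(prototypes).T
    best = sims.argmax(axis=1)
    return np.where(sims.max(axis=1) >= threshold, best, -1)

def kmeans(x, k, iters=20, seed=0):
    # Plain k-means, standing in for the paper's adaptive clustering.
    rng = np.random.default_rng(seed)
    centres = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        dists = ((x[:, None, :] - centres[None, :, :]) ** 2).sum(axis=-1)
        assign = dists.argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):
                centres[j] = x[assign == j].mean(axis=0)
    return assign

def discover(features, prototypes, threshold, num_novel):
    # Phase 2 (assumed form): cluster the samples rejected in phase 1 and
    # give them label ids that follow the known-class ids.
    labels = split_known_novel(features, prototypes, threshold)
    novel = labels == -1
    if novel.sum() >= num_novel:
        clusters = kmeans(l2_normalize(features[novel]), num_novel)
        labels[novel] = len(prototypes) + clusters
    return labels
```

In this sketch, labels `0..K-1` refer to the `K` known-class prototypes and labels from `K` upward are discovered clusters; samples stay at `-1` only when fewer than `num_novel` samples were rejected in the first phase.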
Related papers
- No Labels Needed: Zero-Shot Image Classification with Collaborative Self-Learning [0.0]
Vision-language models (VLMs) and transfer learning with pre-trained visual models appear as promising techniques to deal with this problem. This paper proposes a novel zero-shot image classification framework that combines a VLM and a pre-trained visual model within a self-learning cycle.
arXiv Detail & Related papers (2025-09-23T12:54:52Z)
- Intra-view and Inter-view Correlation Guided Multi-view Novel Class Discovery [52.616615506638205]
Novel class discovery (NCD) aims to cluster novel classes by leveraging knowledge from disjoint known classes. We propose a novel framework named Intra-view and Inter-view Correlation Guided Multi-view Novel Class Discovery (IICMVNCD). IICMVNCD is the first attempt to explore NCD in multi-view setting so far.
arXiv Detail & Related papers (2025-07-16T08:42:52Z)
- FeNeC: Enhancing Continual Learning via Feature Clustering with Neighbor- or Logit-Based Classification [6.720605329045581]
We introduce FeNeC (Feature Neighborhood) and FeNeC-Log, incorporating its variant based on the log-likelihood function. Our approach generalizes the existing concept by clustering to capture greater intra-class variability. We demonstrate that two FeNeC variants achieve competitive performance in scenarios where task identities are unknown.
arXiv Detail & Related papers (2025-03-18T14:42:38Z)
- Queryable Prototype Multiple Instance Learning with Vision-Language Models for Incremental Whole Slide Image Classification [10.667645628712542]
Whole Slide Image (WSI) classification has very significant applications in clinical pathology. This paper proposes the first Vision-Language-based framework with Queryable Prototype Multiple Instance Learning (QPMIL-VL) specially designed for incremental WSI classification.
arXiv Detail & Related papers (2024-09-10T08:04:11Z)
- High-Performance Few-Shot Segmentation with Foundation Models: An Empirical Study [64.06777376676513]
We develop a few-shot segmentation (FSS) framework based on foundation models.
To be specific, we propose a simple approach to extract implicit knowledge from foundation models to construct coarse correspondence.
Experiments on two widely used datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2024-09-10T08:04:11Z)
- Envisioning Class Entity Reasoning by Large Language Models for Few-shot Learning [13.68867780184022]
Few-shot learning aims to recognize new concepts using a limited number of visual samples.
Our framework incorporates both the abstract class semantics and the concrete class entities extracted from Large Language Models (LLMs).
For the challenging one-shot setting, our approach, utilizing the ResNet-12 backbone, achieves an average improvement of 1.95% over the second-best competitor.
arXiv Detail & Related papers (2024-08-22T15:10:20Z)
- Beyond Known Clusters: Probe New Prototypes for Efficient Generalized Class Discovery [23.359450657842686]
Generalized Class Discovery (GCD) aims to dynamically assign labels to unlabelled data partially based on knowledge learned from labelled data.
We propose an adaptive probing mechanism that introduces learnable potential prototypes to expand cluster prototypes.
Our method surpasses the nearest competitor by a significant margin of 9.7% within the Stanford Cars dataset.
arXiv Detail & Related papers (2024-04-13T12:41:40Z)
- Generalized Category Discovery with Clustering Assignment Consistency [56.92546133591019]
Generalized category discovery (GCD) is a recently proposed open-world task.
We propose a co-training-based framework that encourages clustering consistency.
Our method achieves state-of-the-art performance on three generic benchmarks and three fine-grained visual recognition datasets.
arXiv Detail & Related papers (2023-10-30T00:32:47Z)
- Unified Multi-View Orthonormal Non-Negative Graph Based Clustering Framework [74.25493157757943]
We formulate a novel clustering model, which exploits the non-negative feature property and incorporates the multi-view information into a unified joint learning framework.
We also explore, for the first time, the multi-model non-negative graph-based approach to clustering data based on deep features.
arXiv Detail & Related papers (2022-11-03T08:18:27Z)
- Automatically Discovering Novel Visual Categories with Self-supervised Prototype Learning [68.63910949916209]
This paper tackles the problem of novel category discovery (NCD), which aims to discriminate unknown categories in large-scale image collections.
We propose a novel adaptive prototype learning method consisting of two main stages: prototypical representation learning and prototypical self-training.
We conduct extensive experiments on four benchmark datasets and demonstrate the effectiveness and robustness of the proposed method with state-of-the-art performance.
arXiv Detail & Related papers (2022-08-01T16:34:33Z)
- Novel Class Discovery in Semantic Segmentation [104.30729847367104]
We introduce a new setting of Novel Class Discovery in Semantic Segmentation (NCDSS).
It aims at segmenting unlabeled images containing new classes given prior knowledge from a labeled set of disjoint classes.
In NCDSS, we need to distinguish the objects and background, and to handle the existence of multiple classes within an image.
We propose the Entropy-based Uncertainty Modeling and Self-training (EUMS) framework to overcome noisy pseudo-labels.
arXiv Detail & Related papers (2021-12-03T13:31:59Z)