Adaptive Part Learning for Fine-Grained Generalized Category Discovery: A Plug-and-Play Enhancement
- URL: http://arxiv.org/abs/2507.06928v1
- Date: Wed, 09 Jul 2025 15:10:21 GMT
- Title: Adaptive Part Learning for Fine-Grained Generalized Category Discovery: A Plug-and-Play Enhancement
- Authors: Qiyuan Dai, Hanzhuo Huang, Yu Wu, Sibei Yang
- Abstract summary: Generalized Category Discovery (GCD) aims to recognize unlabeled images from known and novel classes by distinguishing novel classes from known ones. Existing GCD methods rely on self-supervised vision transformers such as DINO for representation learning. We introduce an adaptive part discovery and learning method, called APL, which generates consistent object parts and their correspondences across different similar images.
- Score: 19.401916634022218
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generalized Category Discovery (GCD) aims to recognize unlabeled images from known and novel classes by distinguishing novel classes from known ones, while also transferring knowledge from another set of labeled images with known classes. Existing GCD methods rely on self-supervised vision transformers such as DINO for representation learning. However, focusing solely on the global representation of the DINO CLS token introduces an inherent trade-off between discriminability and generalization. In this paper, we introduce an adaptive part discovery and learning method, called APL, which generates consistent object parts and their correspondences across different similar images using a set of shared learnable part queries and DINO part priors, without requiring any additional annotations. More importantly, we propose a novel all-min contrastive loss to learn discriminative yet generalizable part representation, which adaptively highlights discriminative object parts to distinguish similar categories for enhanced discriminability while simultaneously sharing other parts to facilitate knowledge transfer for improved generalization. Our APL can easily be incorporated into different GCD frameworks by replacing their CLS token feature with our part representations, showing significant enhancements on fine-grained datasets.
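The abstract describes the all-min contrastive loss only at a high level: all corresponding parts of similar images are pulled together for positives, while only the most discriminative part is pushed apart to separate similar categories. The sketch below is one plausible reading in plain Python, not the paper's actual formulation; the function name, the temperature `tau`, and the exact all/min assignment are assumptions for illustration.

```python
import math

def _sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def all_min_contrastive_loss(parts_a, parts_b, same_class, tau=0.1):
    """Hedged sketch of an 'all-min' style part contrastive loss.

    parts_a, parts_b: lists of L2-normalized part embeddings (one per
    shared part query) for two images. same_class: whether the two
    images carry the same label.

    Assumed reading of the abstract: positives pull *all*
    corresponding parts together, while negatives push apart only the
    *least similar* (most discriminative) part, leaving the remaining
    shared parts free to align and transfer knowledge.
    """
    # Cosine similarity per corresponding part, scaled by temperature.
    sims = [sum(x * y for x, y in zip(pa, pb)) / tau
            for pa, pb in zip(parts_a, parts_b)]
    if same_class:
        # "all": every part correspondence acts as a positive pair.
        return -sum(math.log(_sigmoid(s)) for s in sims) / len(sims)
    # "min": penalize only the most discriminative part pair.
    return -math.log(_sigmoid(-min(sims)))
```

Under this reading, two images of different but similar categories incur a large loss only through their single most distinctive part, which matches the abstract's claim that other parts remain shared to aid generalization.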
Related papers
- Beyond Class Tokens: LLM-guided Dominant Property Mining for Few-shot Classification [31.300989699856583]
Few-Shot Learning attempts to develop the generalization ability for recognizing novel classes using only a few images. Recent CLIP-like methods based on contrastive language-image pretraining mitigate the issue by leveraging textual representations of class names for unseen image discovery. We propose a novel Few-Shot Learning method (BCT-CLIP) that explores dominating properties via contrastive learning beyond simply using class tokens.
arXiv Detail & Related papers (2025-07-28T04:23:06Z) - Learning Part Knowledge to Facilitate Category Understanding for Fine-Grained Generalized Category Discovery [10.98097145569408]
Generalized Category Discovery (GCD) aims to classify unlabeled data containing both seen and novel categories. We propose incorporating part knowledge to address fine-grained GCD, which introduces two key challenges.
arXiv Detail & Related papers (2025-03-21T01:37:51Z) - InPK: Infusing Prior Knowledge into Prompt for Vision-Language Models [24.170351966913557]
We propose the InPK model, which infuses class-specific prior knowledge into the learnable tokens. We also introduce a learnable text-to-vision projection layer to accommodate the text adjustments. In experiments, InPK significantly outperforms state-of-the-art methods in multiple zero/few-shot image classification tasks.
arXiv Detail & Related papers (2025-02-27T05:33:18Z) - Discriminative Image Generation with Diffusion Models for Zero-Shot Learning [53.44301001173801]
We present DIG-ZSL, a novel Discriminative Image Generation framework for Zero-Shot Learning. We learn a discriminative class token (DCT) for each unseen class under the guidance of a pre-trained category discrimination model (CDM). In this paper, extensive experiments and visualizations on four datasets show that our DIG-ZSL: (1) generates diverse and high-quality images, (2) outperforms previous state-of-the-art nonhuman-annotated semantic prototype-based methods by a large margin, and (3) achieves comparable or better performance than baselines that leverage human-annotated semantic prototypes.
arXiv Detail & Related papers (2024-12-23T02:18:54Z) - Knowledge-Aware Prompt Tuning for Generalizable Vision-Language Models [64.24227572048075]
We propose a Knowledge-Aware Prompt Tuning (KAPT) framework for vision-language models.
Our approach takes inspiration from human intelligence in which external knowledge is usually incorporated into recognizing novel categories of objects.
arXiv Detail & Related papers (2023-08-22T04:24:45Z) - Learning Common Rationale to Improve Self-Supervised Representation for Fine-Grained Visual Recognition Problems [61.11799513362704]
We propose learning an additional screening mechanism to identify discriminative clues commonly seen across instances and classes.
We show that a common rationale detector can be learned by simply exploiting the GradCAM induced from the SSL objective.
arXiv Detail & Related papers (2023-03-03T02:07:40Z) - Learning What Not to Segment: A New Perspective on Few-Shot Segmentation [63.910211095033596]
Recently few-shot segmentation (FSS) has been extensively developed.
This paper proposes a fresh and straightforward insight to alleviate the problem.
In light of the unique nature of the proposed approach, we also extend it to a more realistic but challenging setting.
arXiv Detail & Related papers (2022-03-15T03:08:27Z) - Learning Aligned Cross-Modal Representation for Generalized Zero-Shot Classification [17.177622259867515]
We propose an innovative autoencoder network that learns Aligned Cross-Modal Representations (dubbed ACMR) for Generalized Zero-Shot Classification (GZSC).
Specifically, we propose a novel Vision-Semantic Alignment (VSA) method to strengthen the alignment of cross-modal latent features on the latent subspaces guided by a learned classifier.
In addition, we propose a novel Information Enhancement Module (IEM) to reduce the possibility of latent-variable collapse while encouraging the discriminative ability of latent variables.
arXiv Detail & Related papers (2021-12-24T03:35:37Z) - Open-Set Representation Learning through Combinatorial Embedding [62.05670732352456]
We are interested in identifying novel concepts in a dataset through representation learning based on the examples in both labeled and unlabeled classes.
We propose a learning approach, which naturally clusters examples in unseen classes using the compositional knowledge given by multiple supervised meta-classifiers on heterogeneous label spaces.
The proposed algorithm discovers novel concepts via a joint optimization that enhances the discriminativeness of unseen classes while learning representations of known classes that generalize to novel ones.
arXiv Detail & Related papers (2021-06-29T11:51:57Z) - Task-Independent Knowledge Makes for Transferable Representations for Generalized Zero-Shot Learning [77.0715029826957]
Generalized Zero-Shot Learning (GZSL) targets recognizing new categories by learning transferable image representations.
We propose a novel Dual-Contrastive Embedding Network (DCEN) that simultaneously learns task-specific and task-independent knowledge.
arXiv Detail & Related papers (2021-04-05T10:05:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.