ClearGCD: Mitigating Shortcut Learning For Robust Generalized Category Discovery
- URL: http://arxiv.org/abs/2511.22892v1
- Date: Fri, 28 Nov 2025 05:42:21 GMT
- Authors: Kailin Lyu, Jianwei He, Long Xiao, Jianing Zeng, Liang Fan, Lin Shu, Jie Hao,
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In open-world scenarios, Generalized Category Discovery (GCD) requires identifying both known and novel categories within unlabeled data. However, existing methods often suffer from prototype confusion caused by shortcut learning, which undermines generalization and leads to forgetting of known classes. We propose ClearGCD, a framework designed to mitigate reliance on non-semantic cues through two complementary mechanisms. First, Semantic View Alignment (SVA) generates strong augmentations via cross-class patch replacement and enforces semantic consistency using weak augmentations. Second, Shortcut Suppression Regularization (SSR) maintains an adaptive prototype bank that aligns known classes while encouraging separation of potential novel ones. ClearGCD can be seamlessly integrated into parametric GCD approaches and consistently outperforms state-of-the-art methods across multiple benchmarks.
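The two mechanisms in the abstract can be illustrated with a minimal pure-Python sketch. Everything concrete here is an assumption, not the authors' implementation. Images are single-channel nested lists. The strong view pastes a square patch from a donor image, assumed to come from a different class, at a uniformly random location. The consistency term is a cross-entropy that uses the weak-view prediction as a soft target. The prototype bank is updated by an exponential moving average, with a cosine alignment loss for known classes and a cosine-margin penalty that keeps potentially novel features away from stored prototypes. The function names and the `patch`, `momentum`, and `margin` parameters are hypothetical.

```python
import math
import random

# Hypothetical sketch of SVA- and SSR-style components; not ClearGCD's code.

def cross_class_patch_replace(img, donor, patch, rng):
    """Strong view: copy a patch x patch square from `donor` (assumed to be
    an image of a different class) into a copy of `img`."""
    h, w = len(img), len(img[0])
    top, left = rng.randrange(h - patch + 1), rng.randrange(w - patch + 1)
    out = [row[:] for row in img]  # leave the original (weak) view intact
    for i in range(top, top + patch):
        for j in range(left, left + patch):
            out[i][j] = donor[i][j]
    return out

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def consistency_loss(weak_logits, strong_logits):
    """Cross-entropy of the strong-view prediction against the weak-view
    prediction, which acts as a fixed soft target."""
    target, pred = softmax(weak_logits), softmax(strong_logits)
    return -sum(t * math.log(p) for t, p in zip(target, pred))

def normalize(v):
    n = math.sqrt(sum(x * x for x in v)) or 1.0  # avoid division by zero
    return [x / n for x in v]

def cosine(a, b):
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))

class PrototypeBank:
    """One unit-norm prototype per slot, refreshed by an exponential
    moving average; the momentum value is an assumption."""

    def __init__(self, protos, momentum=0.9):
        self.protos = [normalize(p) for p in protos]
        self.momentum = momentum

    def update(self, k, feat):
        f = normalize(feat)
        mixed = [self.momentum * p + (1 - self.momentum) * x
                 for p, x in zip(self.protos[k], f)]
        self.protos[k] = normalize(mixed)

    def align_loss(self, feat, k):
        # Pull a known-class feature toward its own prototype.
        return 1.0 - cosine(feat, self.protos[k])

    def separation_loss(self, feat, margin=0.2):
        # Penalise a potentially novel feature whose cosine similarity to
        # any stored prototype exceeds `margin`, averaged over prototypes.
        penalties = [max(0.0, cosine(feat, p) - margin) for p in self.protos]
        return sum(penalties) / len(penalties)
```

In a training loop, the weak view would supply `weak_logits` and the patched image `strong_logits`; deciding which donor images count as cross-class (from labels or pseudo-labels) is left to the caller in this sketch.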
Related papers
- Consistent Supervised-Unsupervised Alignment for Generalized Category Discovery
Generalized Category Discovery (GCD) focuses on classifying known categories while simultaneously discovering novel categories from unlabeled data.
Previous GCD methods face challenges due to inconsistent optimization objectives and category confusion.
We propose the Neural Collapse-inspired Generalized Category Discovery (NC-GCD) framework.
arXiv (2025-07-07)
- DebGCD: Debiased Learning with Distribution Guidance for Generalized Category Discovery
We tackle the problem of Generalized Category Discovery (GCD).
In GCD, an inherent label bias exists between known and unknown classes due to the lack of ground-truth labels for the latter.
We introduce DebGCD, a Debiased learning with distribution guidance framework for GCD.
arXiv (2025-04-07)
- ProtoGCD: Unified and Unbiased Prototype Learning for Generalized Category Discovery
Generalized category discovery (GCD) is a pragmatic but underexplored problem.
Unlabeled data contain both old and new classes.
ProtoGCD achieves state-of-the-art performance on both generic and fine-grained datasets.
arXiv (2025-04-02)
- Solving the Catastrophic Forgetting Problem in Generalized Category Discovery
Generalized Category Discovery (GCD) aims to identify a mix of known and novel categories within unlabeled data sets.
The recent state-of-the-art method SimGCD transfers the knowledge from known-class data to the learning of novel classes through debiased learning.
We propose a novel learning approach, LegoGCD, which is seamlessly integrated into previous methods to enhance the discrimination of novel classes.
arXiv (2025-01-09)
- Generalized Categories Discovery for Long-tailed Recognition
Generalized Class Discovery plays a pivotal role in discerning both known and unknown categories from unlabeled datasets.
Our research endeavors to bridge this disconnect by focusing on the long-tailed Generalized Category Discovery (Long-tailed GCD) paradigm.
In response to the unique challenges posed by Long-tailed GCD, we present a robust methodology anchored in two strategic regularizations.
arXiv (2023-12-04)
- Activate and Reject: Towards Safe Domain Generalization under Category Shift
We study a practical problem of Domain Generalization under Category Shift (DGCS).
It aims to simultaneously detect unknown-class samples and classify known-class samples in the target domains.
Compared to prior DG works, we face two new challenges: 1) how to learn the concept of "unknown" during training with only source known-class samples, and 2) how to adapt the source-trained model to unseen environments.
arXiv (2023-10-07)
- Dynamic Conceptional Contrastive Learning for Generalized Category Discovery
Generalized category discovery (GCD) aims to automatically cluster partially labeled data.
Unlabeled data contain instances that are not only from known categories of the labeled data but also from novel categories.
One effective approach to GCD is to apply self-supervised learning to learn discriminative representations for unlabeled data.
We propose a Dynamic Conceptional Contrastive Learning framework, which can effectively improve clustering accuracy.
arXiv (2023-03-30)
- Parametric Classification for Generalized Category Discovery: A Baseline Study
Generalized Category Discovery (GCD) aims to discover novel categories in unlabelled datasets using knowledge learned from labelled samples.
We investigate the failure of parametric classifiers, verify the effectiveness of previous design choices when high-quality supervision is available, and identify unreliable pseudo-labels as a key problem.
We propose a simple yet effective parametric classification method that benefits from entropy regularisation, achieves state-of-the-art performance on multiple GCD benchmarks and shows strong robustness to unknown class numbers.
arXiv (2022-11-21)
- Binary Classification from Multiple Unlabeled Datasets via Surrogate Set Classification
We propose a new approach for binary classification from $m$ U-sets for $m \ge 2$.
Our key idea is to consider an auxiliary classification task called surrogate set classification (SSC).
arXiv (2021-02-01)
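Several of the listed GCD methods, such as Dynamic Conceptional Contrastive Learning, are judged by clustering accuracy on the unlabeled data. A minimal sketch of that metric, assuming the common definition: accuracy under the best one-to-one mapping from predicted cluster ids to ground-truth class ids. The function name is hypothetical, and individual papers may compute the metric differently (for example, per subset of known and novel classes).

```python
from collections import Counter
from itertools import permutations

def clustering_accuracy(y_true, y_pred):
    """Fraction of samples labelled correctly under the best one-to-one
    mapping from cluster ids to class ids.

    Brute force over all mappings, so only practical for a few clusters;
    larger evaluations usually solve the same matching with the
    Hungarian algorithm."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    joint = Counter(zip(y_pred, y_true))  # (cluster, class) -> count
    clusters = sorted(set(y_pred))
    classes = sorted(set(y_true))
    # Pad with None so surplus clusters can stay unmatched; None never
    # corresponds to a real class, so it contributes no hits.
    candidates = classes + [None] * max(0, len(clusters) - len(classes))
    best = 0
    for mapping in permutations(candidates, len(clusters)):
        hits = sum(joint[(c, m)] for c, m in zip(clusters, mapping) if m is not None)
        best = max(best, hits)
    return best / len(y_true)
```

For instance, `clustering_accuracy([0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 2, 2])` returns `1.0`: the cluster ids are a relabelling of the classes, which the matching step forgives.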
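The parametric-classification baseline listed above credits entropy regularisation for its robustness. A minimal sketch of one common form, assumed here rather than taken from that paper: maximise the entropy of the batch-averaged prediction, so the classifier is discouraged from assigning the unlabeled data to only a few classes. The function names and the default temperature are illustrative assumptions.

```python
import math

def softmax(logits, temperature=1.0):
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def mean_entropy_regulariser(batch_logits, temperature=0.1):
    """Negative entropy of the batch-mean class distribution.

    Adding this term to a minimised loss pushes the average prediction
    toward uniform, discouraging collapse onto a few classes."""
    probs = [softmax(row, temperature) for row in batch_logits]
    num_classes = len(probs[0])
    mean = [sum(p[k] for p in probs) / len(probs) for k in range(num_classes)]
    entropy = -sum(q * math.log(q) for q in mean if q > 0)
    return -entropy
```

The term acts on the batch average rather than on each sample, so individual predictions may stay confident while the batch as a whole spreads across classes.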
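The surrogate-set idea from the last entry can be made concrete under our own assumptions, which may differ from that paper's exact formulation. Suppose U-set $j$ has $n_j$ samples drawn from the mixture $p_j(x) = \theta_j p_+(x) + (1-\theta_j) p_-(x)$ with known class prior $\theta_j$. Then the probability that $x$ came from set $j$ is $p(j \mid x) = n_j p_j(x) / \sum_k n_k p_k(x)$. Writing $p_+ \propto \eta / \pi$ and $p_- \propto (1-\eta)/(1-\pi)$ for a binary posterior $\eta(x) = p(y=+1 \mid x)$ under a reference prior $\pi$, the marginal density cancels, so $p(j \mid x)$ depends on $x$ only through $\eta(x)$. Fitting an $m$-way set classifier through this map therefore also fits $\eta$. The function and parameter names are hypothetical.

```python
def surrogate_set_posterior(eta, thetas, sizes, prior):
    """Map a binary posterior eta = p(y=+1 | x) to p(set j | x).

    Assumes set j is theta_j * p_pos + (1 - theta_j) * p_neg with n_j
    samples, and that eta is defined under the reference prior `prior`,
    so p_pos(x) is proportional to eta / prior and p_neg(x) to
    (1 - eta) / (1 - prior); the shared marginal density cancels."""
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    scores = [n * (t * eta / prior + (1 - t) * (1 - eta) / (1 - prior))
              for t, n in zip(thetas, sizes)]
    total = sum(scores)
    return [s / total for s in scores]
```

With two equally sized sets of priors 0.9 and 0.1, a confidently positive sample (`eta=1.0`, `prior=0.5`) maps to set posteriors of about 0.9 and 0.1, while an uninformative one (`eta=0.5`) maps to 0.5 each.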