Generalized Category Discovery under the Long-Tailed Distribution
- URL: http://arxiv.org/abs/2506.12515v2
- Date: Fri, 20 Jun 2025 10:11:08 GMT
- Title: Generalized Category Discovery under the Long-Tailed Distribution
- Authors: Bingchen Zhao, Kai Han
- Abstract summary: This paper addresses the problem of Generalized Category Discovery (GCD) under a long-tailed distribution. We propose a framework based on confident sample selection and density-based clustering to tackle these challenges. Our experiments on both long-tailed and conventional GCD datasets demonstrate the effectiveness of our method.
- Score: 19.597592179538257
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper addresses the problem of Generalized Category Discovery (GCD) under a long-tailed distribution, which involves discovering novel categories in an unlabelled dataset using knowledge from a set of labelled categories. Existing works assume a uniform distribution for both datasets, but real-world data often exhibits a long-tailed distribution, where a few categories contain most examples, while others have only a few. While the long-tailed distribution is well-studied in supervised and semi-supervised settings, it remains unexplored in the GCD context. We identify two challenges in this setting - balancing classifier learning and estimating category numbers - and propose a framework based on confident sample selection and density-based clustering to tackle them. Our experiments on both long-tailed and conventional GCD datasets demonstrate the effectiveness of our method.
Related papers
- Generate, Refine, and Encode: Leveraging Synthesized Novel Samples for On-the-Fly Fine-Grained Category Discovery [64.83837781610907]
We investigate the online identification of newly arriving stream data that may belong to both known and unknown categories. Existing OCD methods are devoted to fully mining transferable knowledge from only labeled data. We propose a diffusion-based OCD framework, dubbed DiffGRE, which integrates attribute-composition generation, Refinement, and supervised recognition.
arXiv Detail & Related papers (2025-07-05T14:20:49Z)
- Generalized Category Discovery in Event-Centric Contexts: Latent Pattern Mining with LLMs [34.06878654462158]
We introduce Event-Centric GCD, characterized by long, complex narratives and highly imbalanced class distributions. We propose PaMA, a framework leveraging LLMs to extract and refine event patterns for improved cluster-class alignment. Evaluations on two EC-GCD benchmarks, including a newly constructed Scam Report dataset, demonstrate that PaMA outperforms prior methods with up to 12.58% H-score gains.
arXiv Detail & Related papers (2025-05-29T10:02:04Z)
- Generalized Class Discovery in Instance Segmentation [7.400926717561454]
We propose an instance-wise temperature assignment (ITA) method for contrastive learning and class-wise reliability criteria for pseudo-labels. We evaluate our proposed method by conducting experiments on two settings: COCO$_{half}$ + LVIS and LVIS + Visual Genome.
arXiv Detail & Related papers (2025-02-12T06:26:05Z)
- Generalized Categories Discovery for Long-tailed Recognition [8.69033435074757]
Generalized Class Discovery plays a pivotal role in discerning both known and unknown categories from unlabeled datasets.
Our research endeavors to bridge this disconnect by focusing on the long-tailed Generalized Category Discovery (Long-tailed GCD) paradigm.
In response to the unique challenges posed by Long-tailed GCD, we present a robust methodology anchored in two strategic regularizations.
arXiv Detail & Related papers (2023-12-04T09:21:30Z)
- Generalized Category Discovery with Clustering Assignment Consistency [56.92546133591019]
Generalized category discovery (GCD) is a recently proposed open-world task.
We propose a co-training-based framework that encourages clustering consistency.
Our method achieves state-of-the-art performance on three generic benchmarks and three fine-grained visual recognition datasets.
arXiv Detail & Related papers (2023-10-30T00:32:47Z)
- Towards Distribution-Agnostic Generalized Category Discovery [51.52673017664908]
Data imbalance and open-ended distribution are intrinsic characteristics of the real visual world.
We propose a Self-Balanced Co-Advice contrastive framework (BaCon).
BaCon consists of a contrastive-learning branch and a pseudo-labeling branch, working collaboratively to provide interactive supervision to resolve the DA-GCD task.
arXiv Detail & Related papers (2023-10-02T17:39:58Z)
- Concept Drift and Long-Tailed Distribution in Fine-Grained Visual Categorization: Benchmark and Method [84.68818879525568]
We present a Concept Drift and Long-Tailed Distribution dataset.
The characteristics of instances tend to vary with time and exhibit a long-tailed distribution.
We propose a feature recombination framework to address the learning challenges associated with CDLT.
arXiv Detail & Related papers (2023-06-04T12:42:45Z)
- Tackling Long-Tailed Category Distribution Under Domain Shifts [50.21255304847395]
Existing approaches cannot handle the scenario where both issues exist.
We designed three novel core functional blocks including Distribution Calibrated Classification Loss, Visual-Semantic Mapping and Semantic-Similarity Guided Augmentation.
Two new datasets were proposed for this problem, named AWA2-LTS and ImageNet-LTS.
arXiv Detail & Related papers (2022-07-20T19:07:46Z)
- Learning Muti-expert Distribution Calibration for Long-tailed Video Classification [88.12433458277168]
We propose an end-to-end multi-experts distribution calibration method based on two-level distribution information.
By modeling this two-level distribution information, the model can consider the head classes and the tail classes.
Our method achieves state-of-the-art performance on the long-tailed video classification task.
arXiv Detail & Related papers (2022-05-22T09:52:34Z)