Related papers: Generalized Category Discovery under the Long-Tailed Distribution


URL: http://arxiv.org/abs/2506.12515v2
Date: Fri, 20 Jun 2025 10:11:08 GMT
Title: Generalized Category Discovery under the Long-Tailed Distribution
Authors: Bingchen Zhao, Kai Han
Abstract summary: This paper addresses the problem of Generalized Category Discovery (GCD) under a long-tailed distribution. We propose a framework based on confident sample selection and density-based clustering to tackle these challenges. Our experiments on both long-tailed and conventional GCD datasets demonstrate the effectiveness of our method.
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper addresses the problem of Generalized Category Discovery (GCD) under a long-tailed distribution. GCD involves discovering novel categories in an unlabelled dataset using knowledge from a set of labelled categories. Existing works assume a uniform distribution for both the labelled and the unlabelled datasets, but real-world data often exhibit a long-tailed distribution, in which a few categories contain most examples while the others have only a few. While the long-tailed distribution is well studied in supervised and semi-supervised settings, it remains unexplored in the GCD context. We identify two challenges in this setting, balancing classifier learning and estimating the number of categories, and propose a framework based on confident sample selection and density-based clustering to tackle them. Our experiments on both long-tailed and conventional GCD datasets demonstrate the effectiveness of our method.
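The abstract does not say how confident samples are selected or which density-based clustering algorithm is used. As a generic, hypothetical illustration of the two ideas (not the authors' method), the sketch below assumes: `balanced_confident_selection` caps the number of pseudo-labels kept per predicted class so that head classes cannot dominate, and `dbscan_num_clusters` uses plain DBSCAN, a standard density-based algorithm swapped in here, to count clusters as a rough estimate of the number of categories. All function names and the `per_class`, `eps`, and `min_pts` parameters are assumptions.

```python
import math


def balanced_confident_selection(probs, per_class):
    """Hypothetical helper: keep at most `per_class` most-confident samples per
    predicted class, so head classes cannot flood the pseudo-labelled set."""
    by_class = {}
    for idx, p in enumerate(probs):
        cls = max(range(len(p)), key=p.__getitem__)  # argmax of the class scores
        by_class.setdefault(cls, []).append((p[cls], idx))
    selected = []
    for cls, items in by_class.items():
        items.sort(reverse=True)  # most confident first
        selected.extend((idx, cls) for _, idx in items[:per_class])
    return sorted(selected)  # list of (sample index, pseudo-label)


def dbscan_num_clusters(points, eps, min_pts):
    """Hypothetical helper: count clusters found by plain DBSCAN (noise ignored)."""
    n = len(points)
    # Each point's eps-neighbourhood, including the point itself.
    neighbours = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n  # None = unvisited, -1 = noise, >= 0 = cluster id
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_pts:
            labels[i] = -1  # provisionally noise; may later become a border point
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # reachable noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_pts:
                queue.extend(neighbours[j])  # j is also a core point: keep expanding
        cluster += 1
    return cluster


if __name__ == "__main__":
    probs = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.4, 0.6]]
    print(balanced_confident_selection(probs, per_class=2))  # → [(0, 0), (1, 0), (3, 1)]

    # Toy long-tailed data: clusters of 6, 4 and 3 points plus one outlier.
    head = [(0, 0), (0.5, 0), (0, 0.5), (0.5, 0.5), (1, 0), (0, 1)]
    mid = [(10, 0), (10.5, 0), (10, 0.5), (10.5, 0.5)]
    tail = [(0, 10), (0.5, 10), (0, 10.5)]
    points = head + mid + tail + [(5, 5)]
    print(dbscan_num_clusters(points, eps=1.5, min_pts=3))  # → 3
    print(dbscan_num_clusters(points, eps=1.5, min_pts=4))  # → 2
```

In the second call the three-point tail cluster falls below the density threshold and is treated as noise, which illustrates why a fixed density threshold can undercount rare categories under a long-tailed distribution.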

Related papers

Generate, Refine, and Encode: Leveraging Synthesized Novel Samples for On-the-Fly Fine-Grained Category Discovery
We investigate the online identification of newly arriving stream data that may belong to both known and unknown categories. Existing on-the-fly category discovery (OCD) methods are devoted to fully mining transferable knowledge from labeled data alone. We propose a diffusion-based OCD framework, dubbed DiffGRE, which integrates attribute-composition generation, refinement, and supervised recognition.
arXiv (2025-07-05T14:20:49Z)
Generalized Category Discovery in Event-Centric Contexts: Latent Pattern Mining with LLMs
We introduce Event-Centric GCD (EC-GCD), characterized by long, complex narratives and highly imbalanced class distributions. We propose PaMA, a framework leveraging LLMs to extract and refine event patterns for improved cluster-class alignment. Evaluations on two EC-GCD benchmarks, including a newly constructed Scam Report dataset, demonstrate that PaMA outperforms prior methods, with H-score gains of up to 12.58%.
arXiv (2025-05-29T10:02:04Z)
Generalized Class Discovery in Instance Segmentation
We propose an instance-wise temperature assignment (ITA) method for contrastive learning and class-wise reliability criteria for pseudo-labels. We evaluate the proposed method in two settings: COCO$_{half}$ + LVIS and LVIS + Visual Genome.
arXiv (2025-02-12T06:26:05Z)
Generalized Categories Discovery for Long-tailed Recognition
Generalized Class Discovery (GCD) plays a pivotal role in discerning both known and unknown categories in unlabeled datasets. Our research aims to bridge the gap between the balanced-data assumption of existing GCD methods and the long-tailed distributions found in real data by focusing on the long-tailed Generalized Category Discovery (Long-tailed GCD) paradigm. To address the unique challenges of Long-tailed GCD, we present a robust methodology anchored in two strategic regularizations.
arXiv (2023-12-04T09:21:30Z)
Generalized Category Discovery with Clustering Assignment Consistency
Generalized category discovery (GCD) is a recently proposed open-world task. We propose a co-training-based framework that encourages clustering consistency. Our method achieves state-of-the-art performance on three generic benchmarks and three fine-grained visual recognition datasets.
arXiv (2023-10-30T00:32:47Z)
Towards Distribution-Agnostic Generalized Category Discovery
Data imbalance and open-ended distribution are intrinsic characteristics of the real visual world. We propose a Self-Balanced Co-Advice contrastive framework (BaCon). BaCon consists of a contrastive-learning branch and a pseudo-labeling branch, which work collaboratively to provide interactive supervision for the distribution-agnostic GCD (DA-GCD) task.
arXiv (2023-10-02T17:39:58Z)
Concept Drift and Long-Tailed Distribution in Fine-Grained Visual Categorization: Benchmark and Method
We present a Concept Drift and Long-Tailed Distribution (CDLT) dataset, in which the characteristics of instances tend to vary over time and exhibit a long-tailed distribution. We propose a feature recombination framework to address the learning challenges associated with CDLT.
arXiv (2023-06-04T12:42:45Z)
Tackling Long-Tailed Category Distribution Under Domain Shifts
Existing approaches cannot handle the scenario in which a long-tailed category distribution and domain shifts occur together. We designed three novel core functional blocks: a Distribution Calibrated Classification Loss, Visual-Semantic Mapping, and Semantic-Similarity Guided Augmentation. Two new datasets, AWA2-LTS and ImageNet-LTS, were proposed for this problem.
arXiv (2022-07-20T19:07:46Z)
Learning Muti-expert Distribution Calibration for Long-tailed Video Classification
We propose an end-to-end multi-expert distribution calibration method based on two-level distribution information. By modeling this two-level distribution information, the model can account for both head and tail classes. Our method achieves state-of-the-art performance on the long-tailed video classification task.
arXiv (2022-05-22T09:52:34Z)

This list is automatically generated from the titles and abstracts of the papers on this site.