Beyond Known Clusters: Probe New Prototypes for Efficient Generalized Class Discovery
- URL: http://arxiv.org/abs/2404.08995v4
- Date: Tue, 30 Apr 2024 07:13:18 GMT
- Title: Beyond Known Clusters: Probe New Prototypes for Efficient Generalized Class Discovery
- Authors: Ye Wang, Yaxiong Wang, Yujiao Wu, Bingchen Zhao, Xueming Qian
- Abstract summary: Generalized Class Discovery (GCD) aims to dynamically assign labels to unlabelled data partially based on knowledge learned from labelled data.
We propose an adaptive probing mechanism that introduces learnable potential prototypes to expand cluster prototypes.
Our method surpasses the nearest competitor by a significant margin of 9.7% on the Stanford Cars dataset.
- Score: 23.359450657842686
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generalized Class Discovery (GCD) aims to dynamically assign labels to unlabelled data partially based on knowledge learned from labelled data, where the unlabelled data may come from known or novel classes. The prevailing approach generally involves clustering across all data and learning conceptions by prototypical contrastive learning. However, existing methods largely hinge on the performance of clustering algorithms and are thus subject to their inherent limitations. Firstly, the estimated cluster number is often smaller than the ground truth, so existing methods lack the prototypes needed for comprehensive conception learning. To address this issue, we propose an adaptive probing mechanism that introduces learnable potential prototypes to expand cluster prototypes (centers). As there is no ground truth for the potential prototypes, we develop a self-supervised prototype learning framework to optimize them in an end-to-end fashion. Secondly, clustering is computationally intensive, and the conventional strategy of clustering both labelled and unlabelled instances exacerbates this issue. To counteract this inefficiency, we cluster only the unlabelled instances and subsequently expand the cluster prototypes with our introduced potential prototypes to explore novel classes quickly. Despite the simplicity of our proposed method, extensive empirical analysis on a wide range of datasets confirms that our method consistently delivers state-of-the-art results. Specifically, our method surpasses the nearest competitor by a significant margin of 9.7% on the Stanford Cars dataset and achieves 12x clustering efficiency on the Herbarium 19 dataset. We will make the code and checkpoints publicly available at https://github.com/xjtuYW/PNP.git.
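The two ideas in the abstract (cluster only the unlabelled instances, then expand the resulting centers with extra potential prototypes) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `kmeans`, `expand_prototypes`, and `soft_assign`, the use of plain k-means, the random initialisation of the potential prototypes, and the temperature value are all assumptions made for illustration. The self-supervised optimisation of the potential prototypes described in the abstract is not shown.

```python
import numpy as np


def kmeans(features, k, iters=20, seed=0):
    """Plain k-means (an assumed stand-in for the clustering step)."""
    rng = np.random.default_rng(seed)
    # Initialise centers from k distinct data points.
    centers = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(iters):
        # Squared Euclidean distances, shape (n_samples, k).
        dists = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = features[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return centers


def expand_prototypes(cluster_centers, num_potential, seed=0):
    """Append randomly initialised potential prototypes to the centers.

    In a real training loop these extra rows would be learnable parameters;
    here they are only initialised (hypothetical choice: standard normal).
    """
    rng = np.random.default_rng(seed)
    dim = cluster_centers.shape[1]
    potential = rng.normal(size=(num_potential, dim))
    return np.concatenate([cluster_centers, potential], axis=0)


def soft_assign(features, prototypes, temperature=0.1):
    """Softmax over cosine similarities between features and prototypes."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    logits = f @ p.T / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)


# Synthetic stand-in for unlabelled features (200 samples, 16 dimensions).
unlabelled = np.random.default_rng(0).normal(size=(200, 16))
centers = kmeans(unlabelled, k=5)            # cluster unlabelled data only
prototypes = expand_prototypes(centers, 3)   # 5 centers + 3 potential
probs = soft_assign(unlabelled, prototypes)
print(prototypes.shape, probs.shape)
```

Clustering only the unlabelled subset shrinks the input to the clustering step, while the appended rows give the model spare prototypes in case the estimated cluster number falls below the true number of classes.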
Related papers
- On Partial Prototype Collapse in the DINO Family of Self-Supervised Methods [15.524425102344784]
Learning to map the data samples to compact representations leads to the representation collapse problem.
Regularizing the distribution of data points over the clusters is the prevalent strategy to avoid this issue.
We show that a partial prototype collapse problem still exists in the DINO family of methods, which leads to significant redundancies in the prototypes.
arXiv Detail & Related papers (2024-10-17T22:06:34Z) - GCC: Generative Calibration Clustering [55.44944397168619]
We propose a novel Generative Calibration Clustering (GCC) method to incorporate feature learning and augmentation into the clustering procedure.
First, we develop a discriminative feature alignment mechanism to discover the intrinsic relationship across real and generated samples.
Second, we design a self-supervised metric learning scheme to generate more reliable cluster assignments.
arXiv Detail & Related papers (2024-04-14T01:51:11Z) - Generalized Category Discovery with Clustering Assignment Consistency [56.92546133591019]
Generalized category discovery (GCD) is a recently proposed open-world task.
We propose a co-training-based framework that encourages clustering consistency.
Our method achieves state-of-the-art performance on three generic benchmarks and three fine-grained visual recognition datasets.
arXiv Detail & Related papers (2023-10-30T00:32:47Z) - Boundary-Refined Prototype Generation: A General End-to-End Paradigm for Semi-Supervised Semantic Segmentation [23.00156170789867]
Semi-supervised semantic segmentation has attracted increasing attention in computer vision.
Current approaches isolate prototype generation from the main training framework.
We propose a novel end-to-end boundary-refined prototype generation (BRPG) method.
arXiv Detail & Related papers (2023-07-19T16:12:37Z) - Actively Supervised Clustering for Open Relation Extraction [42.114747195195655]
We present a novel setting, named actively supervised clustering for OpenRE.
The key to the setting is selecting which instances to label.
We propose a new strategy that can dynamically discover clusters of unknown relations.
arXiv Detail & Related papers (2023-06-08T06:55:02Z) - Dynamic Conceptional Contrastive Learning for Generalized Category Discovery [76.82327473338734]
Generalized category discovery (GCD) aims to automatically cluster partially labeled data.
Unlabeled data contain instances that are not only from known categories of the labeled data but also from novel categories.
One effective approach to GCD is applying self-supervised learning to learn discriminative representations for unlabeled data.
We propose a Dynamic Conceptional Contrastive Learning framework, which can effectively improve clustering accuracy.
arXiv Detail & Related papers (2023-03-30T14:04:39Z) - Hard Regularization to Prevent Deep Online Clustering Collapse without Data Augmentation [65.268245109828]
Online deep clustering refers to the joint use of a feature extraction network and a clustering model to assign cluster labels to each new data point or batch as it is processed.
While faster and more versatile than offline methods, online clustering can easily reach the collapsed solution where the encoder maps all inputs to the same point and all are put into a single cluster.
We propose a method that does not require data augmentation and that, unlike existing methods, regularizes the hard assignments.
arXiv Detail & Related papers (2023-03-29T08:23:26Z) - BMD: A General Class-balanced Multicentric Dynamic Prototype Strategy for Source-free Domain Adaptation [74.93176783541332]
Source-free Domain Adaptation (SFDA) aims to adapt a pre-trained source model to the unlabeled target domain without accessing the well-labeled source data.
To make up for the absence of source data, most existing methods introduced feature-prototype-based pseudo-labeling strategies.
We propose a general class-Balanced Multicentric Dynamic prototype strategy for the SFDA task.
arXiv Detail & Related papers (2022-04-06T13:23:02Z) - Cluster Representatives Selection in Non-Metric Spaces for Nearest Prototype Classification [4.176752121302988]
In this paper, we present CRS, a novel method for selecting a small yet representative subset of objects as a cluster prototype.
Memory- and computation-efficient selection of representatives is enabled by leveraging the similarity graph representation of each cluster created by the NN-Descent algorithm.
CRS can be used in an arbitrary metric or non-metric space because of the graph-based approach, which requires only a pairwise similarity measure.
arXiv Detail & Related papers (2021-07-03T04:51:07Z) - Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)