P$^2$OT: Progressive Partial Optimal Transport for Deep Imbalanced
Clustering
- URL: http://arxiv.org/abs/2401.09266v1
- Date: Wed, 17 Jan 2024 15:15:46 GMT
- Title: P$^2$OT: Progressive Partial Optimal Transport for Deep Imbalanced
Clustering
- Authors: Chuyu Zhang, Hui Ren, Xuming He
- Abstract summary: We propose a novel pseudo-labeling-based learning framework for deep clustering.
Our framework generates imbalance-aware pseudo-labels and learns from high-confidence samples.
Experiments on various datasets, including a human-curated long-tailed CIFAR100, demonstrate the superiority of our method.
- Score: 16.723646401890495
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep clustering, which learns representations and semantic clustering without
label information, poses a great challenge for deep learning-based approaches.
Despite significant progress in recent years, most existing methods focus on
uniformly distributed datasets, which significantly limits their practical
applicability. In this paper, we first introduce a more
practical problem setting named deep imbalanced clustering, where the
underlying classes exhibit an imbalanced distribution. To tackle this problem,
we propose a novel pseudo-labeling-based learning framework. Our framework
formulates pseudo-label generation as a progressive partial optimal transport
problem, which progressively transports each sample to imbalanced clusters
under prior distribution constraints, thus generating imbalance-aware
pseudo-labels and learning from high-confidence samples. In addition, we
transform the initial formulation into an unbalanced optimal transport problem
with augmented constraints, which can be solved efficiently by a fast matrix
scaling algorithm. Experiments on various datasets, including a human-curated
long-tailed CIFAR100, challenging ImageNet-R, and large-scale subsets of
fine-grained iNaturalist2018 datasets, demonstrate the superiority of our
method.
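The "fast matrix scaling algorithm" mentioned above follows the general Sinkhorn pattern of alternately rescaling rows and columns of a Gibbs kernel until the marginal constraints are met. The sketch below is a generic, hypothetical illustration of that idea for imbalance-aware pseudo-label generation, not the paper's actual P$^2$OT solver: the progressive partial-mass relaxation and augmented constraints of the real method are omitted, and all names and parameter values here are illustrative assumptions.

```python
import numpy as np

def sinkhorn_pseudo_labels(cost, row_marginal, col_marginal, eps=0.05, n_iters=200):
    """Entropy-regularized OT via Sinkhorn matrix scaling (generic sketch).

    cost: (n_samples, n_clusters) cost matrix, e.g. negative log-predictions.
    row_marginal: (n_samples,) per-sample mass (uniform in standard OT).
    col_marginal: (n_clusters,) imbalance-aware cluster prior.
    Returns a transport plan usable as soft pseudo-labels.
    """
    K = np.exp(-cost / eps)              # Gibbs kernel of the cost matrix
    u = np.ones_like(row_marginal)
    v = np.ones_like(col_marginal)
    for _ in range(n_iters):             # alternate row/column rescaling
        u = row_marginal / (K @ v)
        v = col_marginal / (K.T @ u)
    return u[:, None] * K * v[None, :]   # plan satisfies both marginals

# Toy usage: 6 samples, 3 clusters with a long-tailed prior.
rng = np.random.default_rng(0)
logits = rng.normal(size=(6, 3))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
cost = -np.log(probs + 1e-9)             # negative log-likelihood cost
r = np.full(6, 1 / 6)                    # uniform mass per sample
c = np.array([0.6, 0.3, 0.1])            # hypothetical imbalanced prior
P = sinkhorn_pseudo_labels(cost, r, c)   # column sums match the prior c
```

The key point for the imbalanced setting is that the column marginal `c` encodes the (estimated) cluster-size distribution, so the resulting pseudo-labels respect the imbalance instead of forcing equal-sized clusters.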
Related papers
- Self-Supervised Graph Embedding Clustering [70.36328717683297]
The K-means one-step dimensionality-reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks.
We propose a unified framework that integrates manifold learning with K-means, resulting in the self-supervised graph embedding framework.
arXiv Detail & Related papers (2024-09-24T08:59:51Z)
- SP$^2$OT: Semantic-Regularized Progressive Partial Optimal Transport for Imbalanced Clustering [14.880015659013681]
We introduce a novel optimal transport-based pseudo-label learning framework.
Our framework formulates pseudo-label generation as a Semantic-regularized Progressive Partial Optimal Transport problem.
We employ a majorization strategy to reformulate the SP$^2$OT problem into a Progressive Partial Optimal Transport problem.
arXiv Detail & Related papers (2024-04-04T13:46:52Z)
- A provable initialization and robust clustering method for general mixture models [6.806940901668607]
Clustering is a fundamental tool in statistical machine learning in the presence of heterogeneous data.
Most recent results focus on optimal mislabeling guarantees when data are distributed around centroids with sub-Gaussian errors.
arXiv Detail & Related papers (2024-01-10T22:56:44Z)
- Tackling Diverse Minorities in Imbalanced Classification [80.78227787608714]
Imbalanced datasets are commonly observed in various real-world applications, presenting significant challenges in training classifiers.
We propose generating synthetic samples iteratively by mixing data samples from both minority and majority classes.
We demonstrate the effectiveness of our proposed framework through extensive experiments conducted on seven publicly available benchmark datasets.
arXiv Detail & Related papers (2023-08-28T18:48:34Z)
- Neural Collapse Terminus: A Unified Solution for Class Incremental Learning and Its Variants [166.916517335816]
In this paper, we offer a unified solution to the misalignment dilemma in the three tasks.
We propose neural collapse terminus that is a fixed structure with the maximal equiangular inter-class separation for the whole label space.
Our method holds the neural collapse optimality in an incremental fashion regardless of data imbalance or data scarcity.
arXiv Detail & Related papers (2023-08-03T13:09:59Z)
- Class-Imbalanced Complementary-Label Learning via Weighted Loss [8.934943507699131]
Complementary-label learning (CLL) is widely used in weakly supervised classification.
It faces a significant challenge in real-world datasets when confronted with class-imbalanced training samples.
We propose a novel problem setting that enables learning from class-imbalanced complementary labels for multi-class classification.
arXiv Detail & Related papers (2022-09-28T16:02:42Z)
- Rethinking Clustering-Based Pseudo-Labeling for Unsupervised Meta-Learning [146.11600461034746]
CACTUs, a method for unsupervised meta-learning, is a clustering-based approach with pseudo-labeling.
This approach is model-agnostic and can be combined with supervised algorithms to learn from unlabeled data.
We prove that the core reason for this is the lack of a clustering-friendly property in the embedding space.
arXiv Detail & Related papers (2022-09-27T19:04:36Z)
- Solving The Long-Tailed Problem via Intra- and Inter-Category Balance [17.04366558952357]
Benchmark datasets for visual recognition assume that data is uniformly distributed, while real-world datasets obey a long-tailed distribution.
Current approaches handle the long-tailed problem by transforming the long-tailed dataset into a uniform distribution via re-sampling or re-weighting strategies.
We propose a novel gradient harmonized mechanism with category-wise adaptive precision to decouple the difficulty and sample size imbalance in the long-tailed problem.
arXiv Detail & Related papers (2022-04-20T05:36:20Z)
- Distribution Aligning Refinery of Pseudo-label for Imbalanced Semi-supervised Learning [126.31716228319902]
We develop Distribution Aligning Refinery of Pseudo-label (DARP) algorithm.
We show that DARP is provably and efficiently compatible with state-of-the-art SSL schemes.
arXiv Detail & Related papers (2020-07-17T09:16:05Z)
- Semi-Supervised Learning with Meta-Gradient [123.26748223837802]
We propose a simple yet effective meta-learning algorithm in semi-supervised learning.
We find that the proposed algorithm performs favorably against state-of-the-art methods.
arXiv Detail & Related papers (2020-07-08T08:48:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.