Deep Distribution-preserving Incomplete Clustering with Optimal
Transport
- URL: http://arxiv.org/abs/2103.11424v1
- Date: Sun, 21 Mar 2021 15:43:17 GMT
- Title: Deep Distribution-preserving Incomplete Clustering with Optimal
Transport
- Authors: Mingjie Luo, Siwei Wang, Xinwang Liu, Wenxuan Tu, Yi Zhang, Xifeng
Guo, Sihang Zhou and En Zhu
- Abstract summary: We propose a novel deep incomplete clustering method, named Deep Distribution-preserving Incomplete Clustering with Optimal Transport (DDIC-OT).
The proposed network achieves superior and stable clustering performance gains over existing state-of-the-art incomplete clustering methods across different missing ratios.
- Score: 43.0056459311929
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Clustering is a fundamental task in the computer vision and machine learning
community. Although various methods have been proposed, the performance of
existing approaches drops dramatically when handling incomplete
high-dimensional data (which is common in real-world applications). To solve
the problem, we propose a novel deep incomplete clustering method, named Deep
Distribution-preserving Incomplete Clustering with Optimal Transport (DDIC-OT).
To avoid the insufficient sample utilization of existing methods, which are
limited by the few fully-observed samples, we propose to measure the distribution
distance with optimal transport for reconstruction evaluation instead of a
traditional pixel-wise loss function. Moreover, a clustering loss on the latent
features is introduced to regularize the embedding with stronger discrimination
capability. As a consequence, the network becomes more robust against missing
features, and the unified framework, which combines clustering and sample
imputation, enables the two procedures to negotiate and better serve each other.
Extensive experiments demonstrate that the proposed network achieves superior
and stable clustering performance gains over existing state-of-the-art
incomplete clustering methods across different missing ratios.
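The abstract's central idea, scoring reconstructions by a distribution distance rather than a pixel-wise loss, is commonly computed in practice with entropy-regularized (Sinkhorn) optimal transport. The sketch below is illustrative only and not the authors' implementation; the function name, the uniform sample weights, and the squared-Euclidean ground cost are assumptions.

```python
import numpy as np

def sinkhorn_distance(x, y, eps=0.1, n_iters=200):
    """Entropy-regularized OT distance between two point sets with uniform weights.

    A generic Sinkhorn sketch, not the DDIC-OT implementation: `eps` is the
    entropic regularizer and both sets are treated as empirical distributions.
    """
    n, m = len(x), len(y)
    # Pairwise squared-Euclidean ground cost.
    C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    K = np.exp(-C / eps)             # Gibbs kernel
    a = np.full(n, 1.0 / n)          # uniform source marginal
    b = np.full(m, 1.0 / m)          # uniform target marginal
    u = np.ones(n)
    for _ in range(n_iters):         # Sinkhorn fixed-point scaling iterations
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]  # approximate transport plan
    return float((P * C).sum())      # transport cost under the plan
```

As `eps` shrinks this approaches the exact Wasserstein cost; in a training loop a loss of this form would compare decoder outputs with the observed entries at the distribution level, which is what lets the network use partially-observed samples.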
Related papers
- Stable Cluster Discrimination for Deep Clustering [7.175082696240088]
Deep clustering can optimize representations of instances (i.e., representation learning) and explore the inherent data distribution.
The coupled objective implies a trivial solution in which all instances collapse to uniform features.
In this work, we first show that the prevalent discrimination task in supervised learning is unstable for one-stage clustering.
A novel stable cluster discrimination (SeCu) task is proposed and a new hardness-aware clustering criterion can be obtained accordingly.
arXiv Detail & Related papers (2023-11-24T06:43:26Z) - Enhancing Clustering Representations with Positive Proximity and Cluster
Dispersion Learning [9.396177578282176]
We propose a novel end-to-end deep clustering approach named PIPCDR.
PIPCDR incorporates a positive instance proximity loss and a cluster dispersion regularizer.
We extensively validate the effectiveness of PIPCDR within an end-to-end Majorize-Minimization framework.
arXiv Detail & Related papers (2023-11-01T06:12:02Z) - Large-scale Fully-Unsupervised Re-Identification [78.47108158030213]
We propose two strategies to learn from large-scale unlabeled data.
The first strategy performs local neighborhood sampling to reduce the dataset size without violating neighborhood relationships.
The second strategy leverages a novel re-ranking technique, which has a lower time-complexity upper bound and reduces the memory complexity from O(n²) to O(kn) with k ≪ n.
arXiv Detail & Related papers (2023-07-26T16:19:19Z) - Neural Capacitated Clustering [6.155158115218501]
We propose a new method for the Capacitated Clustering Problem (CCP) that learns a neural network to predict the assignment probabilities of points to cluster centers.
In our experiments on artificial data and two real-world datasets, our approach outperforms several state-of-the-art mathematical and heuristic solvers from the literature.
arXiv Detail & Related papers (2023-02-10T09:33:44Z) - Rethinking Clustering-Based Pseudo-Labeling for Unsupervised
Meta-Learning [146.11600461034746]
CACTUs, a method for unsupervised meta-learning, is a clustering-based approach with pseudo-labeling.
This approach is model-agnostic and can be combined with supervised algorithms to learn from unlabeled data.
We prove that the core reason for this is the lack of a clustering-friendly property in the embedding space.
arXiv Detail & Related papers (2022-09-27T19:04:36Z) - Task Agnostic and Post-hoc Unseen Distribution Detection [27.69612483621752]
We propose a task-agnostic and post-hoc Unseen Distribution Detection (TAPUDD) method.
It comprises TAP-Mahalanobis, which clusters the training datasets' features and determines the minimum Mahalanobis distance of the test sample from all clusters.
We show that our method can detect unseen samples effectively across diverse tasks and performs better than or on par with the existing baselines.
arXiv Detail & Related papers (2022-07-26T17:55:15Z) - Deep Multi-View Semi-Supervised Clustering with Sample Pairwise
Constraints [10.226754903113164]
We propose a novel Deep Multi-view Semi-supervised Clustering (DMSC) method, which jointly optimizes three kinds of losses during network finetuning.
We demonstrate that our proposed approach performs better than the state-of-the-art multi-view and single-view competitors.
arXiv Detail & Related papers (2022-06-10T08:51:56Z) - Leveraging Ensembles and Self-Supervised Learning for Fully-Unsupervised
Person Re-Identification and Text Authorship Attribution [77.85461690214551]
Learning from fully-unlabeled data is challenging in Multimedia Forensics problems, such as Person Re-Identification and Text Authorship Attribution.
Recent self-supervised learning methods have been shown to be effective when dealing with fully-unlabeled data in cases where the underlying classes have significant semantic differences.
We propose a strategy to tackle Person Re-Identification and Text Authorship Attribution by enabling learning from unlabeled data even when samples from different classes are not prominently diverse.
arXiv Detail & Related papers (2022-02-07T13:08:11Z) - Hybrid Dynamic Contrast and Probability Distillation for Unsupervised
Person Re-Id [109.1730454118532]
Unsupervised person re-identification (Re-Id) has attracted increasing attention due to its practical application in real-world video surveillance systems.
We present the hybrid dynamic cluster contrast and probability distillation algorithm.
It formulates the unsupervised Re-Id problem into a unified local-to-global dynamic contrastive learning and self-supervised probability distillation framework.
arXiv Detail & Related papers (2021-09-29T02:56:45Z) - Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
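Among the entries above, the TAPUDD paper describes clustering training features and scoring a test sample by its minimum Mahalanobis distance to any cluster. A minimal sketch of that scoring step, under assumed inputs: the function name, the pre-clustered feature lists, and the covariance regularizer are hypothetical, not the paper's code.

```python
import numpy as np

def min_mahalanobis(sample, clusters, reg=1e-6):
    """Minimum Mahalanobis distance from `sample` to a set of feature clusters.

    `clusters` is a list of (n_i, d) arrays of training features; each cluster
    contributes its empirical mean and a lightly regularized covariance.
    """
    best = np.inf
    for feats in clusters:
        mu = feats.mean(axis=0)
        cov = np.cov(feats, rowvar=False) + reg * np.eye(feats.shape[1])
        diff = sample - mu
        # Mahalanobis distance: sqrt((x - mu)^T cov^{-1} (x - mu)).
        dist = float(np.sqrt(diff @ np.linalg.solve(cov, diff)))
        best = min(best, dist)
    return best
```

A large minimum distance flags the sample as likely out-of-distribution, since it fits no training cluster well; the threshold for "large" is left to the downstream task.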
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.