DeepDPM: Deep Clustering With an Unknown Number of Clusters
- URL: http://arxiv.org/abs/2203.14309v1
- Date: Sun, 27 Mar 2022 14:11:06 GMT
- Title: DeepDPM: Deep Clustering With an Unknown Number of Clusters
- Authors: Meitar Ronen, Shahaf E. Finder, Oren Freifeld
- Abstract summary: We introduce an effective deep-clustering method that does not require knowing the value of K as it infers it during the learning.
Using a split/merge framework, a dynamic architecture that adapts to the changing K, and a novel loss, our proposed method outperforms existing nonparametric methods.
- Score: 6.0803541683577444
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep Learning (DL) has shown great promise in the unsupervised task of
clustering. That said, while in classical (i.e., non-deep) clustering the
benefits of the nonparametric approach are well known, most deep-clustering
methods are parametric: namely, they require a predefined and fixed number of
clusters, denoted by K. When K is unknown, however, using model-selection
criteria to choose its optimal value might become computationally expensive,
especially in DL as the training process would have to be repeated numerous
times. In this work, we bridge this gap by introducing an effective
deep-clustering method that does not require knowing the value of K as it
infers it during the learning. Using a split/merge framework, a dynamic
architecture that adapts to the changing K, and a novel loss, our proposed
method outperforms existing nonparametric methods (both classical and deep
ones). While the very few existing deep nonparametric methods lack scalability,
we demonstrate ours by being the first to report the performance of such a
method on ImageNet. We also demonstrate the importance of inferring K by
showing how methods that fix it deteriorate in performance when their assumed K
value gets further from the ground-truth one, especially on imbalanced
datasets. Our code is available at https://github.com/BGU-CS-VIL/DeepDPM.
Related papers
- Reinforcement Graph Clustering with Unknown Cluster Number [91.4861135742095]
We propose a new deep graph clustering method termed Reinforcement Graph Clustering.
In our proposed method, cluster number determination and unsupervised representation learning are unified into a uniform framework.
In order to conduct feedback actions, the clustering-oriented reward function is proposed to enhance the cohesion of the same clusters and separate the different clusters.
arXiv Detail & Related papers (2023-08-13T18:12:28Z) - Large-scale Fully-Unsupervised Re-Identification [78.47108158030213]
We propose two strategies to learn from large-scale unlabeled data.
The first strategy performs a local neighborhood sampling to reduce the dataset size in each without violating neighborhood relationships.
A second strategy leverages a novel Re-Ranking technique, which has a lower time upper bound complexity and reduces the memory complexity from O(n2) to O(kn) with k n.
arXiv Detail & Related papers (2023-07-26T16:19:19Z) - Rethinking k-means from manifold learning perspective [122.38667613245151]
We present a new clustering algorithm which directly detects clusters of data without mean estimation.
Specifically, we construct distance matrix between data points by Butterworth filter.
To well exploit the complementary information embedded in different views, we leverage the tensor Schatten p-norm regularization.
arXiv Detail & Related papers (2023-05-12T03:01:41Z) - Hard Regularization to Prevent Deep Online Clustering Collapse without
Data Augmentation [65.268245109828]
Online deep clustering refers to the joint use of a feature extraction network and a clustering model to assign cluster labels to each new data point or batch as it is processed.
While faster and more versatile than offline methods, online clustering can easily reach the collapsed solution where the encoder maps all inputs to the same point and all are put into a single cluster.
We propose a method that does not require data augmentation, and that, differently from existing methods, regularizes the hard assignments.
arXiv Detail & Related papers (2023-03-29T08:23:26Z) - A Deep Dive into Deep Cluster [0.2578242050187029]
DeepCluster is a simple and scalable unsupervised pretraining of visual representations.
We show that DeepCluster convergence and performance depend on the interplay between the quality of the randomly filters of the convolutional layer and the selected number of clusters.
arXiv Detail & Related papers (2022-07-24T22:55:09Z) - KnAC: an approach for enhancing cluster analysis with background
knowledge and explanations [0.20999222360659603]
We present Knowledge Augmented Clustering (KnAC), which main goal is to confront expert-based labelling with automated clustering.
KnAC can serve as an augmentation of an arbitrary clustering algorithm, making the approach robust and model-agnostic.
arXiv Detail & Related papers (2021-12-16T10:13:47Z) - Meta Clustering Learning for Large-scale Unsupervised Person
Re-identification [124.54749810371986]
We propose a "small data for big task" paradigm dubbed Meta Clustering Learning (MCL)
MCL only pseudo-labels a subset of the entire unlabeled data via clustering to save computing for the first-phase training.
Our method significantly saves computational cost while achieving a comparable or even better performance compared to prior works.
arXiv Detail & Related papers (2021-11-19T04:10:18Z) - ThetA -- fast and robust clustering via a distance parameter [3.0020405188885815]
Clustering is a fundamental problem in machine learning where distance-based approaches have dominated the field for many decades.
We propose a new set of distance threshold methods called Theta-based Algorithms (ThetA)
arXiv Detail & Related papers (2021-02-13T23:16:33Z) - Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z) - An Efficient Framework for Clustered Federated Learning [26.24231986590374]
We address the problem of federated learning (FL) where users are distributed into clusters.
We propose the Iterative Federated Clustering Algorithm (IFCA)
We show that our algorithm is efficient in non- partitioned problems such as neural networks.
arXiv Detail & Related papers (2020-06-07T08:48:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.