Dying Clusters Is All You Need -- Deep Clustering With an Unknown Number of Clusters
- URL: http://arxiv.org/abs/2410.09491v1
- Date: Sat, 12 Oct 2024 11:04:10 GMT
- Title: Dying Clusters Is All You Need -- Deep Clustering With an Unknown Number of Clusters
- Authors: Collin Leiber, Niklas Strauß, Matthias Schubert, Thomas Seidl
- Abstract summary: Finding meaningful groups in high-dimensional data is an important challenge in data mining.
Deep clustering methods have achieved remarkable results on this task.
Most of these methods require the user to specify the number of clusters in advance.
This is a major limitation since the number of clusters is typically unknown if labeled data is unavailable.
Most existing approaches that address this limitation estimate the number of clusters separately from the clustering process.
- Score: 5.507296054825372
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Finding meaningful groups, i.e., clusters, in high-dimensional data such as images or texts without labeled data at hand is an important challenge in data mining. In recent years, deep clustering methods have achieved remarkable results in these tasks. However, most of these methods require the user to specify the number of clusters in advance. This is a major limitation since the number of clusters is typically unknown if labeled data is unavailable. Thus, an area of research has emerged that addresses this problem. Most of these approaches estimate the number of clusters separately from the clustering process. This results in a strong dependency of the clustering result on the quality of the initial embedding. Other approaches are tailored to specific clustering processes, making them hard to adapt to other scenarios. In this paper, we propose UNSEEN, a general framework that, starting from a given upper bound, is able to estimate the number of clusters. To the best of our knowledge, it is the first method that can be easily combined with various deep clustering algorithms. We demonstrate the applicability of our approach by combining UNSEEN with the popular deep clustering algorithms DCN, DEC, and DKM and verify its effectiveness through an extensive experimental evaluation on several image and tabular datasets. Moreover, we perform numerous ablations to analyze our approach and show the importance of its components. The code is available at: https://github.com/collinleiber/UNSEEN
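The abstract does not spell out the pruning mechanism, so the following is only a minimal sketch of the general "start from an upper bound and let superfluous clusters die" idea, here placed on top of a plain k-means-style loop. The `min_usage` threshold and the pruning rule are illustrative assumptions, not the UNSEEN algorithm; see the linked repository for the actual implementation.

```python
# Minimal sketch: start with k_max centers and drop ("let die") clusters whose
# assignment share falls below a threshold. Not the UNSEEN method itself.
import numpy as np

def dying_kmeans(X, k_max=50, epochs=20, min_usage=0.01, seed=0):
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k_max, replace=False)]
    for _ in range(epochs):
        # Hard-assign every point to its nearest surviving center.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(1)
        # Re-estimate each center from its assigned points.
        for j in range(len(centers)):
            pts = X[labels == j]
            if len(pts) > 0:
                centers[j] = pts.mean(0)
        # "Dying clusters": remove centers that attract too few points.
        usage = np.bincount(labels, minlength=len(centers)) / len(X)
        centers = centers[usage >= min_usage]
    dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return dists.argmin(1), centers

# Example: labels, centers = dying_kmeans(embeddings, k_max=50)
# The number of surviving centers is the estimated number of clusters.
```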
Related papers
- UniForCE: The Unimodality Forest Method for Clustering and Estimation of the Number of Clusters [2.4953699842881605]
We focus on the concept of unimodality and propose a flexible cluster definition called locally unimodal cluster.
A locally unimodal cluster extends for as long as unimodality is locally preserved across pairs of subclusters of the data.
We propose the UniForCE method for locally unimodal clustering.
arXiv Detail & Related papers (2023-12-18T16:19:02Z)
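A rough sketch of the locally unimodal idea described above, assuming a crude histogram-based unimodality check and greedy pairwise merging; the paper uses a proper statistical test and a spanning-forest structure over subclusters, neither of which is modeled here.

```python
# Sketch: over-segment the data into small subclusters, then greedily merge a pair
# whenever their union still looks unimodal along the axis joining the two centers.
# The histogram peak count below is a crude stand-in for a real unimodality test.
import numpy as np
from sklearn.cluster import KMeans

def looks_unimodal(a, b, bins=10):
    direction = b.mean(0) - a.mean(0)
    direction = direction / (np.linalg.norm(direction) + 1e-12)
    proj = np.concatenate([a, b]) @ direction
    hist, _ = np.histogram(proj, bins=bins)
    peaks = sum(hist[i] > hist[i - 1] and hist[i] >= hist[i + 1]
                for i in range(1, len(hist) - 1))
    return peaks <= 1

def unimodal_merge(X, n_sub=30, seed=0):
    X = np.asarray(X, dtype=float)
    sub = KMeans(n_clusters=n_sub, n_init=10, random_state=seed).fit_predict(X)
    groups = [np.where(sub == j)[0] for j in range(n_sub)]
    merged = True
    while merged:
        merged = False
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                if looks_unimodal(X[groups[i]], X[groups[j]]):
                    groups[i] = np.concatenate([groups[i], groups[j]])
                    del groups[j]
                    merged = True
                    break
            if merged:
                break
    labels = np.empty(len(X), dtype=int)
    for c, idx in enumerate(groups):
        labels[idx] = c
    return labels
```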
- Reinforcement Graph Clustering with Unknown Cluster Number [91.4861135742095]
We propose a new deep graph clustering method termed Reinforcement Graph Clustering.
In our proposed method, cluster number determination and unsupervised representation learning are unified within a single framework.
To provide feedback, a clustering-oriented reward function is proposed that strengthens cohesion within the same cluster and separation between different clusters.
arXiv Detail & Related papers (2023-08-13T18:12:28Z)
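The reward above is only summarized qualitatively, so the snippet below shows a generic cohesion-minus-separation score computed from embeddings and hard labels; it is an illustration, not the exact reward defined in the paper.

```python
# Generic clustering-oriented reward: high when points sharing a cluster are similar
# (cohesion) and points in different clusters are dissimilar (separation).
import numpy as np

def clustering_reward(Z, labels):
    Z = np.asarray(Z, dtype=float)
    labels = np.asarray(labels)
    Z = Z / (np.linalg.norm(Z, axis=1, keepdims=True) + 1e-12)  # cosine similarities
    sims = Z @ Z.T
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(Z), dtype=bool)
    cohesion = sims[same & off_diag].mean() if (same & off_diag).any() else 0.0
    separation = sims[~same].mean() if (~same).any() else 0.0
    return cohesion - separation
```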
- Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels.
We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z)
- Hard Regularization to Prevent Deep Online Clustering Collapse without Data Augmentation [65.268245109828]
Online deep clustering refers to the joint use of a feature extraction network and a clustering model to assign cluster labels to each new data point or batch as it is processed.
While faster and more versatile than offline methods, online clustering can easily reach a collapsed solution in which the encoder maps all inputs to the same point and every input is placed in a single cluster.
We propose a method that does not require data augmentation and that, unlike existing methods, regularizes the hard assignments.
arXiv Detail & Related papers (2023-03-29T08:23:26Z)
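The paper above regularizes hard assignments specifically; as a hedged illustration of the collapse problem it describes, the snippet below penalizes a batch whose cluster-usage distribution drifts away from uniform, a common soft way to keep the encoder from funnelling everything into one cluster.

```python
# Generic anti-collapse penalty: if the whole batch is pushed into a single cluster,
# the batch-level cluster-usage distribution collapses, so penalize its divergence
# from the uniform distribution. Illustrative only, not the paper's regularizer.
import torch
import torch.nn.functional as F

def anti_collapse_penalty(logits):
    # logits: (batch_size, n_clusters) scores from the clustering head.
    probs = F.softmax(logits, dim=1)
    usage = probs.mean(dim=0)  # empirical cluster usage within the batch
    uniform = torch.full_like(usage, 1.0 / usage.numel())
    # KL(usage || uniform) is zero only when every cluster is used equally often.
    return torch.sum(usage * (torch.log(usage + 1e-12) - torch.log(uniform)))
```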
- Deep Clustering: A Comprehensive Survey [53.387957674512585]
Clustering analysis plays an indispensable role in machine learning and data mining.
Deep clustering, which can learn clustering-friendly representations using deep neural networks, has been broadly applied in a wide range of clustering tasks.
Existing surveys of deep clustering mainly focus on single-view settings and network architectures, ignoring the complex application scenarios of clustering.
arXiv Detail & Related papers (2022-10-09T02:31:32Z)
- DeepCluE: Enhanced Image Clustering via Multi-layer Ensembles in Deep Neural Networks [53.88811980967342]
This paper presents a Deep Clustering via Ensembles (DeepCluE) approach.
It bridges the gap between deep clustering and ensemble clustering by harnessing the power of multiple layers in deep neural networks.
Experimental results on six image datasets confirm the advantages of DeepCluE over the state-of-the-art deep clustering approaches.
arXiv Detail & Related papers (2022-06-01T09:51:38Z)
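As a sketch of the multi-layer ensemble idea described above: run a base clusterer on features taken from several layers, count how often pairs of samples land in the same cluster, and extract a consensus partition. K-means as the base clusterer and spectral clustering for the consensus step are assumptions, not the paper's exact pipeline.

```python
# Multi-layer ensemble sketch: co-association matrix over per-layer k-means runs,
# consensus via spectral clustering on that matrix.
import numpy as np
from sklearn.cluster import KMeans, SpectralClustering

def ensemble_over_layers(layer_features, n_clusters, seed=0):
    # layer_features: list of (n_samples, d_l) arrays, one per network layer.
    n = layer_features[0].shape[0]
    co_assoc = np.zeros((n, n))
    for Z in layer_features:
        labels = KMeans(n_clusters=n_clusters, n_init=10,
                        random_state=seed).fit_predict(Z)
        co_assoc += (labels[:, None] == labels[None, :]).astype(float)
    co_assoc /= len(layer_features)
    # Treat the averaged co-association matrix as a precomputed affinity.
    return SpectralClustering(n_clusters=n_clusters, affinity="precomputed",
                              random_state=seed).fit_predict(co_assoc)
```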
- Analysis of Sparse Subspace Clustering: Experiments and Random Projection [0.0]
Clustering is a technique used in many domains, such as face clustering, plant categorization, image segmentation, and document classification.
We analyze one of these techniques: a powerful clustering algorithm called Sparse Subspace Clustering.
We demonstrate several experiments using this method and then introduce a new approach that can reduce the computational time required to perform sparse subspace clustering.
arXiv Detail & Related papers (2022-04-01T23:55:53Z)
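A compact sketch of sparse subspace clustering with the random-projection speed-up mentioned above: project the data to a lower dimension, express each point as a sparse combination of the others, and spectrally cluster the resulting affinity. The projection dimension and lasso penalty are illustrative assumptions.

```python
# Sparse subspace clustering sketch with a Gaussian random projection as the
# dimensionality-reduction step discussed above.
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.linear_model import Lasso
from sklearn.random_projection import GaussianRandomProjection

def ssc_with_projection(X, n_clusters, proj_dim=32, alpha=0.01, seed=0):
    Xp = GaussianRandomProjection(n_components=proj_dim,
                                  random_state=seed).fit_transform(X)
    n = Xp.shape[0]
    C = np.zeros((n, n))
    for i in range(n):
        # Sparse self-expression: reconstruct point i from all other points.
        others = np.delete(np.arange(n), i)
        coef = Lasso(alpha=alpha, max_iter=5000).fit(Xp[others].T, Xp[i]).coef_
        C[i, others] = coef
    affinity = np.abs(C) + np.abs(C).T
    return SpectralClustering(n_clusters=n_clusters, affinity="precomputed",
                              random_state=seed).fit_predict(affinity)
```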
- Clustering Plotted Data by Image Segmentation [12.443102864446223]
Clustering algorithms are one of the main analytical methods to detect patterns in unlabeled data.
In this paper, we present a wholly different way of clustering points in 2-dimensional space, inspired by how humans cluster data.
Our approach, Visual Clustering, has several advantages over traditional clustering algorithms.
arXiv Detail & Related papers (2021-10-06T06:19:30Z)
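A minimal sketch of the plot-then-segment idea described above: rasterize the 2-D points into an image, smooth it, and treat connected blobs as clusters. The grid size, blur width, and threshold are assumptions, and the paper's actual segmentation step differs from this simple thresholding.

```python
# Cluster 2-D points the way a human reads a scatter plot: rasterize, blur,
# threshold, and label connected components.
import numpy as np
from scipy.ndimage import gaussian_filter, label

def visual_clustering(points, grid=256, sigma=2.0, threshold=0.05):
    points = np.asarray(points, dtype=float)
    mins, maxs = points.min(0), points.max(0)
    pix = ((points - mins) / (maxs - mins + 1e-12) * (grid - 1)).astype(int)
    img = np.zeros((grid, grid))
    np.add.at(img, (pix[:, 1], pix[:, 0]), 1.0)  # point-density image
    img = gaussian_filter(img, sigma=sigma)
    mask = img > threshold * img.max()
    blobs, n_found = label(mask)  # connected components = visual clusters
    # Each point inherits the label of the blob its pixel falls into (0 = background).
    return blobs[pix[:, 1], pix[:, 0]], n_found
```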
- Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
- Spectral Clustering with Smooth Tiny Clusters [14.483043753721256]
We propose a novel clustering algorithm, which considers the smoothness of data for the first time.
Our key idea is to cluster tiny clusters, whose centers constitute smooth graphs.
Although this paper focuses solely on multi-scale situations, the idea of data smoothness can certainly be extended to any clustering algorithm.
arXiv Detail & Related papers (2020-09-10T05:21:20Z)
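A two-stage sketch of the tiny-cluster idea described above: over-segment the data into many tiny clusters, spectrally cluster their centers, and propagate the result back to the points. The smoothness criterion on the center graph that the paper introduces is not modeled here.

```python
# Cluster tiny clusters: k-means over-segmentation, spectral clustering of the
# tiny-cluster centers, then label propagation back to the original points.
from sklearn.cluster import KMeans, SpectralClustering

def tiny_cluster_spectral(X, n_clusters, n_tiny=100, seed=0):
    km = KMeans(n_clusters=n_tiny, n_init=10, random_state=seed).fit(X)
    center_labels = SpectralClustering(
        n_clusters=n_clusters, random_state=seed
    ).fit_predict(km.cluster_centers_)
    # Every point inherits the label assigned to its tiny cluster's center.
    return center_labels[km.labels_]
```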
- Probabilistic Partitive Partitioning (PPP) [0.0]
Clustering algorithms, in general, face two common problems.
First, they converge to different solutions under different initial conditions.
Second, the number of clusters has to be decided arbitrarily in advance.
arXiv Detail & Related papers (2020-03-09T19:18:35Z)
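The second problem listed above, having to fix the number of clusters beforehand, is commonly worked around by sweeping k and scoring each partition, for example with the silhouette coefficient. The sketch below shows that standard baseline; it is exactly the kind of separate, after-the-fact estimation that methods such as UNSEEN aim to avoid.

```python
# Standard baseline for choosing k: run the clustering for every candidate k and
# keep the partition with the best silhouette score.
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def pick_k_by_silhouette(X, k_min=2, k_max=15, seed=0):
    best_k, best_score, best_labels = k_min, -1.0, None
    for k in range(k_min, k_max + 1):
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        score = silhouette_score(X, labels)
        if score > best_score:
            best_k, best_score, best_labels = k, score, labels
    return best_k, best_labels
```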
This list is automatically generated from the titles and abstracts of the papers on this site.