How to Achieve the Intended Aim of Deep Clustering Now, without Deep Learning
- URL: http://arxiv.org/abs/2602.05749v1
- Date: Thu, 05 Feb 2026 15:16:04 GMT
- Title: How to Achieve the Intended Aim of Deep Clustering Now, without Deep Learning
- Authors: Kai Ming Ting, Wei-Jie Xu, Hang Zhang
- Abstract summary: Deep Embedded Clustering learns a latent representation via an autoencoder and performs clustering based on a $k$-means-like procedure. This paper investigates whether the deep-learned representation has enabled DEC to overcome the known fundamental limitations of $k$-means clustering.
- Score: 9.022973688786545
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep clustering (DC) is often claimed to have a key advantage over $k$-means clustering. Yet, this advantage is typically demonstrated using image datasets only, and it is unclear whether it addresses the fundamental limitations of $k$-means clustering. Deep Embedded Clustering (DEC) learns a latent representation via an autoencoder and performs clustering based on a $k$-means-like procedure, while the optimization is conducted in an end-to-end manner. This paper investigates whether the deep-learned representation has enabled DEC to overcome the known fundamental limitations of $k$-means clustering, i.e., its inability to discover clusters of arbitrary shapes, varied sizes and densities. Our investigation of DEC has wider implications for deep clustering methods in general. Notably, none of these methods exploits the underlying data distribution. We uncover that a non-deep-learning approach achieves the intended aim of deep clustering by making use of distributional information of clusters in a dataset to effectively address these fundamental limitations.
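The fundamental limitation referenced in the abstract is easy to reproduce. Below is a minimal sketch, not the paper's proposed method, contrasting $k$-means with an off-the-shelf density-based algorithm (DBSCAN, used purely for illustration) on the classic two-moons data: $k$-means imposes a linear boundary and splits each moon, while the density-based method recovers both arbitrary-shaped clusters.

```python
# Minimal illustration (not the paper's method) of the k-means limitation:
# k-means cannot recover non-convex clusters, while a density-based
# algorithm such as DBSCAN can.
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

X, y = make_moons(n_samples=500, noise=0.05, random_state=0)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
db = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

print("k-means ARI:", adjusted_rand_score(y, km))  # typically well below 1.0
print("DBSCAN  ARI:", adjusted_rand_score(y, db))  # typically close to 1.0
```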
Related papers
- K*-Means: A Parameter-free Clustering Algorithm [55.20132267309382]
k*-means is a novel clustering algorithm that eliminates the need to set k or any other parameters.
It uses the minimum description length principle to automatically determine the optimal number of clusters, k*, by splitting and merging clusters.
We prove that k*-means is guaranteed to converge and demonstrate experimentally that it significantly outperforms existing methods in scenarios where k is unknown.
arXiv Detail & Related papers (2025-05-17T08:41:07Z)
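The k*-means paper's split-and-merge MDL procedure is not reproduced here. As a hedged stand-in, the sketch below selects the number of clusters with the closely related Bayesian information criterion (BIC), which can be read as a two-part description-length score.

```python
# Hedged stand-in for MDL-based selection of k: fit Gaussian mixtures for
# a range of k and keep the one with the lowest BIC, a two-part
# approximation to description length.
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=600, centers=4, random_state=0)

bic = {k: GaussianMixture(n_components=k, random_state=0).fit(X).bic(X)
       for k in range(1, 10)}
k_star = min(bic, key=bic.get)  # lowest BIC = shortest description
print("selected k*:", k_star)   # typically 4 for this data
```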
- Depth-Based Local Center Clustering: A Framework for Handling Different Clustering Scenarios [46.164361878412656]
Cluster analysis plays a crucial role across numerous scientific and engineering domains.
Despite the wealth of clustering methods proposed over the past decades, each method is typically designed for specific scenarios.
In this paper, we propose depth-based local center clustering (DLCC), which makes use of a local version of data depth that is based on subsets of data.
arXiv Detail & Related papers (2025-05-14T16:08:11Z)
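The DLCC summary above mentions a local, subset-based version of data depth, which is not spelled out there. As a rough illustration of what a data-depth function computes, here is the standard (global) Mahalanobis depth, which scores how central each point is in the data cloud.

```python
# Standard (global) Mahalanobis depth, as a rough illustration of data
# depth; the paper uses a local, subset-based variant not shown here.
import numpy as np

def mahalanobis_depth(X):
    """D(x) = 1 / (1 + (x - mu)^T S^{-1} (x - mu)) for each row x of X."""
    mu = X.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(X, rowvar=False))
    d = X - mu
    quad = np.einsum("ij,jk,ik->i", d, S_inv, d)  # per-row quadratic form
    return 1.0 / (1.0 + quad)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
depth = mahalanobis_depth(X)
print(X[depth.argmax()])  # deepest point sits near the center of the cloud
```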
- Dying Clusters Is All You Need -- Deep Clustering With an Unknown Number of Clusters [5.507296054825372]
Finding meaningful groups in high-dimensional data is an important challenge in data mining.
Deep clustering methods have achieved remarkable results in these tasks.
Most of these methods require the user to specify the number of clusters in advance.
This is a major limitation since the number of clusters is typically unknown if labeled data is unavailable.
Most of these approaches estimate the number of clusters separately from the clustering process.
arXiv Detail & Related papers (2024-10-12T11:04:10Z)
- Stable Cluster Discrimination for Deep Clustering [7.175082696240088]
Deep clustering can jointly optimize representations of instances (i.e., representation learning) and explore the inherent data distribution.
The coupled objective, however, admits a trivial solution in which all instances collapse to the same uniform features: an unconstrained encoder can map every input to a single point, and placing one cluster center there drives the clustering loss to zero.
In this work, we first show that the prevalent discrimination task in supervised learning is unstable for one-stage clustering.
A novel stable cluster discrimination (SeCu) task is proposed and a new hardness-aware clustering criterion can be obtained accordingly.
arXiv Detail & Related papers (2023-11-24T06:43:26Z)
- DivClust: Controlling Diversity in Deep Clustering [47.85350249697335]
DivClust produces consensus clustering solutions that consistently outperform single-clustering baselines.
Our method effectively controls diversity across frameworks and datasets with very small additional computational cost.
arXiv Detail & Related papers (2023-04-03T14:45:43Z)
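DivClust itself learns multiple clusterings jointly inside a deep network. The hedged sketch below only illustrates how inter-clustering diversity can be quantified, using pairwise NMI between base clusterings (lower NMI means more diverse ensemble members); k-means runs with different seeds stand in for the learned clusterings.

```python
# Hedged sketch: quantify diversity between base clusterings with pairwise
# NMI (DivClust learns the clusterings jointly in a network; here we use
# k-means runs with different seeds as stand-in ensemble members).
from itertools import combinations

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import normalized_mutual_info_score

X, _ = make_blobs(n_samples=400, centers=5, random_state=0)
members = [KMeans(n_clusters=5, n_init=5, random_state=s).fit_predict(X)
           for s in range(4)]
for a, b in combinations(range(len(members)), 2):
    nmi = normalized_mutual_info_score(members[a], members[b])
    print(f"NMI({a},{b}) = {nmi:.3f}")  # lower NMI = more diverse pair
```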
- Asymptotics for The $k$-means [0.6091702876917281]
$k$-means is one of the most important unsupervised learning techniques in statistics and computer science.
The proposed notion of clustering consistency is more appropriate for clustering methods than the earlier criterion consistency.
It is found that the proposed $k$-means method has lower clustering error rates and is more robust to small clusters and outliers.
arXiv Detail & Related papers (2022-11-18T03:36:58Z)
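For reference, the textbook $k$-means objective that such asymptotic analyses concern (the paper's exact estimator may differ in details) is

$$
\min_{C_1,\dots,C_k} \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2,
\qquad
\mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i,
$$

where the data are partitioned into clusters $C_1,\dots,C_k$ and $\mu_j$ is the centroid of $C_j$.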
- Deep Clustering: A Comprehensive Survey [53.387957674512585]
Clustering analysis plays an indispensable role in machine learning and data mining.
Deep clustering, which can learn clustering-friendly representations using deep neural networks, has been broadly applied in a wide range of clustering tasks.
Existing surveys for deep clustering mainly focus on the single-view fields and the network architectures, ignoring the complex application scenarios of clustering.
arXiv Detail & Related papers (2022-10-09T02:31:32Z)
- DeepCluE: Enhanced Image Clustering via Multi-layer Ensembles in Deep Neural Networks [53.88811980967342]
This paper presents a Deep Clustering via Ensembles (DeepCluE) approach.
It bridges the gap between deep clustering and ensemble clustering by harnessing the power of multiple layers in deep neural networks.
Experimental results on six image datasets confirm the advantages of DeepCluE over the state-of-the-art deep clustering approaches.
arXiv Detail & Related papers (2022-06-01T09:51:38Z)
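A hedged sketch of the multi-layer ensemble idea attributed to DeepCluE above: cluster the feature representations from several network layers, accumulate a co-association matrix, and extract a consensus clustering from it. Layer activations are faked here with random projections; the actual method uses a trained deep network on image data.

```python
# Hedged sketch of the multi-layer ensemble idea: cluster the features from
# several layers, accumulate a co-association matrix, and extract a
# consensus clustering. Layer outputs are faked with random projections;
# DeepCluE uses activations of a trained deep network.
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 64))  # stand-in for input features
layers = [X @ rng.normal(size=(64, d)) for d in (32, 16, 8)]  # fake layers

n = X.shape[0]
coassoc = np.zeros((n, n))
for feats in layers:
    labels = KMeans(n_clusters=4, n_init=5, random_state=0).fit_predict(feats)
    coassoc += labels[:, None] == labels[None, :]  # same-cluster votes
coassoc /= len(layers)

# Consensus step: agglomerative clustering on the co-association distance
# (sklearn >= 1.2; older versions spell the argument `affinity`).
consensus = AgglomerativeClustering(
    n_clusters=4, metric="precomputed", linkage="average"
).fit_predict(1.0 - coassoc)
```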
- Differentially-Private Clustering of Easy Instances [67.04951703461657]
In differentially private clustering, the goal is to identify $k$ cluster centers without disclosing information on individual data points.
We give implementable differentially private clustering algorithms that provide utility when the data is "easy".
We propose a framework that allows us to apply non-private clustering algorithms to the easy instances and privately combine the results.
arXiv Detail & Related papers (2021-12-29T08:13:56Z)
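The paper's full framework is not reproduced here. The sketch below shows only a classic differentially-private primitive in this area (not necessarily the paper's mechanism): releasing cluster centers from Laplace-noised per-cluster sums and counts. The noise scales are illustrative, not a careful privacy accounting.

```python
# Classic differentially-private primitive (not necessarily the paper's
# mechanism): release k-means centers from Laplace-noised per-cluster sums
# and counts. Noise scales here are illustrative only, assuming data in
# [0, 1]^d; real use requires a careful sensitivity/budget analysis.
import numpy as np
from sklearn.cluster import KMeans

def noisy_centers(X, labels, k, epsilon, rng):
    d = X.shape[1]
    centers = []
    for j in range(k):
        pts = X[labels == j]
        s = pts.sum(axis=0) + rng.laplace(scale=2.0 * d / epsilon, size=d)
        c = max(len(pts) + rng.laplace(scale=2.0 / epsilon), 1.0)
        centers.append(s / c)
    return np.array(centers)

rng = np.random.default_rng(0)
X = rng.uniform(size=(1000, 2))  # assumed bounded in [0, 1]^2
labels = KMeans(n_clusters=3, n_init=5, random_state=0).fit_predict(X)
print(noisy_centers(X, labels, k=3, epsilon=1.0, rng=rng))
```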
- Very Compact Clusters with Structural Regularization via Similarity and Connectivity [3.779514860341336]
We propose an end-to-end deep clustering algorithm, Very Compact Clusters (VCC), for general datasets.
Our proposed approach achieves better clustering performance than most state-of-the-art clustering methods.
arXiv Detail & Related papers (2021-06-09T23:22:03Z)
- Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
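As a point of reference for what is being scaled, here is the standard, non-scalable agglomerative procedure via SciPy; the paper's contribution is making this kind of bottom-up merging tractable at billions of points, which this sketch does not attempt.

```python
# Standard (non-scalable) hierarchical agglomerative clustering with SciPy,
# shown only as the primitive that the paper scales up; this naive version
# needs O(n^2) memory and is far from billions of points.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc, 0.3, size=(50, 2)) for loc in (0.0, 3.0, 6.0)])

Z = linkage(X, method="average")                 # bottom-up merge tree
labels = fcluster(Z, t=3, criterion="maxclust")  # cut tree into 3 clusters
print(np.bincount(labels)[1:])                   # roughly [50, 50, 50]
```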
This list is automatically generated from the titles and abstracts of the papers in this site.