Deep Clustering Evaluation: How to Validate Internal Clustering Validation Measures
- URL: http://arxiv.org/abs/2403.14830v1
- Date: Thu, 21 Mar 2024 20:43:44 GMT
- Title: Deep Clustering Evaluation: How to Validate Internal Clustering Validation Measures
- Authors: Zeya Wang, Chenglong Ye,
- Abstract summary: Deep clustering is a method for partitioning complex, high-dimensional data using deep neural networks.
Traditional clustering validation measures, designed for low-dimensional spaces, are problematic for deep clustering.
This paper addresses these challenges in evaluating clustering quality in deep learning.
- Score: 2.2252684361733284
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep clustering, a method for partitioning complex, high-dimensional data using deep neural networks, presents unique evaluation challenges. Traditional clustering validation measures, designed for low-dimensional spaces, are problematic for deep clustering, which involves projecting data into lower-dimensional embeddings before partitioning. Two key issues are identified: 1) the curse of dimensionality when applying these measures to raw data, and 2) the unreliable comparison of clustering results across different embedding spaces stemming from variations in training procedures and parameter settings in different clustering models. This paper addresses these challenges in evaluating clustering quality in deep learning. We present a theoretical framework to highlight ineffectiveness arising from using internal validation measures on raw and embedded data and propose a systematic approach to applying clustering validity indices in deep clustering contexts. Experiments show that this framework aligns better with external validation measures, effectively reducing the misguidance from the improper use of clustering validity indices in deep learning.
Related papers
- From A-to-Z Review of Clustering Validation Indices [4.08908337437878]
We review and evaluate the performance of internal and external clustering validation indices on the most common clustering algorithms.
We suggest a classification framework for examining the functionality of both internal and external clustering validation measures.
arXiv Detail & Related papers (2024-07-18T13:52:02Z) - GCC: Generative Calibration Clustering [55.44944397168619]
We propose a novel Generative Clustering (GCC) method to incorporate feature learning and augmentation into clustering procedure.
First, we develop a discrimirative feature alignment mechanism to discover intrinsic relationship across real and generated samples.
Second, we design a self-supervised metric learning to generate more reliable cluster assignment.
arXiv Detail & Related papers (2024-04-14T01:51:11Z) - Dirichlet Process-based Robust Clustering using the Median-of-Means Estimator [16.774378814288806]
We propose an efficient and automatic clustering technique by integrating the strengths of model-based and centroid-based methodologies.
Our method mitigates the effect of noise on the quality of clustering; while at the same time, estimates the number of clusters.
arXiv Detail & Related papers (2023-11-26T19:01:15Z) - Deep Clustering of Tabular Data by Weighted Gaussian Distribution Learning [0.0]
This paper develops one of the first deep clustering methods for tabular data: Gaussian Cluster Embedding in Autoencoder Latent Space (G-CEALS)
The G-CEALS method presents average rank orderings of 2.9(1.7) and 2.8(1.7) based on clustering accuracy and adjusted Rand index (ARI) scores on sixteen data sets, respectively, and outperforms nine state-of-the-art clustering methods.
arXiv Detail & Related papers (2023-01-02T18:45:53Z) - Deep Clustering: A Comprehensive Survey [53.387957674512585]
Clustering analysis plays an indispensable role in machine learning and data mining.
Deep clustering, which can learn clustering-friendly representations using deep neural networks, has been broadly applied in a wide range of clustering tasks.
Existing surveys for deep clustering mainly focus on the single-view fields and the network architectures, ignoring the complex application scenarios of clustering.
arXiv Detail & Related papers (2022-10-09T02:31:32Z) - A One-shot Framework for Distributed Clustered Learning in Heterogeneous
Environments [54.172993875654015]
The paper proposes a family of communication efficient methods for distributed learning in heterogeneous environments.
One-shot approach, based on local computations at the users and a clustering based aggregation step at the server is shown to provide strong learning guarantees.
For strongly convex problems it is shown that, as long as the number of data points per user is above a threshold, the proposed approach achieves order-optimal mean-squared error rates in terms of the sample size.
arXiv Detail & Related papers (2022-09-22T09:04:10Z) - Anomaly Clustering: Grouping Images into Coherent Clusters of Anomaly
Types [60.45942774425782]
We introduce anomaly clustering, whose goal is to group data into coherent clusters of anomaly types.
This is different from anomaly detection, whose goal is to divide anomalies from normal data.
We present a simple yet effective clustering framework using a patch-based pretrained deep embeddings and off-the-shelf clustering methods.
arXiv Detail & Related papers (2021-12-21T23:11:33Z) - Deep Conditional Gaussian Mixture Model for Constrained Clustering [7.070883800886882]
Constrained clustering can leverage prior information on a growing amount of only partially labeled data.
We propose a novel framework for constrained clustering that is intuitive, interpretable, and can be trained efficiently in the framework of gradient variational inference.
arXiv Detail & Related papers (2021-06-11T13:38:09Z) - Learning Robust Representation for Clustering through Locality
Preserving Variational Discriminative Network [16.259673823482665]
Variational Deep Embedding achieves great success in various clustering tasks.
VaDE suffers from two problems: 1) it is fragile to the input noise; 2) it ignores the locality information between the neighboring data points.
We propose a joint learning framework that improves VaDE with a robust embedding discriminator and a local structure constraint.
arXiv Detail & Related papers (2020-12-25T02:31:55Z) - Towards Uncovering the Intrinsic Data Structures for Unsupervised Domain
Adaptation using Structurally Regularized Deep Clustering [119.88565565454378]
Unsupervised domain adaptation (UDA) is to learn classification models that make predictions for unlabeled data on a target domain.
We propose a hybrid model of Structurally Regularized Deep Clustering, which integrates the regularized discriminative clustering of target data with a generative one.
Our proposed H-SRDC outperforms all the existing methods under both the inductive and transductive settings.
arXiv Detail & Related papers (2020-12-08T08:52:00Z) - Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.