Related papers: DCSI -- An improved measure of cluster separability based on separation and connectedness

URL: http://arxiv.org/abs/2310.12806v2
Date: Mon, 1 Jul 2024 14:04:12 GMT
Title: DCSI -- An improved measure of cluster separability based on separation and connectedness
Authors: Jana Gauss, Fabian Scheipl, Moritz Herrmann
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Whether class labels in a given data set correspond to meaningful clusters is crucial for the evaluation of clustering algorithms using real-world data sets. This property can be quantified by separability measures. The central aspects of separability for density-based clustering are between-class separation and within-class connectedness, and neither classification-based complexity measures nor cluster validity indices (CVIs) adequately incorporate them. A newly developed measure (density cluster separability index, DCSI) aims to quantify these two characteristics and can also be used as a CVI. Extensive experiments on synthetic data indicate that DCSI correlates strongly with the performance of DBSCAN measured via the adjusted Rand index (ARI) but lacks robustness when it comes to multi-class data sets with overlapping classes that are ill-suited for density-based hard clustering. Detailed evaluation on frequently used real-world data sets shows that DCSI can correctly identify touching or overlapping classes that do not correspond to meaningful density-based clusters.
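The abstract names between-class separation and within-class connectedness as the two ingredients of DCSI but gives no formula. Purely as an illustration of those two ideas, the Python sketch below computes a simplified score under assumptions of our own: separation is the smallest Euclidean distance between points of two classes, connectedness is the longest edge of each class's Euclidean minimum spanning tree, and the ratio q = separation / connectedness is squashed into [0, 1) via q / (1 + q). The function names (separation, connectedness, toy_separability) are hypothetical, and this is not the paper's DCSI definition; the paper should be consulted for which points and distances DCSI actually uses.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def separation(class_a: Sequence[Point], class_b: Sequence[Point]) -> float:
    """Smallest Euclidean distance between a point of class_a and one of class_b."""
    return min(math.dist(p, q) for p in class_a for q in class_b)


def connectedness(points: Sequence[Point]) -> float:
    """Longest edge of the Euclidean minimum spanning tree of `points`.

    Computed with Prim's algorithm in O(n^2): the result is the largest gap
    that must be bridged to link all points of one class together.
    """
    n = len(points)
    if n < 2:
        return 0.0
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge from each vertex into the tree
    best[0] = 0.0
    longest = 0.0
    for _ in range(n):
        # Add the cheapest vertex that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        longest = max(longest, best[u])
        # Relax edges from the newly added vertex.
        for v in range(n):
            if not in_tree[v]:
                best[v] = min(best[v], math.dist(points[u], points[v]))
    return longest


def toy_separability(class_a: Sequence[Point], class_b: Sequence[Point]) -> float:
    """Hypothetical score in [0, 1): q / (1 + q) with q = separation / connectedness."""
    conn = max(connectedness(class_a), connectedness(class_b))
    if conn == 0.0:
        raise ValueError("each class needs at least two distinct points")
    q = separation(class_a, class_b) / conn
    return q / (1.0 + q)


if __name__ == "__main__":
    a = [(0, 0), (1, 0), (2, 0)]
    b = [(5, 0), (6, 0)]
    # Gap of 3 between the classes vs. a longest within-class edge of 1: q = 3.
    print(toy_separability(a, b))  # 0.75
```

In this toy score, touching classes such as [(0, 0), (1, 0)] and [(1.5, 0), (2.5, 0)] get a lower value (1/3), because the between-class gap is smaller than the within-class gaps.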

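The abstract measures DBSCAN's performance with the adjusted Rand index (ARI), which compares a clustering with the class labels while correcting the plain Rand index for chance agreement. A minimal pure-Python version of the standard ARI, computed from the contingency table of the two labelings, is sketched below; the function name adjusted_rand_index is our own, and in practice scikit-learn's sklearn.metrics.adjusted_rand_score computes the same quantity.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence


def adjusted_rand_index(labels_true: Sequence[Hashable],
                        labels_pred: Sequence[Hashable]) -> float:
    """Adjusted Rand index between two flat partitions of the same n items.

    ARI = (index - expected) / (max_index - expected), where
      index     = sum over contingency cells of C(n_ij, 2)
      expected  = [sum_i C(a_i, 2) * sum_j C(b_j, 2)] / C(n, 2)
      max_index = [sum_i C(a_i, 2) + sum_j C(b_j, 2)] / 2
    and a_i, b_j are the cluster sizes of the two labelings.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: treat the labelings as identical
    contingency = Counter(zip(labels_true, labels_pred))  # counts n_ij
    index = sum(comb(c, 2) for c in contingency.values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_a * sum_b / total_pairs
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Degenerate case: both labelings use one cluster, or both use only singletons.
        return 1.0
    return (index - expected) / (max_index - expected)


if __name__ == "__main__":
    print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0: same partition, relabelled
    print(adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 2]))  # about 0.571 (= 4/7)
```

Because ARI depends only on which items are grouped together, renaming cluster ids does not change the score, and a labeling no better than chance scores around 0.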
Related papers

Hierarchical clustering with maximum density paths and mixture models
t-NEB is a probabilistically grounded hierarchical clustering method. It yields state-of-the-art clustering performance on naturalistic high-dimensional data.
arXiv (2025-03-19T15:37:51Z)
Clustering Based on Density Propagation and Subcluster Merging
We propose a density-based node clustering approach that automatically determines the number of clusters and can be applied in both data space and graph space. Unlike traditional density-based clustering methods, which necessitate calculating the distance between any two nodes, our proposed technique determines density through a propagation process.
arXiv (2024-11-04T04:09:36Z)
SHADE: Deep Density-based Clustering
SHADE is the first deep clustering algorithm that incorporates density-connectivity into its loss function. It supports high-dimensional and large data sets with the expressive power of a deep autoencoder. It outperforms existing methods in clustering quality, especially on data that contain non-Gaussian clusters.
arXiv (2024-10-08T18:03:35Z)
Self-Supervised Graph Embedding Clustering
The K-means one-step dimensionality reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks. We propose a unified framework that integrates manifold learning with K-means, resulting in the self-supervised graph embedding framework.
arXiv (2024-09-24T08:59:51Z)
Enhancing cluster analysis via topological manifold learning
We show that inferring the topological structure of a dataset before clustering can considerably enhance cluster detection. We combine the manifold learning method UMAP, which infers the topological structure, with the density-based clustering method DBSCAN.
arXiv (2022-07-01T15:53:39Z)
Self-supervised Contrastive Attributed Graph Clustering
We propose a novel attributed graph clustering network, namely Self-supervised Contrastive Attributed Graph Clustering (SCAGC). In SCAGC, a self-supervised contrastive loss that leverages inaccurate clustering labels is designed for node representation learning. For out-of-sample (OOS) nodes, SCAGC can directly calculate their clustering labels.
arXiv (2021-10-15T03:25:28Z)
Very Compact Clusters with Structural Regularization via Similarity and Connectivity
We propose an end-to-end deep clustering algorithm, Very Compact Clusters (VCC), for general datasets. Our proposed approach achieves better clustering performance than most state-of-the-art clustering methods.
arXiv (2021-06-09T23:22:03Z)
You Never Cluster Alone
We extend the mainstream contrastive learning paradigm to a cluster-level scheme, where all data assigned to the same cluster contribute to a unified representation. We define a set of categorical variables as clustering assignment confidence, which links the instance-level learning track with the cluster-level one. By reparametrizing the assignment variables, TCC is trained end-to-end, requiring no alternating steps.
arXiv (2021-06-03T14:59:59Z)
Multiscale Clustering of Hyperspectral Images Through Spectral-Spatial Diffusion Geometry
Clustering algorithms partition a dataset into groups of similar points. The primary contribution of this article is the Multiscale Spatially-Regularized Diffusion Learning (M-SRDL) clustering algorithm. We show that incorporating spatial regularization into a multiscale clustering framework yields smoother and more coherent clusters when applied to hyperspectral image (HSI) data.
arXiv (2021-03-29T17:24:28Z)
Towards Uncovering the Intrinsic Data Structures for Unsupervised Domain Adaptation using Structurally Regularized Deep Clustering
Unsupervised domain adaptation (UDA) aims to learn classification models that make predictions for unlabeled data on a target domain. We propose a hybrid model of Structurally Regularized Deep Clustering (H-SRDC), which integrates the regularized discriminative clustering of target data with a generative one. Our proposed H-SRDC outperforms all existing methods under both the inductive and transductive settings.
arXiv (2020-12-08T08:52:00Z)
Scalable Hierarchical Agglomerative Clustering
Existing scalable hierarchical clustering methods sacrifice quality for speed. We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv (2020-10-22T15:58:35Z)
Supervised Enhanced Soft Subspace Clustering (SESSC) for TSK Fuzzy Classifiers
Fuzzy c-means-based clustering algorithms are frequently used for Takagi-Sugeno-Kang (TSK) fuzzy classifier parameter estimation. This paper proposes a supervised enhanced soft subspace clustering (SESSC) algorithm, which considers simultaneously the within-cluster compactness, between-cluster separation, and label information in clustering.
arXiv (2020-02-27T19:39:19Z)

This list is automatically generated from the titles and abstracts of the papers on this site.
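DBSCAN recurs on this page: the abstract evaluates DCSI against DBSCAN's performance, and the UMAP-based entry in the list applies DBSCAN after manifold learning. For readers who want to see what density-connectivity means operationally, the Python sketch below is a minimal, quadratic-time version of the classic DBSCAN procedure: a point is a core point if at least min_pts points (counting itself) lie within distance eps, clusters grow outward from core points, and points reachable from no core point are labelled -1 as noise. The function name dbscan and its parameter names are our own choices; for real data an indexed implementation such as sklearn.cluster.DBSCAN (parameters eps and min_samples) is the practical option.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]
NOISE = -1


def dbscan(points: Sequence[Point], eps: float, min_pts: int) -> List[int]:
    """Minimal O(n^2) DBSCAN: returns a cluster id per point, or -1 for noise."""
    n = len(points)
    labels: List[Optional[int]] = [None] * n  # None = not yet visited

    def region_query(i: int) -> List[int]:
        # All points within eps of point i, including i itself.
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster_id = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbours = region_query(i)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster_id += 1  # i is an unvisited core point: start a new cluster
        labels[i] = cluster_id
        frontier = [j for j in neighbours if j != i]
        while frontier:
            j = frontier.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins, but is not expanded
                continue
            if labels[j] is not None:
                continue  # already assigned
            labels[j] = cluster_id
            j_neighbours = region_query(j)
            if len(j_neighbours) >= min_pts:
                frontier.extend(j_neighbours)  # j is a core point: keep growing
    return labels  # every entry has been set to a cluster id or NOISE


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
    print(dbscan(pts, eps=1.5, min_pts=3))  # [0, 0, 0, 1, 1, 1, -1]
```

Border points (non-core points within eps of a core point) join the first cluster that reaches them but never expand it, which is why the cluster assignment of a border point shared by two clusters depends on visiting order.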