Enhancing cluster analysis via topological manifold learning
- URL: http://arxiv.org/abs/2207.00510v1
- Date: Fri, 1 Jul 2022 15:53:39 GMT
- Title: Enhancing cluster analysis via topological manifold learning
- Authors: Moritz Herrmann, Daniyal Kazempour, Fabian Scheipl, Peer Kröger
- Abstract summary: We show that inferring the topological structure of a dataset before clustering can considerably enhance cluster detection.
We combine the manifold learning method UMAP, which infers the topological structure, with the density-based clustering method DBSCAN.
- Score: 0.3823356975862006
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We discuss topological aspects of cluster analysis and show that inferring
the topological structure of a dataset before clustering it can considerably
enhance cluster detection: theoretical arguments and empirical evidence show
that clustering embedding vectors, representing the structure of a data
manifold instead of the observed feature vectors themselves, is highly
beneficial. To demonstrate, we combine the manifold learning method UMAP for
inferring the topological structure with the density-based clustering method
DBSCAN. Synthetic and real data results show that this both simplifies and
improves clustering in a diverse set of low- and high-dimensional problems
including clusters of varying density and/or entangled shapes. Our approach
simplifies clustering because topological pre-processing consistently reduces
parameter sensitivity of DBSCAN. Clustering the resulting embeddings with
DBSCAN can then even outperform complex methods such as SPECTACL and
ClusterGAN. Finally, our investigation suggests that the crucial issue in
clustering does not appear to be the nominal dimension of the data or how many
irrelevant features it contains, but rather how separable the clusters
are in the ambient observation space they are embedded in, which is usually the
(high-dimensional) Euclidean space defined by the features of the data. Our
approach is successful because we perform the cluster analysis after projecting
the data into a more suitable space that is optimized for separability, in some
sense.
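The embed-then-cluster idea described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the helper names `make_embedder` and `embed_then_cluster`, the toy two-moons data, and all parameter values (`n_neighbors=15`, `min_dist=0.0`, `eps=0.5`, `min_samples=5`) are assumptions chosen for the example. It uses `umap.UMAP` from the umap-learn package when available; if that package is not installed, it falls back to scikit-learn's `SpectralEmbedding` purely as a stand-in, which is not the method studied in the paper.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons
from sklearn.preprocessing import StandardScaler

try:
    import umap  # provided by the umap-learn package

    def make_embedder(n_components=2, seed=0):
        # Assumed settings: min_dist=0.0 lets UMAP pack neighbours tightly,
        # which tends to suit a density-based method applied afterwards.
        return umap.UMAP(n_neighbors=15, min_dist=0.0,
                         n_components=n_components, random_state=seed)
except ImportError:
    from sklearn.manifold import SpectralEmbedding

    def make_embedder(n_components=2, seed=0):
        # Stand-in only: spectral embedding is NOT the method used in the paper.
        return SpectralEmbedding(n_components=n_components, n_neighbors=15,
                                 random_state=seed)


def embed_then_cluster(X, eps=0.5, min_samples=5, n_components=2, seed=0):
    """Embed X with a manifold learner, then run DBSCAN on the embedding."""
    X_scaled = StandardScaler().fit_transform(X)
    embedding = make_embedder(n_components, seed).fit_transform(X_scaled)
    # DBSCAN labels noise points as -1 and clusters as 0, 1, 2, ...
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(embedding)
    return embedding, labels


# Toy data: two entangled half-moons (hypothetical example, not from the paper).
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)
embedding, labels = embed_then_cluster(X)
n_clusters = len(set(labels) - {-1})
print(f"clusters found: {n_clusters}, noise points: {np.sum(labels == -1)}")
```

The point of the design is that DBSCAN never sees the observed feature vectors, only the embedding vectors. To probe the abstract's claim about reduced parameter sensitivity, one could rerun `embed_then_cluster` over a range of `eps` values and compare how stable the labels are with and without the embedding step.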
Related papers
- Self-Supervised Graph Embedding Clustering [70.36328717683297]
The K-means one-step dimensionality reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks.
We propose a unified framework that integrates manifold learning with K-means, resulting in the self-supervised graph embedding framework.
arXiv Detail & Related papers (2024-09-24T08:59:51Z)
- scGHSOM: Hierarchical clustering and visualization of single-cell and CRISPR data using growing hierarchical SOM [0.8452349885923507]
We propose a comprehensive gene-cell dependency visualization via unsupervised clustering, Growing Hierarchical Self-Organizing Map (GHSOM).
GHSOM is applied to cluster samples in a hierarchical structure such that the self-growth structure of clusters satisfies the required variations between and within.
We present two innovative visualization tools: Cluster Feature Map and Cluster Distribution Map.
arXiv Detail & Related papers (2024-07-24T04:01:09Z)
- Distributional Reduction: Unifying Dimensionality Reduction and Clustering with Gromov-Wasserstein [56.62376364594194]
Unsupervised learning aims to capture the underlying structure of potentially large and high-dimensional datasets.
In this work, we revisit these approaches under the lens of optimal transport and exhibit relationships with the Gromov-Wasserstein problem.
This unveils a new general framework, called distributional reduction, that recovers DR and clustering as special cases and allows addressing them jointly within a single optimization problem.
arXiv Detail & Related papers (2024-02-03T19:00:19Z)
- Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels.
We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z)
- Deep Clustering: A Comprehensive Survey [53.387957674512585]
Clustering analysis plays an indispensable role in machine learning and data mining.
Deep clustering, which can learn clustering-friendly representations using deep neural networks, has been broadly applied in a wide range of clustering tasks.
Existing surveys for deep clustering mainly focus on the single-view fields and the network architectures, ignoring the complex application scenarios of clustering.
arXiv Detail & Related papers (2022-10-09T02:31:32Z)
- Flow-based clustering and spectral clustering: a comparison [0.688204255655161]
We study a novel graph clustering method for data with an intrinsic network structure.
We exploit an intrinsic network structure of data to construct Euclidean feature vectors.
Our results indicate that our clustering methods can cope with certain graph structures.
arXiv Detail & Related papers (2022-06-20T21:49:52Z)
- Swarm Intelligence for Self-Organized Clustering [6.85316573653194]
A swarm system called Databionic swarm (DBS) is introduced which is able to adapt itself to structures of high-dimensional data.
By exploiting the interrelations of swarm intelligence, self-organization and emergence, DBS serves as an alternative approach to the optimization of a global objective function in the task of clustering.
arXiv Detail & Related papers (2021-06-10T06:21:48Z)
- Spatial-Spectral Clustering with Anchor Graph for Hyperspectral Image [88.60285937702304]
This paper proposes a novel unsupervised approach called spatial-spectral clustering with anchor graph (SSCAG) for HSI data clustering.
The proposed SSCAG is competitive against the state-of-the-art approaches.
arXiv Detail & Related papers (2021-04-24T08:09:27Z)
- Skeleton Clustering: Dimension-Free Density-based Clustering [0.2538209532048866]
We introduce a density-based clustering method called skeleton clustering.
To bypass the curse of dimensionality, we propose surrogate density measures that are less dependent on the dimension but have intuitive geometric interpretations.
arXiv Detail & Related papers (2021-04-21T21:25:02Z)
- Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.