Stable and consistent density-based clustering via multiparameter
persistence
- URL: http://arxiv.org/abs/2005.09048v3
- Date: Thu, 3 Aug 2023 08:10:24 GMT
- Title: Stable and consistent density-based clustering via multiparameter
persistence
- Authors: Alexander Rolle, Luis Scoccola
- Abstract summary: We consider the degree-Rips construction from topological data analysis.
We analyze its stability to perturbations of the input data using the correspondence-interleaving distance.
We integrate these methods into a pipeline for density-based clustering, which we call Persistable.
- Score: 77.34726150561087
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the degree-Rips construction from topological data analysis,
which provides a density-sensitive, multiparameter hierarchical clustering
algorithm. We analyze its stability to perturbations of the input data using
the correspondence-interleaving distance, a metric for hierarchical clusterings
that we introduce. Taking certain one-parameter slices of degree-Rips recovers
well-known methods for density-based clustering, but we show that these methods
are unstable. However, we prove that degree-Rips, as a multiparameter object,
is stable, and we propose an alternative approach for taking slices of
degree-Rips, which yields a one-parameter hierarchical clustering algorithm
with better stability properties. We prove that this algorithm is consistent,
using the correspondence-interleaving distance. We provide an algorithm for
extracting a single clustering from one-parameter hierarchical clusterings,
which is stable with respect to the correspondence-interleaving distance. We
integrate these methods into a pipeline for density-based clustering, which we
call Persistable. Adapting tools from multiparameter persistent homology, we
propose visualization tools that guide the selection of all parameters of the
pipeline. We demonstrate Persistable on benchmark datasets, showing that it
identifies multi-scale cluster structure in data.
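As a concrete illustration, the sketch below shows the clustering selected by a single point (s, k) of the degree-Rips bifiltration, assuming Euclidean input: keep points whose s-neighborhood graph degree is at least k, and read off the connected components of the Rips graph on those points. This is our own toy reading for exposition, not the Persistable implementation, and the function name and parameter choices are ours.

```python
# Toy illustration: the clustering at one point (s, k) of the degree-Rips
# bifiltration, read as connected components of the Rips graph restricted to
# points of degree >= k. Persistable extracts clusters from persistence along
# slices; this only shows what a single parameter pair selects.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def degree_rips_components(X, s, k):
    """Labels at scale s and degree threshold k; low-degree points get -1."""
    D = squareform(pdist(X))                  # pairwise Euclidean distances
    degree = (D <= s).sum(axis=1) - 1         # neighbors within s, excluding self
    core = degree >= k                        # keep sufficiently dense points
    adj = csr_matrix(D[np.ix_(core, core)] <= s)   # Rips graph on core points
    _, comp = connected_components(adj, directed=False)
    labels = np.full(len(X), -1)
    labels[core] = comp
    return labels
```

Varying s and k jointly traces out the two-parameter object whose stability the paper establishes; constraining how they vary together gives the one-parameter slices discussed in the abstract.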
Related papers
- Self-Supervised Graph Embedding Clustering [70.36328717683297]
The one-step dimensionality-reduction K-means clustering method has made some progress in addressing the curse of dimensionality in clustering tasks.
We propose a unified framework that integrates manifold learning with K-means, resulting in a self-supervised graph embedding framework.
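For orientation, a two-stage sketch of the generic embed-then-cluster pattern this line of work builds on; the paper's contribution is a unified objective that couples the two steps, which this illustrative baseline does not reproduce.

```python
# Two-stage baseline: spectral (graph) embedding followed by K-means.
# Sizes and parameters are illustrative, not the paper's.
from sklearn.manifold import SpectralEmbedding
from sklearn.cluster import KMeans

def embed_then_kmeans(X, n_clusters, n_components=8):
    emb = SpectralEmbedding(n_components=n_components).fit_transform(X)
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(emb)
```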
arXiv Detail & Related papers (2024-09-24T08:59:51Z)
- Deep Embedding Clustering Driven by Sample Stability [16.53706617383543]
We propose a deep embedding clustering algorithm driven by sample stability (DECS).
Specifically, we start by constructing the initial feature space with an autoencoder and then learn the cluster-oriented embedding feature constrained by sample stability.
The experimental results on five datasets illustrate that the proposed method achieves superior performance compared to state-of-the-art clustering approaches.
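A minimal sketch of the two-stage idea (autoencoder features, then clustering in the latent space); the sample-stability constraint is DECS's contribution and is not reproduced here, and the architecture and sizes are our illustrative choices.

```python
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

class AE(nn.Module):
    """Small fully connected autoencoder; sizes are illustrative."""
    def __init__(self, d_in, d_latent=10):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(d_in, 128), nn.ReLU(),
                                 nn.Linear(128, d_latent))
        self.dec = nn.Sequential(nn.Linear(d_latent, 128), nn.ReLU(),
                                 nn.Linear(128, d_in))

    def forward(self, x):
        z = self.enc(x)
        return self.dec(z), z

def ae_then_kmeans(X, n_clusters, epochs=50, lr=1e-3):
    X = torch.as_tensor(X, dtype=torch.float32)
    model = AE(X.shape[1])
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):                  # reconstruction pretraining
        recon, _ = model(X)
        loss = ((recon - X) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():                    # cluster in the latent space
        _, z = model(X)
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(z.numpy())
```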
arXiv Detail & Related papers (2024-01-29T09:19:49Z)
- Rethinking k-means from manifold learning perspective [122.38667613245151]
We present a new clustering algorithm which directly detects clusters of data without mean estimation.
Specifically, we construct a distance matrix between data points using a Butterworth filter.
To fully exploit the complementary information embedded in different views, we leverage tensor Schatten p-norm regularization.
arXiv Detail & Related papers (2023-05-12T03:01:41Z)
- Fast conformational clustering of extensive molecular dynamics simulation data [19.444636864515726]
We present an unsupervised data processing workflow that is specifically designed to obtain a fast conformational clustering of long trajectories.
We combine two dimensionality reduction algorithms (cc_analysis and encodermap) with a density-based spatial clustering algorithm (HDBSCAN).
With the help of four test systems we illustrate the capability and performance of this clustering workflow.
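A sketch of the reduce-then-cluster shape of this workflow; PCA stands in for cc_analysis and encodermap, which are domain-specific, and the parameter values are illustrative.

```python
from sklearn.decomposition import PCA
from sklearn.cluster import HDBSCAN  # available in scikit-learn >= 1.3

def reduce_then_cluster(features, n_dims=5, min_cluster_size=25):
    emb = PCA(n_components=n_dims).fit_transform(features)
    return HDBSCAN(min_cluster_size=min_cluster_size).fit_predict(emb)  # -1 marks noise
```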
arXiv Detail & Related papers (2023-01-11T14:36:43Z)
- Enhancing cluster analysis via topological manifold learning [0.3823356975862006]
We show that inferring the topological structure of a dataset before clustering can considerably enhance cluster detection.
We combine the manifold learning method UMAP, which infers the topological structure, with the density-based clustering method DBSCAN.
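The stated pipeline, sketched with umap-learn and scikit-learn; the parameter values are illustrative, not the paper's.

```python
import umap                      # umap-learn package
from sklearn.cluster import DBSCAN

def umap_then_dbscan(X, eps=0.3, min_samples=10):
    emb = umap.UMAP(n_components=2, min_dist=0.0).fit_transform(X)
    return DBSCAN(eps=eps, min_samples=min_samples).fit_predict(emb)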
arXiv Detail & Related papers (2022-07-01T15:53:39Z)
- Perfect Spectral Clustering with Discrete Covariates [68.8204255655161]
We propose a spectral algorithm that achieves perfect clustering with high probability on a class of large, sparse networks.
Our method is the first to offer a guarantee of consistent latent structure recovery using spectral clustering.
arXiv Detail & Related papers (2022-05-17T01:41:06Z)
- Quantile-based fuzzy C-means clustering of multivariate time series: Robust techniques [2.3226893628361682]
Robustness to the presence of outliers is achieved by using the so-called metric, noise and trimmed approaches.
Results from a broad simulation study indicate that the algorithms are substantially effective in coping with the presence of outlying series.
arXiv Detail & Related papers (2021-09-22T20:26:12Z)
- Fuzzy clustering algorithms with distance metric learning and entropy regularization [0.0]
This paper proposes fuzzy clustering algorithms based on Euclidean, City-block and Mahalanobis distances and entropy regularization.
Several experiments on synthetic and real datasets, including its application to noisy image texture segmentation, demonstrate the usefulness of these adaptive clustering methods.
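A sketch of one member of this family: entropy-regularized fuzzy c-means with the Euclidean distance, following the standard entropy-regularized formulation. The adaptive metric-learning variants are the paper's contribution and are not reproduced here.

```python
import numpy as np

def entropy_fcm(X, c, lam=1.0, iters=100, seed=0):
    """Entropy-regularized fuzzy c-means; lam > 0 controls fuzziness."""
    rng = np.random.default_rng(seed)
    V = X[rng.choice(len(X), size=c, replace=False)]         # initial centers
    for _ in range(iters):
        d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(-1)  # squared distances
        d2 -= d2.min(axis=1, keepdims=True)                  # numerical stability
        U = np.exp(-d2 / lam)
        U /= U.sum(axis=1, keepdims=True)                    # soft memberships
        V = (U.T @ X) / U.sum(axis=0)[:, None]               # weighted centers
    return U, V
```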
arXiv Detail & Related papers (2021-02-18T18:19:04Z)
- Determinantal consensus clustering [77.34726150561087]
We propose the use of determinantal point processes (DPPs) for the random restarts of clustering algorithms.
DPPs favor diversity of the center points within subsets.
We show through simulations that, contrary to DPPs, uniform random sampling fails both to ensure diversity and to obtain a good coverage of all data facets.
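For intuition, a naive greedy sketch of diversity-seeking center selection in the spirit of a DPP: a greedy MAP-style approximation that grows the set maximizing the determinant of an RBF kernel submatrix. This is not the paper's sampling scheme, and the kernel and bandwidth are our choices.

```python
import numpy as np
from sklearn.cluster import KMeans

def greedy_diverse_centers(X, k, gamma=1.0):
    """Greedily grow a set maximizing det of its RBF kernel submatrix."""
    L = np.exp(-gamma * ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    sel = []
    for _ in range(k):
        best, best_gain = None, -np.inf
        for i in range(len(X)):
            if i in sel:
                continue
            idx = sel + [i]
            sign, logdet = np.linalg.slogdet(L[np.ix_(idx, idx)])
            if sign > 0 and logdet > best_gain:
                best, best_gain = i, logdet
        sel.append(best)
    return X[sel]

# Diverse centers can then seed a clustering restart, e.g.:
# KMeans(n_clusters=k, init=greedy_diverse_centers(X, k), n_init=1).fit(X)
```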
arXiv Detail & Related papers (2021-02-07T23:48:24Z)
- Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences arising from its use.