Persistent Homology of the Multiscale Clustering Filtration
- URL: http://arxiv.org/abs/2305.04281v2
- Date: Thu, 21 Sep 2023 09:39:55 GMT
- Title: Persistent Homology of the Multiscale Clustering Filtration
- Authors: Dominik J. Schindler and Mauricio Barahona
- Abstract summary: We introduce a filtration of abstract simplicial complexes, denoted the Multiscale Clustering Filtration (MCF).
The MCF encodes arbitrary patterns of cluster assignments across scales, and we prove that the MCF produces stable persistence diagrams.
We briefly illustrate how the MCF can serve to characterise multiscale clustering structures in numerical experiments on synthetic data.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In many applications in data clustering, it is desirable to find not just a
single partition into clusters but a sequence of partitions describing the data
at different scales, or levels of coarseness. A natural problem then is to
analyse and compare the (not necessarily hierarchical) sequences of partitions
that underpin such multiscale descriptions of data. Here, we introduce a
filtration of abstract simplicial complexes, denoted the Multiscale Clustering
Filtration (MCF), which encodes arbitrary patterns of cluster assignments
across scales, and we prove that the MCF produces stable persistence diagrams.
We then show that the zero-dimensional persistent homology of the MCF measures
the degree of hierarchy in the sequence of partitions, and that the
higher-dimensional persistent homology tracks the emergence and resolution of
conflicts between cluster assignments across the sequence of partitions. To
broaden the theoretical foundations of the MCF, we also provide an equivalent
construction via a nerve complex filtration, and we show that in the
hierarchical case, the MCF reduces to a Vietoris-Rips filtration of an
ultrametric space. We briefly illustrate how the MCF can serve to characterise
multiscale clustering structures in numerical experiments on synthetic data.
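To make the zero-dimensional picture concrete, here is a minimal illustrative sketch (not the authors' implementation): sweeping a sequence of partitions in scale order and recording, via union-find, when connected components of the filtration's 1-skeleton merge. Each merge kills one H0 class born at scale 0. The function names and the toy partitions are assumptions for illustration; higher-dimensional homology and the stability results are not reproduced here.

```python
# Illustrative sketch: 0-dimensional persistence of a filtration built from
# a sequence of partitions. At scale t we add, for every cluster of
# partition t, the edges connecting its points; components then merge as
# in the filtration's 1-skeleton, and union-find tracks the merges.

def find(parent, x):
    """Find the root of x with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def zero_dim_persistence(n_points, partitions):
    """partitions: list of partitions (each a list of clusters, each a list
    of point ids), ordered by scale. Returns the (birth, death) pairs of
    H0 classes that die, plus the number of surviving components."""
    parent = list(range(n_points))
    deaths = []
    for t, partition in enumerate(partitions, start=1):
        for cluster in partition:
            root = find(parent, cluster[0])
            for p in cluster[1:]:
                r = find(parent, p)
                if r != root:
                    parent[r] = root       # two components merge at scale t
                    deaths.append((0, t))  # one H0 class (born at 0) dies
    survivors = len({find(parent, i) for i in range(n_points)})
    return deaths, survivors

# Toy example: a strictly hierarchical sequence of partitions of 4 points.
parts = [
    [[0], [1], [2], [3]],
    [[0, 1], [2, 3]],
    [[0, 1, 2, 3]],
]
deaths, survivors = zero_dim_persistence(4, parts)
```

In a strictly hierarchical sequence like this one, all merges are recorded cleanly and a single component survives; non-hierarchical sequences produce additional merge events, which is the sense in which H0 measures the degree of hierarchy.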
Related papers
- Robust Categorical Data Clustering Guided by Multi-Granular Competitive Learning [47.32771052588132]
The nested granular cluster effect is prevalent in the implicit discrete distance space of categorical data. We propose a multi-granular competitive learning algorithm (MGCPL) that allows potential clusters to interactively tune themselves. The proposed MGCPL-guided categorical data clustering approach is shown to be competent at exploring the nested distribution of multi-granular clusters.
arXiv Detail & Related papers (2026-01-23T06:33:08Z) - MCbiF: Measuring Topological Autocorrelation in Multiscale Clusterings via 2-Parameter Persistent Homology [1.5813217907813781]
We define the Multiscale Clustering Bifiltration (MCbiF) as a filtration of abstract simplicial complexes that encodes cluster intersection patterns across scales. We show that the multiparameter persistent homology (MPH) of the MCbiF yields a finitely presented and block-decomposable module. We demonstrate through experiments the use of MCbiF Hilbert functions as topological feature maps for downstream machine learning tasks.
arXiv Detail & Related papers (2025-10-16T14:11:12Z) - Hierarchical clustering with maximum density paths and mixture models [39.42511559155036]
Hierarchical clustering is an effective and interpretable technique for analyzing structure in data.
It is particularly helpful in settings where the exact number of clusters is unknown, and provides a robust framework for exploring complex datasets.
Our method leverages a two-stage approach, first employing a Gaussian or Student's t mixture model to overcluster the data, and then hierarchically merging clusters based on the induced density landscape.
This approach yields state-of-the-art clustering performance while also providing a meaningful hierarchy, making it a valuable tool for exploratory data analysis.
arXiv Detail & Related papers (2025-03-19T15:37:51Z) - Self-Supervised Graph Embedding Clustering [70.36328717683297]
One-step dimensionality-reduction clustering based on K-means has made some progress in addressing the curse of dimensionality in clustering tasks.
We propose a unified framework that integrates manifold learning with K-means, resulting in the self-supervised graph embedding framework.
arXiv Detail & Related papers (2024-09-24T08:59:51Z) - Datacube segmentation via Deep Spectral Clustering [76.48544221010424]
Extended Vision techniques are often challenging to interpret.
The huge dimensionality of data cube spectra makes their statistical interpretation a complex task.
In this paper, we explore the possibility of applying unsupervised clustering methods in encoded space.
Statistical dimensionality reduction is performed by an ad hoc trained (Variational) AutoEncoder, while clustering is performed by a (learnable) iterative K-Means algorithm.
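As a hedged sketch of the clustering stage only: the trained AutoEncoder is replaced here by a toy 2-D latent space, and the paper's learnable K-Means by plain Lloyd iterations with a deterministic farthest-point initialisation. All names and parameters below are assumptions for illustration, not the paper's method.

```python
import numpy as np

def kmeans(codes, k, n_iter=25):
    """Plain Lloyd's K-Means on low-dimensional codes, standing in for the
    latent space produced by a trained (Variational) AutoEncoder."""
    # deterministic init: first code, then repeatedly the code farthest
    # from all centers chosen so far
    centers = [codes[0]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(codes - c, axis=1) for c in centers], axis=0)
        centers.append(codes[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        # assign each code to its nearest center
        d = np.linalg.norm(codes[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # recompute each center as the mean of its assigned codes
        for j in range(k):
            if np.any(labels == j):
                centers[j] = codes[labels == j].mean(axis=0)
    return labels, centers

# Toy "encoded" spectra: two well-separated blobs in a 2-D latent space.
rng = np.random.default_rng(1)
codes = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(3.0, 0.1, (20, 2))])
labels, centers = kmeans(codes, k=2)
```

The deterministic initialisation keeps the toy example reproducible; the actual pipeline learns both the encoding and the cluster assignment.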
arXiv Detail & Related papers (2024-01-31T09:31:28Z) - Efficient and Effective Deep Multi-view Subspace Clustering [9.6753782215283]
We propose a novel deep framework, termed Efficient and Effective deep Multi-View Subspace Clustering (E$^2$MVSC).
Instead of a parameterized FC layer, we design a Relation-Metric Net that decouples network parameter scale from sample numbers for greater computational efficiency.
E$^2$MVSC yields comparable results to existing methods and achieves state-of-the-art performance on various types of multi-view datasets.
arXiv Detail & Related papers (2023-10-15T03:08:25Z) - Contrastive Continual Multi-view Clustering with Filtered Structural Fusion [57.193645780552565]
Multi-view clustering thrives in applications where views are collected in advance.
However, this overlooks scenarios where data views are collected sequentially, i.e., real-time data.
Some methods are proposed to handle it but are trapped in a stability-plasticity dilemma.
We propose Contrastive Continual Multi-view Clustering with Filtered Structural Fusion.
arXiv Detail & Related papers (2023-09-26T14:18:29Z) - Multi-View Clustering via Semi-non-negative Tensor Factorization [120.87318230985653]
We develop a novel multi-view clustering method based on semi-non-negative tensor factorization (Semi-NTF).
Our model directly considers the between-view relationship and exploits the between-view complementary information.
In addition, we provide an optimization algorithm for the proposed method and prove mathematically that the algorithm always converges to the stationary KKT point.
arXiv Detail & Related papers (2023-03-29T14:54:19Z) - Adaptively-weighted Integral Space for Fast Multiview Clustering [54.177846260063966]
We propose an Adaptively-weighted Integral Space for Fast Multiview Clustering (AIMC) with nearly linear complexity.
Specifically, view generation models are designed to reconstruct the view observations from the latent integral space.
Experiments conducted on several real-world datasets confirm the superiority of the proposed AIMC method.
arXiv Detail & Related papers (2022-08-25T05:47:39Z) - Enhancing cluster analysis via topological manifold learning [0.3823356975862006]
We show that inferring the topological structure of a dataset before clustering can considerably enhance cluster detection.
We combine manifold learning method UMAP for inferring the topological structure with density-based clustering method DBSCAN.
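A minimal sketch of the density-based step only: the UMAP embedding stage is omitted, and what follows is a plain DBSCAN on already-embedded 2-D points, not the paper's pipeline. The `eps` and `min_pts` parameters carry their usual DBSCAN meaning; the toy points are assumptions for illustration.

```python
from math import dist

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    n = len(points)
    # precompute eps-neighbourhoods (each point neighbours itself)
    neighbours = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_pts:
            labels[i] = -1                 # non-core point: noise (may become border later)
            continue
        cluster += 1                       # new cluster seeded by a core point
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster        # reclaim noise as a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_pts:
                queue.extend(neighbours[j])  # expand only through core points
    return labels

# Two tight groups and one isolated point.
pts = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.1), (10, 10)]
labels = dbscan(pts, eps=0.5, min_pts=3)
```

The quadratic neighbourhood computation is fine for a toy; the point of the paper is that running such density-based clustering on a learned manifold embedding, rather than raw data, can considerably improve cluster detection.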
arXiv Detail & Related papers (2022-07-01T15:53:39Z) - Skeleton Clustering: Dimension-Free Density-based Clustering [0.2538209532048866]
We introduce a density-based clustering method called skeleton clustering.
To bypass the curse of dimensionality, we propose surrogate density measures that are less dependent on the dimension but have intuitive geometric interpretations.
arXiv Detail & Related papers (2021-04-21T21:25:02Z) - Multiscale Clustering of Hyperspectral Images Through Spectral-Spatial Diffusion Geometry [9.619814126465206]
Clustering algorithms partition a dataset into groups of similar points.
The primary contribution of this article is the Multiscale Spatially-Regularized Diffusion Learning (M-SRDL) clustering algorithm.
We show that incorporating spatial regularization into a multiscale clustering framework corresponds to smoother and more coherent clusters when applied to HSI data.
arXiv Detail & Related papers (2021-03-29T17:24:28Z) - A Multiscale Environment for Learning by Diffusion [9.619814126465206]
We introduce the Multiscale Environment for Learning by Diffusion (MELD) data model.
We show that the MELD data model precisely captures latent multiscale structure in data and facilitates its analysis.
To efficiently learn the multiscale structure observed in many real datasets, we introduce the Multiscale Learning by Unsupervised Diffusion (M-LUND) clustering algorithm.
arXiv Detail & Related papers (2021-01-31T17:46:19Z) - Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z) - Kernel learning approaches for summarising and combining posterior similarity matrices [68.8204255655161]
We build upon the notion of the posterior similarity matrix (PSM) in order to suggest new approaches for summarising the output of MCMC algorithms for Bayesian clustering models.
A key contribution of our work is the observation that PSMs are positive semi-definite, and hence can be used to define probabilistically-motivated kernel matrices.
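The positive semi-definiteness observation is easy to verify on toy data. The sketch below is illustrative, not the paper's code: it builds a PSM from a few sampled partitions and checks its eigenvalues. Each sample's co-membership matrix is a sum of outer products of cluster indicator vectors, hence positive semi-definite, and so is their average.

```python
import numpy as np

def posterior_similarity(partitions, n_points):
    """PSM[i, j] = fraction of sampled partitions placing i and j in the same cluster."""
    psm = np.zeros((n_points, n_points))
    for labels in partitions:
        labels = np.asarray(labels)
        # co-membership matrix of this sample: 1 where labels agree
        psm += (labels[:, None] == labels[None, :]).astype(float)
    return psm / len(partitions)

# Three toy partition samples of 4 points, standing in for MCMC draws.
samples = [[0, 0, 1, 1], [0, 0, 0, 1], [0, 1, 1, 1]]
psm = posterior_similarity(samples, 4)
eigenvalues = np.linalg.eigvalsh(psm)  # all non-negative up to numerical error
```

Because the PSM is positive semi-definite, it can be used directly as a kernel matrix in kernel-based summarisation methods, which is the observation the paper builds on.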
arXiv Detail & Related papers (2020-09-27T14:16:14Z) - Stable and consistent density-based clustering via multiparameter persistence [77.34726150561087]
We consider the degree-Rips construction from topological data analysis.
We analyze its stability to perturbations of the input data using the correspondence-interleaving distance.
We integrate these methods into a pipeline for density-based clustering, which we call Persistable.
arXiv Detail & Related papers (2020-05-18T19:45:04Z) - Multiple Flat Projections for Cross-manifold Clustering [11.616653147570446]
Cross-manifold clustering is a challenging problem, and many traditional clustering methods fail in the presence of cross-manifold structures.
We propose Multiple Flat Projections Clustering (MFPC) to deal with cross-manifold clustering problems.
arXiv Detail & Related papers (2020-02-17T02:16:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.