Persistent Multiscale Density-based Clustering
- URL: http://arxiv.org/abs/2512.16558v1
- Date: Thu, 18 Dec 2025 14:01:35 GMT
- Title: Persistent Multiscale Density-based Clustering
- Authors: Daniƫl Bot, Leland McInnes, Jan Aerts,
- Abstract summary: Persistent Leaves Spatial Clustering for Applications with Noise (PLSCAN)<n>PLSCAN efficiently identifies all minimum cluster sizes for which HDBSCAN* produces stable (leaf) clusters.<n>We compare PLSCAN's performance to HDBSCAN* on several real-world datasets.
- Score: 0.515435457943463
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Clustering is a cornerstone of modern data analysis. Detecting clusters in exploratory data analyses (EDA) requires algorithms that make few assumptions about the data. Density-based clustering algorithms are particularly well-suited for EDA because they describe high-density regions, assuming only that a density exists. Applying density-based clustering algorithms in practice, however, requires selecting appropriate hyperparameters, which is difficult without prior knowledge of the data distribution. For example, DBSCAN requires selecting a density threshold, and HDBSCAN* relies on a minimum cluster size parameter. In this work, we propose Persistent Leaves Spatial Clustering for Applications with Noise (PLSCAN). This novel density-based clustering algorithm efficiently identifies all minimum cluster sizes for which HDBSCAN* produces stable (leaf) clusters. PLSCAN applies scale-space clustering principles and is equivalent to persistent homology on a novel metric space. We compare its performance to HDBSCAN* on several real-world datasets, demonstrating that it achieves a higher average ARI and is less sensitive to changes in the number of mutual reachability neighbours. Additionally, we compare PLSCAN's computational costs to k-Means, demonstrating competitive run-times on low-dimensional datasets. At higher dimensions, run times scale more similarly to HDBSCAN*.
Related papers
- GBSK: Skeleton Clustering via Granular-ball Computing and Multi-Sampling for Large-Scale Data [62.363178614776295]
We propose a novel scalable skeleton clustering algorithm, namely GBSK, to handle clustering task for large-scale datasets.<n>By multi-sampling the dataset and constructing multi-grained granular-balls, GBSK progressively uncovers a statistical "skeleton"<n>In addition, we introduce an adaptive version, AGBSK, with simplified parameter settings to enhance usability and facilitate deployment in real-world scenarios.
arXiv Detail & Related papers (2025-09-28T08:41:15Z) - Adaptive and Robust DBSCAN with Multi-agent Reinforcement Learning [53.527506374566485]
We propose a novel Adaptive and Robust DBSCAN with Multi-agent Reinforcement Learning cluster framework, namely AR-DBSCAN.<n>We show that AR-DBSCAN not only improves clustering accuracy by up to 144.1% and 175.3% in the NMI and ARI metrics, respectively, but also is capable of robustly finding dominant parameters.
arXiv Detail & Related papers (2025-05-07T11:37:23Z) - Hierarchical clustering with maximum density paths and mixture models [44.443538161979056]
t-NEB is a probabilistically grounded hierarchical clustering method.<n>It yields state-of-the-art clustering performance on naturalistic high-dimensional data.
arXiv Detail & Related papers (2025-03-19T15:37:51Z) - Clustering Based on Density Propagation and Subcluster Merging [92.15924057172195]
We propose a density-based node clustering approach that automatically determines the number of clusters and can be applied in both data space and graph space.
Unlike traditional density-based clustering methods, which necessitate calculating the distance between any two nodes, our proposed technique determines density through a propagation process.
arXiv Detail & Related papers (2024-11-04T04:09:36Z) - Scalable Density-based Clustering with Random Projections [7.642646077340124]
We present sDBSCAN, a scalable density-based clustering algorithm in high dimensions with cosine distance.<n> Empirically, sDBSCAN is significantly faster and provides higher accuracy than many other clustering algorithms on real-world million-point data sets.
arXiv Detail & Related papers (2024-02-24T01:45:51Z) - FLASC: A Flare-Sensitive Clustering Algorithm [0.0]
We present FLASC, an algorithm that detects branches within clusters to identify subpopulations.
Two variants of the algorithm are presented, which trade computational cost for noise robustness.
We show that both variants scale similarly to HDBSCAN* in terms of computational cost and provide stable outputs.
arXiv Detail & Related papers (2023-11-27T14:55:16Z) - DECWA : Density-Based Clustering using Wasserstein Distance [1.4132765964347058]
We propose a new clustering algorithm based on spatial density and probabilistic approach.
We show that our approach outperforms other state-of-the-art density-based clustering methods on a wide variety of datasets.
arXiv Detail & Related papers (2023-10-25T11:10:08Z) - SDC-HSDD-NDSA: Structure Detecting Cluster by Hierarchical Secondary Directed Differential with Normalized Density and Self-Adaption [0.0]
Density-based clustering is the most popular clustering algorithm.<n>It can identify clusters of arbitrary shape as long as they are separated by low-density regions.<n>However, a high-density region that is not separated by low-density ones might also have different structures belonging to multiple clusters.<n>In this paper, we provide a novel density-based clustering scheme to address this problem.
arXiv Detail & Related papers (2023-07-02T22:30:08Z) - GBMST: An Efficient Minimum Spanning Tree Clustering Based on
Granular-Ball Computing [78.92205914422925]
We propose a clustering algorithm that combines multi-granularity Granular-Ball and minimum spanning tree (MST)
We construct coarsegrained granular-balls, and then use granular-balls and MST to implement the clustering method based on "large-scale priority"
Experimental results on several data sets demonstrate the power of the algorithm.
arXiv Detail & Related papers (2023-03-02T09:04:35Z) - Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z) - Learnable Subspace Clustering [76.2352740039615]
We develop a learnable subspace clustering paradigm to efficiently solve the large-scale subspace clustering problem.
The key idea is to learn a parametric function to partition the high-dimensional subspaces into their underlying low-dimensional subspaces.
To the best of our knowledge, this paper is the first work to efficiently cluster millions of data points among the subspace clustering methods.
arXiv Detail & Related papers (2020-04-09T12:53:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.