Faithful Density-Peaks Clustering via Matrix Computations on MPI Parallelization System
- URL: http://arxiv.org/abs/2406.12297v1
- Date: Tue, 18 Jun 2024 06:05:45 GMT
- Title: Faithful Density-Peaks Clustering via Matrix Computations on MPI Parallelization System
- Authors: Ji Xu, Tianlong Xiao, Jinye Yang, Panpan Zhu,
- Abstract summary: Density peaks clustering (DP) has the ability of detecting clusters of arbitrary shape and clustering non-Euclidean space data.
We present a faithful and parallel DP method that makes use of two types of vector-like distance matrices and an inverse leading-node-finding policy.
Our method is capable of clustering non-Euclidean data such as in community detection, while outperforming the state-of-the-art counterpart methods in accuracy when clustering large Euclidean data.
- Score: 7.594123537718585
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Density peaks clustering (DP) has the ability of detecting clusters of arbitrary shape and clustering non-Euclidean space data, but its quadratic complexity in both computing and storage makes it difficult to scale for big data. Various approaches have been proposed in this regard, including MapReduce based distribution computing, multi-core parallelism, presentation transformation (e.g., kd-tree, Z-value), granular computing, and so forth. However, most of these existing methods face two limitations. One is their target datasets are mostly constrained to be in Euclidian space, the other is they emphasize only on local neighbors while ignoring global data distribution due to restriction to cut-off kernel when computing density. To address the two issues, we present a faithful and parallel DP method that makes use of two types of vector-like distance matrices and an inverse leading-node-finding policy. The method is implemented on a message passing interface (MPI) system. Extensive experiments showed that our method is capable of clustering non-Euclidean data such as in community detection, while outperforming the state-of-the-art counterpart methods in accuracy when clustering large Euclidean data. Our code is publicly available at https://github.com/alanxuji/FaithPDP.
Related papers
- Clustering Based on Density Propagation and Subcluster Merging [92.15924057172195]
We propose a density-based node clustering approach that automatically determines the number of clusters and can be applied in both data space and graph space.
Unlike traditional density-based clustering methods, which necessitate calculating the distance between any two nodes, our proposed technique determines density through a propagation process.
arXiv Detail & Related papers (2024-11-04T04:09:36Z) - Linear time Evidence Accumulation Clustering with KMeans [0.0]
This work describes a trick which mimic the behavior of average linkage clustering.
We found a way of computing efficiently the density of a partitioning, reducing the cost from a quadratic to linear complexity.
The k-means results are comparable to the best state of the art in terms of NMI while keeping the computational cost low.
arXiv Detail & Related papers (2023-11-15T14:12:59Z) - DECWA : Density-Based Clustering using Wasserstein Distance [1.4132765964347058]
We propose a new clustering algorithm based on spatial density and probabilistic approach.
We show that our approach outperforms other state-of-the-art density-based clustering methods on a wide variety of datasets.
arXiv Detail & Related papers (2023-10-25T11:10:08Z) - Distribution-Based Trajectory Clustering [14.781854651899705]
Trajectory clustering enables the discovery of common patterns in trajectory data.
The distance measures employed have two challenges: high computational cost and low fidelity.
We propose to use a recent Isolation Distributional Kernel (IDK) as the main tool to meet all three challenges.
arXiv Detail & Related papers (2023-10-08T11:28:34Z) - Rethinking k-means from manifold learning perspective [122.38667613245151]
We present a new clustering algorithm which directly detects clusters of data without mean estimation.
Specifically, we construct distance matrix between data points by Butterworth filter.
To well exploit the complementary information embedded in different views, we leverage the tensor Schatten p-norm regularization.
arXiv Detail & Related papers (2023-05-12T03:01:41Z) - Density-Based Clustering with Kernel Diffusion [59.4179549482505]
A naive density corresponding to the indicator function of a unit $d$-dimensional Euclidean ball is commonly used in density-based clustering algorithms.
We propose a new kernel diffusion density function, which is adaptive to data of varying local distributional characteristics and smoothness.
arXiv Detail & Related papers (2021-10-11T09:00:33Z) - Clustering Plotted Data by Image Segmentation [12.443102864446223]
Clustering algorithms are one of the main analytical methods to detect patterns in unlabeled data.
In this paper, we present a wholly different way of clustering points in 2-dimensional space, inspired by how humans cluster data.
Our approach, Visual Clustering, has several advantages over traditional clustering algorithms.
arXiv Detail & Related papers (2021-10-06T06:19:30Z) - Index $t$-SNE: Tracking Dynamics of High-Dimensional Datasets with
Coherent Embeddings [1.7188280334580195]
This paper presents a methodology to reuse an embedding to create a new one, where cluster positions are preserved.
The proposed algorithm has the same complexity as the original $t$-SNE to embed new items, and a lower one when considering the embedding of a dataset sliced into sub-pieces.
arXiv Detail & Related papers (2021-09-22T06:45:37Z) - Determinantal consensus clustering [77.34726150561087]
We propose the use of determinantal point processes or DPP for the random restart of clustering algorithms.
DPPs favor diversity of the center points within subsets.
We show through simulations that, contrary to DPP, this technique fails both to ensure diversity, and to obtain a good coverage of all data facets.
arXiv Detail & Related papers (2021-02-07T23:48:24Z) - Overcomplete Deep Subspace Clustering Networks [80.16644725886968]
Experimental results on four benchmark datasets show the effectiveness of the proposed method over DSC and other clustering methods in terms of clustering error.
Our method is also not as dependent as DSC is on where pre-training should be stopped to get the best performance and is also more robust to noise.
arXiv Detail & Related papers (2020-11-16T22:07:18Z) - Learnable Subspace Clustering [76.2352740039615]
We develop a learnable subspace clustering paradigm to efficiently solve the large-scale subspace clustering problem.
The key idea is to learn a parametric function to partition the high-dimensional subspaces into their underlying low-dimensional subspaces.
To the best of our knowledge, this paper is the first work to efficiently cluster millions of data points among the subspace clustering methods.
arXiv Detail & Related papers (2020-04-09T12:53:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.