A density peaks clustering algorithm with sparse search and K-d tree
- URL: http://arxiv.org/abs/2203.00973v1
- Date: Wed, 2 Mar 2022 09:29:40 GMT
- Title: A density peaks clustering algorithm with sparse search and K-d tree
- Authors: Yunxiao Shan, Shu Li, Fuxiang Li, Yuxin Cui, Shuai Li, Minghua Chen,
Xunjun He
- Abstract summary: A density peaks clustering algorithm with sparse search and K-d tree is developed to reduce the high computational complexity of density peaks clustering.
Experiments are carried out on datasets with different distribution characteristics, comparing against five other typical clustering algorithms.
- Score: 16.141611031128427
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Density peaks clustering has become a rising star among clustering
algorithms because of its simplicity and practicality. However, it has one main
drawback: it is
time-consuming due to its high computational complexity. Herein, a density
peaks clustering algorithm with sparse search and K-d tree is developed to
solve this problem. Firstly, a sparse distance matrix is calculated using a
K-d tree in place of the original full pairwise distance matrix, accelerating
the calculation of local density. Secondly, a sparse search strategy is
proposed to accelerate the computation of relative separation: for each data
point, only the intersection between its set of k nearest neighbors and the set
of points with larger local density is searched. Furthermore, a
second-order difference method for decision values is adopted to determine the
cluster centers adaptively. Finally, experiments are carried out on datasets
with different distribution characteristics, comparing against five other
typical clustering algorithms. The results show that the algorithm effectively
reduces the computational complexity, with efficiency gains that are most
pronounced on larger datasets. Moreover, the clustering accuracy is also
improved to a certain extent. Overall, the newly proposed algorithm delivers
excellent performance.
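The two accelerations described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the use of SciPy's `cKDTree`, the Gaussian kernel, the bandwidth heuristic `dc`, and the knee rule in `pick_n_centers` are all assumptions made for the sketch.

```python
import numpy as np
from scipy.spatial import cKDTree

def sparse_local_density(X, k=16):
    """Estimate local density from each point's k nearest neighbors only
    (a sparse distance matrix), instead of the full pairwise matrix."""
    tree = cKDTree(X)
    dists, idx = tree.query(X, k=k + 1)          # column 0 is the point itself
    nn_d, nn_i = dists[:, 1:], idx[:, 1:]
    dc = nn_d.mean()                             # heuristic cutoff bandwidth (assumed)
    rho = np.exp(-(nn_d / dc) ** 2).sum(axis=1)  # Gaussian-kernel local density
    return rho, nn_d, nn_i

def pick_n_centers(gamma):
    """Choose the number of cluster centers adaptively: sort the decision
    values gamma = rho * delta in descending order and take the position of
    the largest second-order difference (the 'knee' in the decision graph)."""
    g = np.sort(np.asarray(gamma))[::-1]
    d2 = np.diff(g, n=2)                         # second-order difference
    return int(np.argmax(np.abs(d2))) + 1
```

Querying only k neighbors per point keeps the density step near O(n log n) with the K-d tree, versus O(n^2) for the full distance matrix; the second-order difference picks out the sharpest bend among the sorted decision values.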
Related papers
- Accelerating k-Means Clustering with Cover Trees [0.30693357740321775]
We propose a new k-means algorithm based on the cover tree index, which has relatively low overhead and performs well.
We obtain a hybrid algorithm that combines the benefits of tree aggregation and bounds-based filtering.
arXiv Detail & Related papers (2024-10-19T14:02:42Z)
- Linear time Evidence Accumulation Clustering with KMeans [0.0]
This work describes a trick which mimics the behavior of average linkage clustering.
We found a way to efficiently compute the density of a partitioning, reducing the cost from quadratic to linear complexity.
The k-means results are comparable to the best state of the art in terms of NMI while keeping the computational cost low.
arXiv Detail & Related papers (2023-11-15T14:12:59Z)
- DenMune: Density peak based clustering using mutual nearest neighbors [0.0]
Many clustering algorithms fail when clusters are of arbitrary shapes, of varying densities, or when the data classes are unbalanced and close to each other.
A novel clustering algorithm, DenMune is presented to meet this challenge.
It is based on identifying dense regions using mutual nearest neighborhoods of size K, where K is the only parameter required from the user.
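The mutual-nearest-neighbor idea behind this summary can be sketched in a few lines. The function name and the use of SciPy's `cKDTree` are illustrative assumptions, not DenMune's actual implementation:

```python
import numpy as np
from scipy.spatial import cKDTree

def mutual_knn_pairs(X, k=10):
    """Return pairs (i, j), i < j, that are mutual k-nearest neighbors:
    j is in i's k-NN list and i is in j's k-NN list. Points in dense
    regions accumulate many such mutual links."""
    tree = cKDTree(X)
    _, idx = tree.query(X, k=k + 1)   # column 0 is the point itself
    knn = [set(row[1:]) for row in idx]
    return {(i, j) for i in range(len(X)) for j in knn[i]
            if i < j and i in knn[j]}
```

Because mutual links rarely cross cluster boundaries, the density of these pairs around a point serves as a parameter-light proxy for membership in a dense region, with K the only user-supplied value.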
arXiv Detail & Related papers (2023-09-23T16:18:00Z)
- Rethinking k-means from manifold learning perspective [122.38667613245151]
We present a new clustering algorithm which directly detects clusters of data without mean estimation.
Specifically, we construct a distance matrix between data points using a Butterworth filter.
To well exploit the complementary information embedded in different views, we leverage the tensor Schatten p-norm regularization.
arXiv Detail & Related papers (2023-05-12T03:01:41Z)
- Linearized Wasserstein dimensionality reduction with approximation guarantees [65.16758672591365]
LOT Wassmap is a computationally feasible algorithm to uncover low-dimensional structures in the Wasserstein space.
We show that LOT Wassmap attains correct embeddings and that the quality improves with increased sample size.
We also show how LOT Wassmap significantly reduces the computational cost when compared to algorithms that depend on pairwise distance computations.
arXiv Detail & Related papers (2023-02-14T22:12:16Z) - Density-Based Clustering with Kernel Diffusion [59.4179549482505]
A naive density corresponding to the indicator function of a unit $d$-dimensional Euclidean ball is commonly used in density-based clustering algorithms.
We propose a new kernel diffusion density function, which is adaptive to data of varying local distributional characteristics and smoothness.
arXiv Detail & Related papers (2021-10-11T09:00:33Z) - Fast Density Estimation for Density-based Clustering Methods [3.8972699157287702]
Density-based clustering algorithms are widely used for discovering clusters in pattern recognition and machine learning.
The runtime of density-based algorithms is dominated by finding neighbors and calculating the density of each point, which is time-consuming.
This paper proposes a density-based clustering framework using fast principal component analysis, which can be applied to density-based methods to prune unnecessary distance calculations.
arXiv Detail & Related papers (2021-09-23T13:59:42Z) - Exact and Approximate Hierarchical Clustering Using A* [51.187990314731344]
We introduce a new approach based on A* search for clustering.
We overcome the prohibitively large search space by combining A* with a novel trellis data structure.
We empirically demonstrate that our method achieves substantially higher quality results than baselines for a particle physics use case and other clustering benchmarks.
arXiv Detail & Related papers (2021-04-14T18:15:27Z) - Clustering of Big Data with Mixed Features [3.3504365823045044]
We develop a new clustering algorithm for large data of mixed type.
The algorithm is capable of detecting outliers and clusters of relatively low density.
We present experimental results to verify that our algorithm works well in practice.
arXiv Detail & Related papers (2020-11-11T19:54:38Z) - Differentially Private Clustering: Tight Approximation Ratios [57.89473217052714]
We give efficient differentially private algorithms for basic clustering problems.
Our results imply an improved algorithm for the Sample and Aggregate privacy framework.
One of the tools used in our 1-Cluster algorithm can be employed to get a faster quantum algorithm for ClosestPair in a moderate number of dimensions.
arXiv Detail & Related papers (2020-08-18T16:22:06Z) - Ball k-means [53.89505717006118]
The Ball k-means algorithm uses a ball to describe a cluster, focusing on reducing the point-centroid distance computation.
The fast speed, no extra parameters and simple design of the Ball k-means make it an all-around replacement of the naive k-means algorithm.
arXiv Detail & Related papers (2020-05-02T10:39:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.