Fast Density Estimation for Density-based Clustering Methods
- URL: http://arxiv.org/abs/2109.11383v1
- Date: Thu, 23 Sep 2021 13:59:42 GMT
- Title: Fast Density Estimation for Density-based Clustering Methods
- Authors: Difei Cheng, Ruihang Xu, Bo Zhang
- Abstract summary: Density-based clustering algorithms are widely used for discovering clusters in pattern recognition and machine learning.
The robustness of density-based algorithms is heavily dominated by finding neighbors and calculating the density of each point which is time-consuming.
This paper proposes a density-based clustering framework by using the fast principal component analysis, which can be applied to density based methods to prune unnecessary distance calculations.
- Score: 3.8972699157287702
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Density-based clustering algorithms are widely used for discovering clusters
in pattern recognition and machine learning since they can deal with
non-hyperspherical clusters and are robustness to handle outliers. However, the
runtime of density-based algorithms is heavily dominated by finding neighbors
and calculating the density of each point which is time-consuming. To address
this issue, this paper proposes a density-based clustering framework by using
the fast principal component analysis, which can be applied to density based
methods to prune unnecessary distance calculations when finding neighbors and
estimating densities. By applying this clustering framework to the Density
Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm, an
improved DBSCAN (called IDBSCAN) is obtained, which preserves the advantage of
DBSCAN and meanwhile, greatly reduces the computation of redundant distances.
Experiments on five benchmark datasets demonstrate that the proposed IDBSCAN
algorithm improves the computational efficiency significantly.
Related papers
- Scalable Density-based Clustering with Random Projections [9.028773906859541]
We present sDBSCAN, a scalable density-based clustering algorithm in high dimensions with cosine distance.
Empirically, sDBSCAN is significantly faster and provides higher accuracy than many other clustering algorithms on real-world million-point data sets.
arXiv Detail & Related papers (2024-02-24T01:45:51Z) - Rethinking k-means from manifold learning perspective [122.38667613245151]
We present a new clustering algorithm which directly detects clusters of data without mean estimation.
Specifically, we construct distance matrix between data points by Butterworth filter.
To well exploit the complementary information embedded in different views, we leverage the tensor Schatten p-norm regularization.
arXiv Detail & Related papers (2023-05-12T03:01:41Z) - An Improved Probability Propagation Algorithm for Density Peak
Clustering Based on Natural Nearest Neighborhood [0.0]
Clustering by fast search and find of density peaks (DPC) has been proven to be a promising clustering approach.
This paper presents an improved probability propagation algorithm for density peak clustering based on the natural nearest neighborhood (DPC-PPNNN)
In experiments on several datasets, DPC-PPNNN is shown to outperform DPC, K-means and DBSCAN.
arXiv Detail & Related papers (2022-07-04T03:36:57Z) - A density peaks clustering algorithm with sparse search and K-d tree [16.141611031128427]
Density peaks clustering algorithm with sparse search and K-d tree is developed to solve this problem.
Experiments are carried out on datasets with different distribution characteristics, by comparing with other five typical clustering algorithms.
arXiv Detail & Related papers (2022-03-02T09:29:40Z) - Density-Based Clustering with Kernel Diffusion [59.4179549482505]
A naive density corresponding to the indicator function of a unit $d$-dimensional Euclidean ball is commonly used in density-based clustering algorithms.
We propose a new kernel diffusion density function, which is adaptive to data of varying local distributional characteristics and smoothness.
arXiv Detail & Related papers (2021-10-11T09:00:33Z) - Determinantal consensus clustering [77.34726150561087]
We propose the use of determinantal point processes or DPP for the random restart of clustering algorithms.
DPPs favor diversity of the center points within subsets.
We show through simulations that, contrary to DPP, this technique fails both to ensure diversity, and to obtain a good coverage of all data facets.
arXiv Detail & Related papers (2021-02-07T23:48:24Z) - (k, l)-Medians Clustering of Trajectories Using Continuous Dynamic Time
Warping [57.316437798033974]
In this work we consider the problem of center-based clustering of trajectories.
We propose the usage of a continuous version of DTW as distance measure, which we call continuous dynamic time warping (CDTW)
We show a practical way to compute a center from a set of trajectories and subsequently iteratively improve it.
arXiv Detail & Related papers (2020-12-01T13:17:27Z) - Differentially Private Clustering: Tight Approximation Ratios [57.89473217052714]
We give efficient differentially private algorithms for basic clustering problems.
Our results imply an improved algorithm for the Sample and Aggregate privacy framework.
One of the tools used in our 1-Cluster algorithm can be employed to get a faster quantum algorithm for ClosestPair in a moderate number of dimensions.
arXiv Detail & Related papers (2020-08-18T16:22:06Z) - Non-Local Spatial Propagation Network for Depth Completion [82.60915972250706]
We propose a robust and efficient end-to-end non-local spatial propagation network for depth completion.
The proposed network takes RGB and sparse depth images as inputs and estimates non-local neighbors and their affinities of each pixel.
We show that the proposed algorithm is superior to conventional algorithms in terms of depth completion accuracy and robustness to the mixed-depth problem.
arXiv Detail & Related papers (2020-07-20T12:26:51Z) - SDCOR: Scalable Density-based Clustering for Local Outlier Detection in
Massive-Scale Datasets [0.0]
This paper presents a batch-wise density-based clustering approach for local outlier detection in massive-scale datasets.
Evaluations on real-life and synthetic datasets demonstrate that the proposed method has a low linear time complexity.
arXiv Detail & Related papers (2020-06-13T11:07:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.