DECWA : Density-Based Clustering using Wasserstein Distance
- URL: http://arxiv.org/abs/2310.16552v1
- Date: Wed, 25 Oct 2023 11:10:08 GMT
- Title: DECWA : Density-Based Clustering using Wasserstein Distance
- Authors: Nabil El Malki, Robin Cugny, Olivier Teste, Franck Ravat
- Abstract summary: We propose a new clustering algorithm based on spatial density and a probabilistic approach.
We show that our approach outperforms other state-of-the-art density-based clustering methods on a wide variety of datasets.
- Score: 1.4132765964347058
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Clustering is a data analysis method for extracting knowledge by discovering
groups of data called clusters. Among these methods, state-of-the-art
density-based clustering methods have proven to be effective for
arbitrary-shaped clusters. Despite their encouraging results, they struggle to
find low-density clusters, to separate nearby clusters with similar densities,
and to handle high-dimensional data. We propose a new characterization of
clusters and a new clustering algorithm based on spatial density and a
probabilistic approach. First, sub-clusters are built using spatial density,
represented as the probability density function ($p.d.f$) of pairwise distances between points. A
method is then proposed to agglomerate similar sub-clusters by using both their
density ($p.d.f$) and their spatial distance. The key idea we propose is to use
the Wasserstein metric, a powerful tool for measuring the distance between the $p.d.f$s
of sub-clusters. We show that our approach outperforms other state-of-the-art
density-based clustering methods on a wide variety of datasets.
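The density comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each sub-clusters's density is approximated by the empirical sample of its pairwise Euclidean distances (no fitted $p.d.f$), it uses SciPy's one-dimensional `wasserstein_distance`, and it ignores the spatial-distance criterion that the abstract says is also used for merging. The helper names `pairwise_distance_sample` and `subcluster_density_gap` are hypothetical.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import wasserstein_distance


def pairwise_distance_sample(points):
    """Return all pairwise Euclidean distances within one sub-cluster."""
    return pdist(points, metric="euclidean")


def subcluster_density_gap(a, b):
    """1-D Wasserstein distance between the pairwise-distance samples of a and b.

    Assumption: the empirical distance samples stand in for the p.d.f.s.
    """
    return wasserstein_distance(
        pairwise_distance_sample(a), pairwise_distance_sample(b)
    )


# Synthetic data (illustrative only): two tight blobs and one spread-out blob.
rng = np.random.default_rng(0)
dense = rng.normal(0.0, 0.1, size=(50, 2))
dense_shifted = rng.normal(0.0, 0.1, size=(50, 2)) + 5.0
sparse = rng.normal(0.0, 1.0, size=(50, 2))

gap_similar = subcluster_density_gap(dense, dense_shifted)
gap_different = subcluster_density_gap(dense, sparse)

print("similar densities:  ", gap_similar)
print("different densities:", gap_different)
```

In this sketch the two tight blobs yield a small gap even though they are far apart in space, which is why a density criterion alone is not enough to decide a merge and a spatial-distance check is needed alongside it.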
Related papers
- Clustering Based on Density Propagation and Subcluster Merging [92.15924057172195]
We propose a density-based node clustering approach that automatically determines the number of clusters and can be applied in both data space and graph space.
Unlike traditional density-based clustering methods, which necessitate calculating the distance between any two nodes, our proposed technique determines density through a propagation process.
arXiv Detail & Related papers (2024-11-04T04:09:36Z) - SHADE: Deep Density-based Clustering [13.629470968274]
SHADE is the first deep clustering algorithm that incorporates density-connectivity into its loss function.
It supports high-dimensional and large data sets with the expressive power of a deep autoencoder.
It outperforms existing methods in clustering quality, especially on data that contain non-Gaussian clusters.
arXiv Detail & Related papers (2024-10-08T18:03:35Z) - GFDC: A Granule Fusion Density-Based Clustering with Evidential
Reasoning [22.526274021556755]
Density-based clustering algorithms are widely applied because they can detect clusters with arbitrary shapes.
This paper proposes a granule fusion density-based clustering with evidential reasoning (GFDC).
Both local and global densities of samples are measured by a sparse degree metric first.
Then information granules are generated in high-density and low-density regions, assisting in processing clusters with significant density differences.
arXiv Detail & Related papers (2023-05-20T06:27:31Z) - Rethinking k-means from manifold learning perspective [122.38667613245151]
We present a new clustering algorithm which directly detects clusters of data without mean estimation.
Specifically, we construct the distance matrix between data points using a Butterworth filter.
To well exploit the complementary information embedded in different views, we leverage the tensor Schatten p-norm regularization.
arXiv Detail & Related papers (2023-05-12T03:01:41Z) - VDPC: Variational Density Peak Clustering Algorithm [16.20037014662979]
We propose a variational density peak clustering (VDPC) algorithm to identify clusters with variational density.
VDPC outperforms two classical algorithms (i.e., DPC and DBSCAN) and four state-of-the-art extended DPC algorithms.
arXiv Detail & Related papers (2021-12-29T12:50:09Z) - Anomaly Clustering: Grouping Images into Coherent Clusters of Anomaly
Types [60.45942774425782]
We introduce anomaly clustering, whose goal is to group data into coherent clusters of anomaly types.
This is different from anomaly detection, whose goal is to divide anomalies from normal data.
We present a simple yet effective clustering framework using patch-based pretrained deep embeddings and off-the-shelf clustering methods.
arXiv Detail & Related papers (2021-12-21T23:11:33Z) - Density-Based Clustering with Kernel Diffusion [59.4179549482505]
A naive density corresponding to the indicator function of a unit $d$-dimensional Euclidean ball is commonly used in density-based clustering algorithms.
We propose a new kernel diffusion density function, which is adaptive to data of varying local distributional characteristics and smoothness.
arXiv Detail & Related papers (2021-10-11T09:00:33Z) - Skeleton Clustering: Dimension-Free Density-based Clustering [0.2538209532048866]
We introduce a density-based clustering method called skeleton clustering.
To bypass the curse of dimensionality, we propose surrogate density measures that are less dependent on the dimension but have intuitive geometric interpretations.
arXiv Detail & Related papers (2021-04-21T21:25:02Z) - Determinantal consensus clustering [77.34726150561087]
We propose the use of determinantal point processes or DPP for the random restart of clustering algorithms.
DPPs favor diversity of the center points within subsets.
We show through simulations that, contrary to DPP, this technique fails both to ensure diversity, and to obtain a good coverage of all data facets.
arXiv Detail & Related papers (2021-02-07T23:48:24Z) - Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.