Related papers: DECWA : Density-Based Clustering using Wasserstein Distance

URL: http://arxiv.org/abs/2310.16552v1
Date: Wed, 25 Oct 2023 11:10:08 GMT
Title: DECWA : Density-Based Clustering using Wasserstein Distance
Authors: Nabil El Malki, Robin Cugny, Olivier Teste, Franck Ravat
Abstract summary: We propose a new clustering algorithm based on spatial density and a probabilistic approach. We show that our approach outperforms other state-of-the-art density-based clustering methods on a wide variety of datasets.
Score: 1.4132765964347058
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Clustering is a data analysis method for extracting knowledge by discovering groups of data called clusters. Among these methods, state-of-the-art density-based clustering methods have proven to be effective for arbitrary-shaped clusters. Despite their encouraging results, they struggle with low-density clusters, nearby clusters with similar densities, and high-dimensional data. Our proposals are a new characterization of clusters and a new clustering algorithm based on spatial density and a probabilistic approach. First, sub-clusters are built using spatial density, represented as the probability density function ($p.d.f$) of pairwise distances between points. A method is then proposed to agglomerate similar sub-clusters using both their density ($p.d.f$) and their spatial distance. The key idea we propose is to use the Wasserstein metric, a powerful tool for measuring the distance between the $p.d.f$s of two sub-clusters. We show that our approach outperforms other state-of-the-art density-based clustering methods on a wide variety of datasets.
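
The abstract compares sub-clusters through the Wasserstein distance between the $p.d.f$s of their pairwise distances. The Python sketch below illustrates that idea under simplifying assumptions; it is not the authors' DECWA implementation. Each sub-cluster's $p.d.f$ is represented by its empirical sample of pairwise Euclidean distances, the spatial distance is assumed to be the smallest gap between points of the two sub-clusters, and the merge rule with thresholds `max_wasserstein` and `max_gap` is hypothetical.

```python
import itertools
import math


def pairwise_distances(points):
    """Euclidean distances between all distinct pairs of points in a sub-cluster."""
    return [math.dist(p, q) for p, q in itertools.combinations(points, 2)]


def wasserstein_1d(u, v):
    """First-order Wasserstein distance between two non-empty 1-D empirical samples.

    Equal to the integral over x of |F_u(x) - F_v(x)|, where F_u and F_v are the
    empirical CDFs of u and v. Both CDFs are step functions, so the integral is a
    finite sum over the merged breakpoints.
    """
    u, v = sorted(u), sorted(v)
    grid = sorted(set(u) | set(v))
    total, i, j = 0.0, 0, 0
    for left, right in zip(grid, grid[1:]):
        # After these loops, i and j count the samples <= left, i.e. n * F(left).
        while i < len(u) and u[i] <= left:
            i += 1
        while j < len(v) and v[j] <= left:
            j += 1
        total += abs(i / len(u) - j / len(v)) * (right - left)
    return total


def density_dissimilarity(a, b):
    """Wasserstein distance between the pairwise-distance samples of two sub-clusters.

    Each sub-cluster needs at least two points so that its sample is non-empty.
    """
    return wasserstein_1d(pairwise_distances(a), pairwise_distances(b))


def spatial_gap(a, b):
    """Smallest Euclidean distance between a point of a and a point of b (assumed spatial term)."""
    return min(math.dist(p, q) for p in a for q in b)


def should_merge(a, b, max_wasserstein, max_gap):
    # Hypothetical merge rule: both thresholds are illustrative, not DECWA's criterion.
    return density_dissimilarity(a, b) <= max_wasserstein and spatial_gap(a, b) <= max_gap
```

For two unit squares translated apart, the pairwise-distance samples are identical, so the Wasserstein term is 0 and only the spatial gap decides. A square scaled by 3 has the same shape but a sparser distance distribution, and its Wasserstein term against the unit square is about 2.28, so the density term keeps it separate even when it is spatially close.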

Related papers

Persistent Multiscale Density-based Clustering [0.515435457943463]
Persistent Leaves Spatial Clustering for Applications with Noise (PLSCAN). PLSCAN efficiently identifies all minimum cluster sizes for which HDBSCAN* produces stable (leaf) clusters. We compare PLSCAN's performance to HDBSCAN* on several real-world datasets.
arXiv Detail & Related papers (2025-12-18T14:01:35Z)
Depth-Based Local Center Clustering: A Framework for Handling Different Clustering Scenarios [46.164361878412656]
Cluster analysis plays a crucial role across numerous scientific and engineering domains. Despite the wealth of clustering methods proposed over the past decades, each method is typically designed for specific scenarios. In this paper, we propose depth-based local center clustering (DLCC). DLCC makes use of a local version of data depth that is based on subsets of data.
arXiv Detail & Related papers (2025-05-14T16:08:11Z)
Hierarchical clustering with maximum density paths and mixture models [39.42511559155036]
Hierarchical clustering is an effective and interpretable technique for analyzing structure in data. It is particularly helpful in settings where the exact number of clusters is unknown, and provides a robust framework for exploring complex datasets. The proposed method uses a two-stage approach: it first employs a Gaussian or Student's t mixture model to overcluster the data, and then hierarchically merges clusters based on the induced density landscape. This approach yields state-of-the-art clustering performance while also providing a meaningful hierarchy, making it a valuable tool for exploratory data analysis.
arXiv Detail & Related papers (2025-03-19T15:37:51Z)
Hyperoctant Search Clustering: A Method for Clustering Data in High-Dimensional Hyperspheres [0.0]
We propose a new clustering method based on a topological approach applied to regions of space defined by the signs of coordinates (hyperoctants). According to a density criterion, the method builds clusters of data points based on the partitioning of a graph. We choose topic detection, an important task in text mining, as the application.
arXiv Detail & Related papers (2025-03-10T23:41:44Z)
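
The hyperoctant idea can be made concrete with a small Python sketch: each point is keyed by the sign pattern of its coordinates, and hyperoctants holding at least `min_points` points are kept as dense. The summary does not say how the graph is built or partitioned, so linking dense hyperoctants whose sign patterns differ in exactly one coordinate, and taking connected components as clusters, is an assumption made here for illustration, as is sending zero coordinates to the non-negative side. The function names are hypothetical.

```python
from collections import defaultdict


def hyperoctant_key(point):
    """Sign pattern of the coordinates: 1 for x >= 0, 0 for x < 0 (zero goes to the non-negative side)."""
    return tuple(1 if x >= 0 else 0 for x in point)


def dense_hyperoctants(points, min_points):
    """Group point indices by hyperoctant and keep groups with at least min_points members."""
    groups = defaultdict(list)
    for idx, p in enumerate(points):
        groups[hyperoctant_key(p)].append(idx)
    return {key: idxs for key, idxs in groups.items() if len(idxs) >= min_points}


def hyperoctant_clusters(points, min_points):
    """Connected components of dense hyperoctants; two are linked if their keys differ in one sign."""
    dense = dense_hyperoctants(points, min_points)
    seen, clusters = set(), []
    for start in dense:
        if start in seen:
            continue
        seen.add(start)
        stack, members = [start], []
        while stack:  # depth-first traversal of the hyperoctant graph
            key = stack.pop()
            members.extend(dense[key])
            for other in dense:
                if other not in seen and sum(a != b for a, b in zip(key, other)) == 1:
                    seen.add(other)
                    stack.append(other)
        clusters.append(sorted(members))
    return clusters
```

Points that fall in sparse hyperoctants are left out of every cluster, which plays the role of noise in this sketch.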
Clustering Based on Density Propagation and Subcluster Merging [92.15924057172195]
We propose a density-based node clustering approach that automatically determines the number of clusters and can be applied in both data space and graph space. Unlike traditional density-based clustering methods, which necessitate calculating the distance between any two nodes, our proposed technique determines density through a propagation process.
arXiv Detail & Related papers (2024-11-04T04:09:36Z)
SHADE: Deep Density-based Clustering [13.629470968274]
SHADE is the first deep clustering algorithm that incorporates density-connectivity into its loss function. It supports high-dimensional and large data sets with the expressive power of a deep autoencoder. It outperforms existing methods in clustering quality, especially on data that contain non-Gaussian clusters.
arXiv Detail & Related papers (2024-10-08T18:03:35Z)
GFDC: A Granule Fusion Density-Based Clustering with Evidential Reasoning [22.526274021556755]
Density-based clustering algorithms are widely applied because they can detect clusters with arbitrary shapes. This paper proposes a granule fusion density-based clustering with evidential reasoning (GFDC). Both local and global densities of samples are first measured by a sparse degree metric. Then information granules are generated in high-density and low-density regions, which helps handle clusters with significant density differences.
arXiv Detail & Related papers (2023-05-20T06:27:31Z)
Rethinking k-means from manifold learning perspective [122.38667613245151]
We present a new clustering algorithm which directly detects clusters of data without mean estimation. Specifically, we construct the distance matrix between data points using a Butterworth filter. To fully exploit the complementary information embedded in different views, we leverage tensor Schatten p-norm regularization.
arXiv Detail & Related papers (2023-05-12T03:01:41Z)
VDPC: Variational Density Peak Clustering Algorithm [16.20037014662979]
We propose a variational density peak clustering (VDPC) algorithm to identify clusters with variational density. VDPC outperforms two classical algorithms (i.e., DPC and DBSCAN) and four state-of-the-art extended DPC algorithms.
arXiv Detail & Related papers (2021-12-29T12:50:09Z)
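
VDPC extends density peak clustering (DPC). As background, the Python sketch below shows the classical DPC quantities rather than VDPC itself: the local density rho (number of neighbours closer than a cutoff `d_c`), the distance delta to the nearest point of higher density, centers chosen by the largest rho times delta, and assignment of every other point to the cluster of its nearest higher-density neighbour. The function name, the parameter names, and the tie-breaking by index are choices made for this example.

```python
import math


def density_peak_clustering(points, d_c, n_centers):
    """Classical DPC: returns (labels, rho, delta) for a list of points."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # rho: number of other points strictly closer than the cutoff distance d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c) for i in range(n)]
    # Visit points from highest to lowest density; ties are broken by index.
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta, parent = [0.0] * n, [-1] * n
    delta[order[0]] = max(dist[order[0]])  # convention for the densest point
    for rank in range(1, n):
        i = order[rank]
        # Nearest neighbour among the points that precede i in the density order.
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i], parent[i] = dist[i][j], j
    # Centers: largest gamma = rho * delta; the densest point is always kept as a center.
    ranked = sorted(range(n), key=lambda i: -rho[i] * delta[i])
    centers = [order[0]] + [i for i in ranked if i != order[0]][: n_centers - 1]
    labels = [-1] * n
    for label, c in enumerate(centers):
        labels[c] = label
    # Non-centers inherit the label of their nearest higher-density neighbour,
    # which is always labeled first because it comes earlier in the order.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, rho, delta
```

A single global cutoff `d_c` is what makes classical DPC sensitive to clusters of differing density, which is the setting VDPC is described as targeting.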
Anomaly Clustering: Grouping Images into Coherent Clusters of Anomaly Types [60.45942774425782]
We introduce anomaly clustering, whose goal is to group data into coherent clusters of anomaly types. This differs from anomaly detection, whose goal is to separate anomalies from normal data. We present a simple yet effective clustering framework using patch-based pretrained deep embeddings and off-the-shelf clustering methods.
arXiv Detail & Related papers (2021-12-21T23:11:33Z)
Density-Based Clustering with Kernel Diffusion [59.4179549482505]
A naive density corresponding to the indicator function of a unit $d$-dimensional Euclidean ball is commonly used in density-based clustering algorithms. We propose a new kernel diffusion density function, which is adaptive to data of varying local distributional characteristics and smoothness.
arXiv Detail & Related papers (2021-10-11T09:00:33Z)
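
The naive density mentioned above corresponds to the uniform-kernel estimator: count the points that fall inside a ball of radius r around x and divide by n times the volume of that ball. The Python sketch below implements only that baseline, not the proposed kernel diffusion density; the names `ball_density` and `unit_ball_volume` and the radius parameter `r` are illustrative.

```python
import math


def unit_ball_volume(d):
    """Volume of the unit ball in d dimensions: pi^(d/2) / Gamma(d/2 + 1)."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)


def ball_density(x, data, r):
    """Uniform-kernel density estimate at x with a single global radius r.

    Counts the points within distance r of x (the indicator of the unit ball
    applied to (x - x_i) / r) and normalizes by n * volume of the radius-r ball.
    """
    d = len(x)
    count = sum(1 for p in data if math.dist(x, p) <= r)
    return count / (len(data) * unit_ball_volume(d) * r ** d)
```

Because the same r is used everywhere, this estimate cannot adapt to regions with different local scale or smoothness, which is the limitation the kernel diffusion density is described as addressing.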
Skeleton Clustering: Dimension-Free Density-based Clustering [0.2538209532048866]
We introduce a density-based clustering method called skeleton clustering. To bypass the curse of dimensionality, we propose surrogate density measures that are less dependent on the dimension but have intuitive geometric interpretations.
arXiv Detail & Related papers (2021-04-21T21:25:02Z)
Determinantal consensus clustering [77.34726150561087]
We propose the use of determinantal point processes or DPP for the random restart of clustering algorithms. DPPs favor diversity of the center points within subsets. We show through simulations that, contrary to DPP, the common practice of sampling center points uniformly at random fails both to ensure diversity and to obtain a good coverage of all data facets.
arXiv Detail & Related papers (2021-02-07T23:48:24Z)
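
The statement that DPPs favor diverse center points follows from how an L-ensemble DPP weights a subset S: its probability is proportional to det(L_S), the determinant of the kernel submatrix indexed by S. The Python sketch below assumes a Gaussian (RBF) kernel with bandwidth `sigma`, which the summary does not specify, and only computes this unnormalized weight; it does not sample from the DPP. The helper names are hypothetical.

```python
import math


def rbf_kernel(points, sigma=1.0):
    """Gaussian kernel matrix K[i][j] = exp(-||p_i - p_j||^2 / (2 sigma^2))."""
    return [[math.exp(-math.dist(p, q) ** 2 / (2 * sigma ** 2)) for q in points] for p in points]


def determinant(matrix):
    """Determinant via Gaussian elimination with partial pivoting (input is not modified)."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def dpp_weight(centers, sigma=1.0):
    """Unnormalized DPP probability of a candidate center set: det of its kernel submatrix."""
    return determinant(rbf_kernel(centers, sigma))
```

For two points at distance d the weight reduces to 1 - exp(-d^2 / sigma^2), so two nearly identical centers get a weight close to 0 while well-separated centers get a weight close to 1.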
Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed. We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)

This list is automatically generated from the titles and abstracts of the papers on this site.