On the use of Wasserstein metric in topological clustering of
distributional data
- URL: http://arxiv.org/abs/2109.04301v1
- Date: Thu, 9 Sep 2021 14:27:15 GMT
- Title: On the use of Wasserstein metric in topological clustering of
distributional data
- Authors: Guénaël Cabanes, Younès Bennani, Rosanna Verde and Antonio Irpino
- Abstract summary: This paper deals with a clustering algorithm for histogram data based on Self-Organizing Map (SOM) learning.
It combines dimension reduction by the SOM with clustering of the data in the reduced space.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper deals with a clustering algorithm for histogram data based
on Self-Organizing Map (SOM) learning. It combines dimension reduction by the
SOM with clustering of the data in the reduced space. Suited to this kind of
data, a dissimilarity measure between distributions is introduced: the $L_2$
Wasserstein distance. Moreover, the number of clusters is not fixed in advance
but is found automatically from a local density estimation of the data in the
original space. Applications on synthetic and real data sets corroborate the
proposed strategy.
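In one dimension, the $L_2$ Wasserstein distance has a closed form as the $L_2$ distance between quantile functions, which is what makes it practical for histogram data. The sketch below is a minimal illustration of that formula, not the authors' implementation; the function name w2_distance and the size of the quantile grid are arbitrary choices.

```python
import numpy as np

def w2_distance(x, y, n_quantiles=200):
    """Empirical L2 Wasserstein distance between two 1-D samples,
    i.e. the L2 distance between their quantile functions,
    approximated on a fixed grid of quantile levels."""
    t = (np.arange(n_quantiles) + 0.5) / n_quantiles  # quantile levels in (0, 1)
    qx = np.quantile(x, t)                            # empirical quantile function of x
    qy = np.quantile(y, t)                            # empirical quantile function of y
    return np.sqrt(np.mean((qx - qy) ** 2))

# Two Gaussian samples differing only in mean: W2 should be close to the shift (1.0).
rng = np.random.default_rng(0)
print(w2_distance(rng.normal(0.0, 1.0, 5000), rng.normal(1.0, 1.0, 5000)))
```

In a SOM for distributional data, such a distance would typically replace the Euclidean distance used for best-matching-unit search; prototypes can then be updated quantile-wise, since 1-D $L_2$ Wasserstein barycenters are averages of quantile functions.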
Related papers
- Clustering Based on Density Propagation and Subcluster Merging [92.15924057172195]
We propose a density-based node clustering approach that automatically determines the number of clusters and can be applied in both data space and graph space.
Unlike traditional density-based clustering methods, which necessitate calculating the distance between any two nodes, our proposed technique determines density through a propagation process.
arXiv Detail & Related papers (2024-11-04T04:09:36Z)
- Distributional Reduction: Unifying Dimensionality Reduction and Clustering with Gromov-Wasserstein [56.62376364594194]
Unsupervised learning aims to capture the underlying structure of potentially large and high-dimensional datasets.
In this work, we revisit these approaches under the lens of optimal transport and exhibit relationships with the Gromov-Wasserstein problem.
This unveils a new general framework, called distributional reduction, which recovers DR and clustering as special cases and addresses them jointly within a single optimization problem (a toy Gromov-Wasserstein computation is sketched after this list).
arXiv Detail & Related papers (2024-02-03T19:00:19Z)
- Minimally Supervised Learning using Topological Projections in Self-Organizing Maps [55.31182147885694]
We introduce a semi-supervised learning approach based on topological projections in self-organizing maps (SOMs).
Our proposed method first trains SOMs on unlabeled data; a minimal number of available labeled data points are then assigned to key best matching units (BMUs).
Our results indicate that the proposed minimally supervised model significantly outperforms traditional regression techniques (a toy BMU-labelling sketch appears after this list).
arXiv Detail & Related papers (2024-01-12T22:51:48Z)
- DECWA: Density-Based Clustering using Wasserstein Distance [1.4132765964347058]
We propose a new clustering algorithm based on spatial density and probabilistic approach.
We show that our approach outperforms other state-of-the-art density-based clustering methods on a wide variety of datasets.
arXiv Detail & Related papers (2023-10-25T11:10:08Z)
- Linearized Wasserstein dimensionality reduction with approximation guarantees [65.16758672591365]
LOT Wassmap is a computationally feasible algorithm to uncover low-dimensional structures in the Wasserstein space.
We show that LOT Wassmap attains correct embeddings and that the quality improves with increased sample size.
We also show how LOT Wassmap significantly reduces the computational cost when compared to algorithms that depend on pairwise distance computations.
arXiv Detail & Related papers (2023-02-14T22:12:16Z)
- Riemannian classification of EEG signals with missing values [67.90148548467762]
This paper proposes two strategies to handle missing data for the classification of electroencephalograms.
The first approach estimates the covariance from imputed data with the $k$-nearest neighbors algorithm; the second relies on the observed data by leveraging the observed-data likelihood within an expectation-maximization algorithm.
As the results show, the proposed strategies outperform classification based on the observed data alone and maintain high accuracy even as the missing-data ratio increases (a minimal imputation-plus-covariance sketch appears after this list).
arXiv Detail & Related papers (2021-10-19T14:24:50Z)
- Noise-robust Clustering [2.0199917525888895]
This paper presents noise-robust clustering techniques in unsupervised machine learning.
Uncertainty about noise, consistency, and other ambiguities can become severe obstacles in data analytics.
arXiv Detail & Related papers (2021-10-17T17:15:13Z)
- Tensor Laplacian Regularized Low-Rank Representation for Non-uniformly Distributed Data Subspace Clustering [2.578242050187029]
Low-Rank Representation (LRR) suffers from discarding the locality information of data points in subspace clustering.
We propose a hypergraph model to facilitate having a variable number of adjacent nodes and incorporating the locality information of the data.
Experiments on artificial and real datasets demonstrate the higher accuracy and precision of the proposed method.
arXiv Detail & Related papers (2021-03-06T08:22:24Z)
- LSD-C: Linearly Separable Deep Clusters [145.89790963544314]
We present LSD-C, a novel method to identify clusters in an unlabeled dataset.
Our method draws inspiration from recent semi-supervised learning practice and proposes to combine our clustering algorithm with self-supervised pretraining and strong data augmentation.
We show that our approach significantly outperforms competitors on popular public image benchmarks including CIFAR 10/100, STL 10 and MNIST, as well as the document classification dataset Reuters 10K.
arXiv Detail & Related papers (2020-06-17T17:58:10Z)
- SDCOR: Scalable Density-based Clustering for Local Outlier Detection in Massive-Scale Datasets [0.0]
This paper presents a batch-wise density-based clustering approach for local outlier detection in massive-scale datasets.
Evaluations on real-life and synthetic datasets demonstrate that the proposed method has a low linear time complexity.
arXiv Detail & Related papers (2020-06-13T11:07:37Z)
- Stochastic Sparse Subspace Clustering [20.30051592270384]
State-of-the-art subspace clustering methods are based on the self-expressive model, which represents each data point as a linear combination of the other data points.
We introduce a dropout strategy, based on randomly dropping out data points, to address the issue of over-segmentation.
This leads to a scalable and flexible sparse subspace clustering approach, termed Stochastic Sparse Subspace Clustering (a toy self-expressive sketch appears after this list).
arXiv Detail & Related papers (2020-05-04T13:09:17Z)
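For the "Distributional Reduction: Unifying Dimensionality Reduction and Clustering with Gromov-Wasserstein" entry, the Gromov-Wasserstein problem couples two metric-measure spaces that need not share an ambient space. The sketch below, using the POT library, is only a toy illustration of that coupling; the point clouds, their sizes and the uniform weights are assumptions, not the paper's setup.

```python
import numpy as np
import ot  # POT: Python Optimal Transport (pip install pot)

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))   # 30 samples living in R^5
Y = rng.normal(size=(20, 2))   # 20 samples living in a different space, R^2

C1 = ot.dist(X, X)             # intra-space (squared Euclidean) distance matrices
C2 = ot.dist(Y, Y)
p, q = ot.unif(len(X)), ot.unif(len(Y))   # uniform sample weights

# Coupling that aligns the two distance structures as well as possible
T = ot.gromov.gromov_wasserstein(C1, C2, p, q, loss_fun='square_loss')
print(T.shape)  # (30, 20) transport plan
```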
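For the "Minimally Supervised Learning using Topological Projections in Self-Organizing Maps" entry, the core mechanism is to train a SOM on unlabeled data and then attach the few available labels to best matching units (BMUs). A rough sketch with the MiniSom package; the map size, training length and toy data are assumptions, and the paper additionally propagates labels over the map topology, which this sketch omits.

```python
import numpy as np
from minisom import MiniSom  # pip install minisom

rng = np.random.default_rng(0)
X_unlabeled = rng.normal(size=(500, 4))
X_labeled = rng.normal(size=(10, 4))
y_labeled = rng.integers(0, 2, size=10)

som = MiniSom(8, 8, 4, sigma=1.5, learning_rate=0.5, random_seed=0)
som.train_random(X_unlabeled, 2000)

# Attach each labeled point's class to its best matching unit (BMU).
bmu_votes = {}
for x, y in zip(X_labeled, y_labeled):
    bmu_votes.setdefault(som.winner(x), []).append(int(y))

def predict(x):
    """Majority label of the BMU of x, or None if that unit received no labels."""
    votes = bmu_votes.get(som.winner(x))
    return max(set(votes), key=votes.count) if votes else None

print(predict(X_unlabeled[0]))
```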
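For the "Riemannian classification of EEG signals with missing values" entry, the first strategy imputes missing samples with k-nearest neighbors and then estimates the channel covariance that Riemannian classifiers operate on. A minimal sketch with scikit-learn; the toy signal, the missingness rate and k are assumptions.

```python
import numpy as np
from sklearn.impute import KNNImputer

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 8))              # one epoch: 256 time samples x 8 channels
X[rng.random(X.shape) < 0.1] = np.nan      # mark 10% of the samples as missing

X_imp = KNNImputer(n_neighbors=5).fit_transform(X)   # k-NN imputation
cov = np.cov(X_imp, rowvar=False)                    # 8 x 8 channel covariance matrix
print(cov.shape)
```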
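For the "Stochastic Sparse Subspace Clustering" entry, the self-expressive model writes each data point as a sparse combination of the other points, and dropout randomly removes candidate points from each regression. The sketch below is a simplified stand-in (a lasso per point followed by spectral clustering); the penalty, drop rate and toy data are assumptions and do not reproduce the paper's exact formulation.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.cluster import SpectralClustering

def self_expressive_affinity(X, alpha=0.01, drop_rate=0.3, seed=0):
    """X holds one data point per column; each point is regressed on a random
    subset of the others (dropout), giving a sparse coefficient matrix C."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    C = np.zeros((n, n))
    for i in range(n):
        keep = [j for j in range(n) if j != i and rng.random() > drop_rate]
        coef = Lasso(alpha=alpha, fit_intercept=False, max_iter=5000).fit(X[:, keep], X[:, i]).coef_
        C[keep, i] = coef
    return np.abs(C) + np.abs(C).T      # symmetric affinity for spectral clustering

# Toy data: 20 points on each of two lines (1-D subspaces) in R^3.
rng = np.random.default_rng(1)
d1, d2 = rng.normal(size=3), rng.normal(size=3)
X = np.hstack([np.outer(d1, rng.normal(size=20)), np.outer(d2, rng.normal(size=20))])

A = self_expressive_affinity(X)
print(SpectralClustering(n_clusters=2, affinity='precomputed', random_state=0).fit_predict(A))
```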