On clustering uncertain and structured data with Wasserstein barycenters
and a geodesic criterion for the number of clusters
- URL: http://arxiv.org/abs/1912.11801v3
- Date: Tue, 13 Sep 2022 13:16:35 GMT
- Title: On clustering uncertain and structured data with Wasserstein barycenters
and a geodesic criterion for the number of clusters
- Authors: G.I. Papayiannis, G.N. Domazakis, D. Drivaliaris, S. Koukoulas, A.E.
Tsekrekos, A.N. Yannacopoulos
- Abstract summary: This work considers the notion of Wasserstein barycenters, accompanied by appropriate clustering indices based on the intrinsic geometry of the Wasserstein space where the clustering task is performed.
Such clustering approaches are valuable in many fields where the observational or experimental error is significant.
Under this perspective, each observation is identified by an appropriate probability measure and the proposed clustering schemes rely on discrimination criteria.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work clustering schemes for uncertain and structured data are
considered relying on the notion of Wasserstein barycenters, accompanied by
appropriate clustering indices based on the intrinsic geometry of the
Wasserstein space where the clustering task is performed. Such clustering
approaches are valuable in many fields where the
observational/experimental error is significant (e.g. astronomy, biology,
remote sensing, etc.) or where the data are more complex and traditional
learning algorithms are not applicable or effective (e.g. network
data, interval data, high-frequency records, matrix data, etc.). Under this
perspective, each observation is identified by an appropriate probability
measure and the proposed clustering schemes rely on discrimination criteria
that utilize the geometric structure of the space of probability measures
through core techniques from the optimal transport theory. The advantages and
capabilities of the proposed approach and the geodesic criterion performance
are illustrated through a simulation study and two real-world
applications: (a) clustering eurozone countries according to their
observed government bond yield curves and (b) classifying the areas of a
satellite image into land-use categories, a standard task in remote
sensing.
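To make the idea of clustering probability measures concrete, here is a minimal sketch of k-means in Wasserstein space for the one-dimensional case only. It is not the authors' scheme and does not implement the geodesic criterion for the number of clusters. It relies on a standard fact: for 1D empirical measures with the same number of sample points, the 2-Wasserstein distance is the root-mean-square difference of the sorted samples, and the barycenter is the average of the sorted samples. The function names `w2_distance_1d` and `wasserstein_kmeans_1d` and all parameters are hypothetical, chosen for illustration.

```python
import numpy as np

def w2_distance_1d(x_sorted, y_sorted):
    """2-Wasserstein distance between two 1D empirical measures with equal
    sample counts, each given as a sorted sample array."""
    return np.sqrt(np.mean((x_sorted - y_sorted) ** 2))

def wasserstein_kmeans_1d(samples, k, n_iter=50, seed=0):
    """Hypothetical helper: Lloyd-style k-means in 1D Wasserstein space.

    samples: array of shape (n_obs, n_points); each row holds the sample
    that represents one uncertain observation.
    Returns (labels, centers), where each center is a sorted sample array.
    """
    rng = np.random.default_rng(seed)
    # Sorting each row gives its empirical quantile function on a common grid.
    q = np.sort(np.asarray(samples, dtype=float), axis=1)
    # Initialise centers with k distinct observations (an assumption, not
    # a recommendation from the paper).
    centers = q[rng.choice(len(q), size=k, replace=False)].copy()
    labels = np.full(len(q), -1)
    for _ in range(n_iter):
        # Squared W2 distance from every observation to every center.
        d2 = ((q[:, None, :] - centers[None, :, :]) ** 2).mean(axis=2)
        new_labels = d2.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable
        labels = new_labels
        for j in range(k):
            members = q[labels == j]
            if len(members):  # keep the old center if a cluster is empty
                # In 1D the W2 barycenter is the mean of quantile functions.
                centers[j] = members.mean(axis=0)
    return labels, centers

# Usage: 10 observations drawn around N(0, 1) and 10 around N(5, 1),
# each represented by 200 sample points.
rng = np.random.default_rng(1)
data = np.vstack([rng.normal(0, 1, (10, 200)), rng.normal(5, 1, (10, 200))])
labels, centers = wasserstein_kmeans_1d(data, k=2)
print(labels)
```

In higher dimensions the barycenter no longer has this closed form and has to be computed numerically, so this sketch should be read only as an illustration of the geometry, under the stated 1D assumptions.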
Related papers
- From A-to-Z Review of Clustering Validation Indices [4.08908337437878]
We review and evaluate the performance of internal and external clustering validation indices on the most common clustering algorithms.
We suggest a classification framework for examining the functionality of both internal and external clustering validation measures.
arXiv Detail & Related papers (2024-07-18T13:52:02Z)
- GCC: Generative Calibration Clustering [55.44944397168619]
We propose a novel Generative Calibration Clustering (GCC) method to incorporate feature learning and augmentation into the clustering procedure.
First, we develop a discriminative feature alignment mechanism to discover the intrinsic relationship across real and generated samples.
Second, we design a self-supervised metric learning to generate more reliable cluster assignment.
arXiv Detail & Related papers (2024-04-14T01:51:11Z)
- Robust and Automatic Data Clustering: Dirichlet Process meets Median-of-Means [18.3248037914529]
We present an efficient and automatic clustering technique by integrating the principles of model-based and centroid-based methodologies.
Statistical guarantees on the upper bound of clustering error suggest the advantages of our proposed method over existing state-of-the-art clustering algorithms.
arXiv Detail & Related papers (2023-11-26T19:01:15Z)
- Circular Clustering with Polar Coordinate Reconstruction [6.598049778463762]
Traditional clustering algorithms are often inadequate due to their limited ability to distinguish differences in the periodic component.
We propose a new analysis framework that utilizes projections onto a cylindrical coordinate system to better represent objects in a polar coordinate system.
Our approach is generally applicable and adaptable and can be incorporated into most state-of-the-art clustering algorithms.
arXiv Detail & Related papers (2023-09-15T20:56:01Z)
- Deep Clustering: A Comprehensive Survey [53.387957674512585]
Clustering analysis plays an indispensable role in machine learning and data mining.
Deep clustering, which can learn clustering-friendly representations using deep neural networks, has been broadly applied in a wide range of clustering tasks.
Existing surveys for deep clustering mainly focus on the single-view fields and the network architectures, ignoring the complex application scenarios of clustering.
arXiv Detail & Related papers (2022-10-09T02:31:32Z)
- Detection and Evaluation of Clusters within Sequential Data [58.720142291102135]
Clustering algorithms for Block Markov Chains possess theoretical optimality guarantees.
In particular, our sequential data is derived from human DNA, written text, animal movement data and financial markets.
It is found that the Block Markov Chain model assumption can indeed produce meaningful insights in exploratory data analyses.
arXiv Detail & Related papers (2022-10-04T15:22:39Z)
- Enhancing cluster analysis via topological manifold learning [0.3823356975862006]
We show that inferring the topological structure of a dataset before clustering can considerably enhance cluster detection.
We combine the manifold learning method UMAP for inferring the topological structure with the density-based clustering method DBSCAN.
arXiv Detail & Related papers (2022-07-01T15:53:39Z)
- Anomaly Clustering: Grouping Images into Coherent Clusters of Anomaly Types [60.45942774425782]
We introduce anomaly clustering, whose goal is to group data into coherent clusters of anomaly types.
This is different from anomaly detection, whose goal is to divide anomalies from normal data.
We present a simple yet effective clustering framework using patch-based pretrained deep embeddings and off-the-shelf clustering methods.
arXiv Detail & Related papers (2021-12-21T23:11:33Z)
- Spatially Coherent Clustering Based on Orthogonal Nonnegative Matrix Factorization [0.0]
We introduce in this work clustering models based on a total variation (TV) regularization procedure on the cluster membership matrix.
We provide a numerical evaluation of all proposed methods on a hyperspectral dataset obtained from a matrix-assisted laser desorption/ionisation imaging measurement.
arXiv Detail & Related papers (2021-04-25T23:40:41Z)
- Towards Uncovering the Intrinsic Data Structures for Unsupervised Domain Adaptation using Structurally Regularized Deep Clustering [119.88565565454378]
Unsupervised domain adaptation (UDA) aims to learn classification models that make predictions for unlabeled data on a target domain.
We propose a hybrid model of Structurally Regularized Deep Clustering, which integrates the regularized discriminative clustering of target data with a generative one.
Our proposed H-SRDC outperforms all the existing methods under both the inductive and transductive settings.
arXiv Detail & Related papers (2020-12-08T08:52:00Z)
- Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.