Related papers: Random Normed k-Means: A Paradigm-Shift in Clustering within Probabilistic Metric Spaces

Random Normed k-Means: A Paradigm-Shift in Clustering within Probabilistic Metric Spaces

URL: http://arxiv.org/abs/2504.03928v1
Date: Fri, 04 Apr 2025 20:48:43 GMT
Title: Random Normed k-Means: A Paradigm-Shift in Clustering within Probabilistic Metric Spaces
Authors: Abderrafik Laakel Hemdanou, Youssef Achtoun, Mohammed Lamarti Sefian, Ismail Tahiri, Abdellatif El Afia,
Abstract summary: We introduce the first k-means variant in the literature that operates within a probabilistic metric space.<n>By adopting a probabilistic perspective, our method not only introduces a fresh paradigm but also establishes a rigorous theoretical framework.<n>Our proposed random normed k-means (RNKM) algorithm exhibits a remarkable ability to identify nonlinearly separable structures.
Score: 0.7864304771129751
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Existing approaches remain largely constrained by traditional distance metrics, limiting their effectiveness in handling random data. In this work, we introduce the first k-means variant in the literature that operates within a probabilistic metric space, replacing conventional distance measures with a well-defined distance distribution function. This pioneering approach enables more flexible and robust clustering in both deterministic and random datasets, establishing a new foundation for clustering in stochastic environments. By adopting a probabilistic perspective, our method not only introduces a fresh paradigm but also establishes a rigorous theoretical framework that is expected to serve as a key reference for future clustering research involving random data. Extensive experiments on diverse real and synthetic datasets assess our model's effectiveness using widely recognized evaluation metrics, including Silhouette, Davies-Bouldin, Calinski Harabasz, the adjusted Rand index, and distortion. Comparative analyses against established methods such as k-means++, fuzzy c-means, and kernel probabilistic k-means demonstrate the superior performance of our proposed random normed k-means (RNKM) algorithm. Notably, RNKM exhibits a remarkable ability to identify nonlinearly separable structures, making it highly effective in complex clustering scenarios. These findings position RNKM as a groundbreaking advancement in clustering research, offering a powerful alternative to traditional techniques while addressing a long-standing gap in the literature. By bridging probabilistic metrics with clustering, this study provides a foundational reference for future developments and opens new avenues for advanced data analysis in dynamic, data-driven applications.

Related papers

K*-Means: A Parameter-free Clustering Algorithm [55.20132267309382]
k*-means is a novel clustering algorithm that eliminates the need to set k or any other parameters.<n>It uses the minimum description length principle to automatically determine the optimal number of clusters, k*, by splitting and merging clusters.<n>We prove that k*-means is guaranteed to converge and demonstrate experimentally that it significantly outperforms existing methods in scenarios where k is unknown.
arXiv Detail & Related papers (2025-05-17T08:41:07Z)
Clustering by Nonparametric Smoothing [6.635604919499181]
A novel formulation of the clustering problem is introduced in which the task is expressed as an estimation problem. The proposed approach bypasses any explicit modelling assumptions and exploits the flexible estimation potential of nonparametric smoothing. Experiments on a large collection of publicly available data sets are used to document the strong performance of the proposed approach.
arXiv Detail & Related papers (2025-03-12T07:44:11Z)
Uncertainty quantification for Markov chains with application to temporal difference learning [63.49764856675643]
We develop novel high-dimensional concentration inequalities and Berry-Esseen bounds for vector- and matrix-valued functions of Markov chains.<n>We analyze the TD learning algorithm, a widely used method for policy evaluation in reinforcement learning.
arXiv Detail & Related papers (2025-02-19T15:33:55Z)
Self-Supervised Graph Embedding Clustering [70.36328717683297]
K-means one-step dimensionality reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks. We propose a unified framework that integrates manifold learning with K-means, resulting in the self-supervised graph embedding framework.
arXiv Detail & Related papers (2024-09-24T08:59:51Z)
Fuzzy K-Means Clustering without Cluster Centroids [21.256564324236333]
Fuzzy K-Means clustering is a critical technique in unsupervised data analysis. This paper proposes a novel Fuzzy textitK-Means clustering algorithm that entirely eliminates the reliance on cluster centroids.
arXiv Detail & Related papers (2024-04-07T12:25:03Z)
Dirichlet Process-based Robust Clustering using the Median-of-Means Estimator [16.774378814288806]
We propose an efficient and automatic clustering technique by integrating the strengths of model-based and centroid-based methodologies.<n>Our method mitigates the effect of noise on the quality of clustering; while at the same time, estimates the number of clusters.
arXiv Detail & Related papers (2023-11-26T19:01:15Z)
Validation Diagnostics for SBI algorithms based on Normalizing Flows [55.41644538483948]
This work proposes easy to interpret validation diagnostics for multi-dimensional conditional (posterior) density estimators based on NF. It also offers theoretical guarantees based on results of local consistency. This work should help the design of better specified models or drive the development of novel SBI-algorithms.
arXiv Detail & Related papers (2022-11-17T15:48:06Z)
A One-shot Framework for Distributed Clustered Learning in Heterogeneous Environments [54.172993875654015]
The paper proposes a family of communication efficient methods for distributed learning in heterogeneous environments. One-shot approach, based on local computations at the users and a clustering based aggregation step at the server is shown to provide strong learning guarantees. For strongly convex problems it is shown that, as long as the number of data points per user is above a threshold, the proposed approach achieves order-optimal mean-squared error rates in terms of the sample size.
arXiv Detail & Related papers (2022-09-22T09:04:10Z)
Simplex Clustering via sBeta with Applications to Online Adjustment of Black-Box Predictions [16.876111500144667]
We introduce a novel probabilistic clustering method, referred to as k-sBetas. We provide a general maximum a posteriori (MAP) perspective of clustering distributions. Our code and comparisons with the existing simplex-clustering approaches and our introduced softmax-prediction benchmarks are publicly available.
arXiv Detail & Related papers (2022-07-30T18:29:11Z)
Robust Trimmed k-means [70.88503833248159]
We propose Robust Trimmed k-means (RTKM) that simultaneously identifies outliers and clusters points. We show RTKM performs competitively with other methods on single membership data with outliers and multi-membership data without outliers.
arXiv Detail & Related papers (2021-08-16T15:49:40Z)
Deep Conditional Gaussian Mixture Model for Constrained Clustering [7.070883800886882]
Constrained clustering can leverage prior information on a growing amount of only partially labeled data. We propose a novel framework for constrained clustering that is intuitive, interpretable, and can be trained efficiently in the framework of gradient variational inference.
arXiv Detail & Related papers (2021-06-11T13:38:09Z)
Learning while Respecting Privacy and Robustness to Distributional Uncertainties and Adversarial Data [66.78671826743884]
The distributionally robust optimization framework is considered for training a parametric model. The objective is to endow the trained model with robustness against adversarially manipulated input data. Proposed algorithms offer robustness with little overhead.
arXiv Detail & Related papers (2020-07-07T18:25:25Z)
A New Validity Index for Fuzzy-Possibilistic C-Means Clustering [6.174448419090291]
Fuzzy-Possibilistic (FP) index works well in the presence of clusters that vary in shape and density. FPCM requires a priori selection of the degree of fuzziness and the degree of typicality.
arXiv Detail & Related papers (2020-05-19T01:48:13Z)
Stable and consistent density-based clustering via multiparameter persistence [77.34726150561087]
We consider the degree-Rips construction from topological data analysis. We analyze its stability to perturbations of the input data using the correspondence-interleaving distance. We integrate these methods into a pipeline for density-based clustering, which we call Persistable.
arXiv Detail & Related papers (2020-05-18T19:45:04Z)
Robust M-Estimation Based Bayesian Cluster Enumeration for Real Elliptically Symmetric Distributions [5.137336092866906]
Robustly determining optimal number of clusters in a data set is an essential factor in a wide range of applications. This article generalizes so that it can be used with any arbitrary Really Symmetric (RES) distributed mixture model. We derive a robust criterion for data sets with finite sample size, and also provide an approximation to reduce the computational cost at large sample sizes.
arXiv Detail & Related papers (2020-05-04T11:44:49Z)

This list is automatically generated from the titles and abstracts of the papers in this site.