KNN and K-means in Gini Prametric Spaces
- URL: http://arxiv.org/abs/2501.18028v3
- Date: Tue, 26 Aug 2025 14:53:23 GMT
- Title: KNN and K-means in Gini Prametric Spaces
- Authors: Cassandra Mussard, Arthur Charpentier, Stéphane Mussard
- Abstract summary: This paper introduces enhancements to the K-means and K-nearest neighbors algorithms based on the concept of Gini prametric spaces. Gini prametrics incorporate both value-based and rank-based measures, offering robustness to noise and outliers. The main contributions include: (1) a Gini prametric that captures rank information alongside value distances; (2) a Gini K-means algorithm that is provably convergent and resilient to noisy data; and (3) a Gini KNN method that performs competitively with state-of-the-art approaches like Hassanat's distance in noisy environments.
- Score: 13.991315209521703
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper introduces enhancements to the K-means and K-nearest neighbors (KNN) algorithms based on the concept of Gini prametric spaces, instead of traditional metric spaces. Unlike standard distance metrics, Gini prametrics incorporate both value-based and rank-based measures, offering robustness to noise and outliers. The main contributions include: (1) a Gini prametric that captures rank information alongside value distances; (2) a Gini K-means algorithm that is provably convergent and resilient to noisy data; and (3) a Gini KNN method that performs competitively with state-of-the-art approaches like Hassanat's distance in noisy environments. Experimental evaluations on 16 UCI datasets demonstrate the superior performance and efficiency of the Gini-based algorithms in clustering and classification tasks. This work opens new directions for rank-based prametrics in machine learning and statistical analysis.
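The abstract does not give the exact form of the Gini prametric, but the combination of value-based and rank-based dissimilarity it describes can be sketched. The snippet below is an illustrative assumption rather than the paper's definition: the function `gini_prametric`, the weight `alpha`, and the use of per-feature sample ranks are all hypothetical choices.

```python
# Illustrative sketch only: the exact Gini prametric is not specified in the
# abstract; this combines a value-based term with a rank-based term as one
# plausible reading. A prametric only requires d(x, x) = 0 and non-negativity.
import numpy as np
from scipy.stats import rankdata


def gini_prametric(X, i, j, alpha=0.5, ranks=None):
    """Dissimilarity between observations X[i] and X[j] of a sample X of shape (n, d)."""
    n = X.shape[0]
    if ranks is None:
        ranks = rankdata(X, axis=0)                    # per-feature ranks over the sample
    value_term = np.mean(np.abs(X[i] - X[j]))          # value-based (Manhattan-style)
    rank_term = np.mean(np.abs(ranks[i] - ranks[j])) / n   # rank-based, scaled to [0, 1]
    return alpha * value_term + (1.0 - alpha) * rank_term


def gini_knn_predict(X_train, y_train, x_query, k=5, alpha=0.5):
    """Majority vote among the k training points closest to x_query."""
    X_all = np.vstack([X_train, x_query[None, :]])
    ranks = rankdata(X_all, axis=0)
    q = X_all.shape[0] - 1
    d = np.array([gini_prametric(X_all, q, i, alpha, ranks)
                  for i in range(X_train.shape[0])])
    nearest = np.argsort(d)[:k]
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]
```

A Gini K-means variant would presumably swap such a dissimilarity into the assignment step, with a rank-aware or medoid-style centroid update to keep the objective decreasing; the convergence proof mentioned in the abstract is not reproduced here.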
Related papers
- Silhouette-Guided Instance-Weighted k-means [2.56711111236449]
K-Sil is a silhouette-guided refinement of the k-means algorithm that weights points based on their silhouette scores. It prioritizes well-clustered instances while suppressing borderline or noisy regions. These results establish K-Sil as a principled alternative for applications demanding high-quality, well-separated clusters.
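The blurb leaves the weighting rule unspecified; a minimal sketch of one way silhouette scores could weight the centroid update might look like the following (clipping negative silhouettes to zero is an illustrative choice, not K-Sil's actual scheme).

```python
# Sketch of silhouette-guided instance weighting inside a k-means loop.
# Clipping negative silhouettes to zero (to suppress borderline/noisy points)
# is an illustrative assumption, not the K-Sil paper's actual weighting rule.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_samples


def silhouette_weighted_kmeans(X, k, n_iter=10, random_state=0):
    km = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(X)
    labels, centroids = km.labels_, km.cluster_centers_.copy()
    for _ in range(n_iter):
        weights = np.clip(silhouette_samples(X, labels), 0.0, None) + 1e-12
        for c in range(k):
            mask = labels == c
            if mask.any():                       # keep the old centroid for empty clusters
                centroids[c] = np.average(X[mask], axis=0, weights=weights[mask])
        new_labels = np.argmin(((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1), axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, centroids
```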
arXiv Detail & Related papers (2025-06-15T15:09:05Z) - K*-Means: A Parameter-free Clustering Algorithm [55.20132267309382]
k*-means is a novel clustering algorithm that eliminates the need to set k or any other parameters. It uses the minimum description length principle to automatically determine the optimal number of clusters, k*, by splitting and merging clusters. We prove that k*-means is guaranteed to converge and demonstrate experimentally that it significantly outperforms existing methods in scenarios where k is unknown.
arXiv Detail & Related papers (2025-05-17T08:41:07Z) - An Approach Towards Learning K-means-friendly Deep Latent Representation [0.6798775532273751]
Clustering is a long-standing problem area in data mining. With the advent of deep neural networks, a common approach to this problem is to map the data to some latent space of comparatively lower dimensions. A well-known centroid-based clustering algorithm is K-means.
arXiv Detail & Related papers (2024-11-29T06:28:38Z) - K-GBS3FCM -- KNN Graph-Based Safe Semi-Supervised Fuzzy C-Means [0.0]
This paper introduces the KNN graph-based safety-aware semi-supervised fuzzy c-means algorithm (K-GBS3FCM).
It dynamically assesses neighborhood relationships between labeled and unlabeled data using the K-Nearest Neighbors (KNN) algorithm.
A mechanism is proposed that adjusts the influence of labeled data on unlabeled data through regularization parameters and the average safety degree.
arXiv Detail & Related papers (2024-11-22T04:48:58Z) - K-Means Clustering With Incomplete Data with the Use of Mahalanobis Distances [0.0]
We develop a unified K-means algorithm that incorporates Mahalanobis distances, instead of the traditional Euclidean distances.
Our algorithm consistently outperforms both standalone imputation followed by K-means and K-Means with Incomplete Data.
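How the incomplete data is handled is not described in this blurb; as a rough sketch of just the Mahalanobis assignment/update loop on complete data (the ridge regularizer and per-cluster covariance estimates are illustrative choices, not the paper's method):

```python
# Minimal sketch of K-means with Mahalanobis distances on complete data.
# The paper's treatment of missing values is omitted; the ridge term and
# per-cluster covariance estimates are illustrative choices.
import numpy as np


def mahalanobis_kmeans(X, k, n_iter=20, ridge=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    centroids = X[rng.choice(n, k, replace=False)].astype(float)
    cov_inv = np.stack([np.eye(d)] * k)              # start from Euclidean geometry
    labels = np.zeros(n, dtype=int)
    for _ in range(n_iter):
        diffs = X[:, None, :] - centroids[None, :, :]             # (n, k, d)
        dists = np.einsum('nkd,kde,nke->nk', diffs, cov_inv, diffs)
        labels = dists.argmin(axis=1)
        for c in range(k):
            pts = X[labels == c]
            if len(pts) < 2:                         # skip degenerate clusters
                continue
            centroids[c] = pts.mean(axis=0)
            cov = np.cov(pts, rowvar=False) + ridge * np.eye(d)
            cov_inv[c] = np.linalg.inv(cov)
    return labels, centroids
```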
arXiv Detail & Related papers (2024-10-31T00:05:09Z) - MIK: Modified Isolation Kernel for Biological Sequence Visualization, Classification, and Clustering [3.9146761527401424]
This research proposes a novel approach called the Modified Isolation Kernel (MIK) as an alternative to the Gaussian kernel.
MIK uses adaptive density estimation to capture local structures more accurately and integrates robustness measures.
It exhibits improved preservation of the local and global structure and enables better visualization of clusters and subclusters in the embedded space.
arXiv Detail & Related papers (2024-10-21T06:57:09Z) - Self-Supervised Graph Embedding Clustering [70.36328717683297]
The K-means one-step dimensionality reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks.
We propose a unified framework that integrates manifold learning with K-means, resulting in the self-supervised graph embedding framework.
arXiv Detail & Related papers (2024-09-24T08:59:51Z) - Fuzzy K-Means Clustering without Cluster Centroids [21.256564324236333]
Fuzzy K-Means clustering is a critical technique in unsupervised data analysis.
This paper proposes a novel Fuzzy K-Means clustering algorithm that entirely eliminates the reliance on cluster centroids.
arXiv Detail & Related papers (2024-04-07T12:25:03Z) - Information Modified K-Nearest Neighbor [4.916646834691489]
K-Nearest Neighbors (KNN) classifies samples by a majority vote among their nearest neighbors.
Many KNN methodologies introduce complex algorithms that do not significantly outperform the traditional KNN.
We propose an information-modified KNN (IMKNN) to improve the performance of the KNN algorithm.
We conduct experiments on 12 widely-used datasets, achieving 11.05%, 12.42%, and 12.07% in accuracy, precision, and recall performance, respectively.
arXiv Detail & Related papers (2023-12-04T16:10:34Z) - Linear time Evidence Accumulation Clustering with KMeans [0.0]
This work describes a trick that mimics the behavior of average-linkage clustering.
We found a way to compute the density of a partitioning efficiently, reducing the cost from quadratic to linear complexity.
The k-means results are comparable to the best state of the art in terms of NMI while keeping the computational cost low.
arXiv Detail & Related papers (2023-11-15T14:12:59Z) - Rethinking k-means from manifold learning perspective [122.38667613245151]
We present a new clustering algorithm which directly detects clusters of data without mean estimation.
Specifically, we construct a distance matrix between data points using a Butterworth filter.
To well exploit the complementary information embedded in different views, we leverage the tensor Schatten p-norm regularization.
arXiv Detail & Related papers (2023-05-12T03:01:41Z) - Interactive Segmentation as Gaussian Process Classification [58.44673380545409]
Click-based interactive segmentation (IS) aims to extract the target objects under user interaction.
Most of the current deep learning (DL)-based methods mainly follow the general pipelines of semantic segmentation.
We propose to formulate the IS task as a Gaussian process (GP)-based pixel-wise binary classification model on each image.
arXiv Detail & Related papers (2023-02-28T14:01:01Z) - Rethinking Clustering-Based Pseudo-Labeling for Unsupervised Meta-Learning [146.11600461034746]
CACTUs, a method for unsupervised meta-learning, is a clustering-based approach with pseudo-labeling.
This approach is model-agnostic and can be combined with supervised algorithms to learn from unlabeled data.
We prove that the core reason for this is the lack of a clustering-friendly property in the embedding space.
arXiv Detail & Related papers (2022-09-27T19:04:36Z) - Local Sample-weighted Multiple Kernel Clustering with Consensus Discriminative Graph [73.68184322526338]
Multiple kernel clustering (MKC) is committed to achieving optimal information fusion from a set of base kernels.
This paper proposes a novel local sample-weighted multiple kernel clustering model.
Experimental results demonstrate that our LSWMKC possesses better local manifold representation and outperforms existing kernel or graph-based clustering algorithms.
arXiv Detail & Related papers (2022-07-05T05:00:38Z) - Riemannian classification of EEG signals with missing values [67.90148548467762]
This paper proposes two strategies to handle missing data for the classification of electroencephalograms.
The first approach estimates the covariance from imputed data with the $k$-nearest neighbors algorithm; the second relies on the observed data by leveraging the observed-data likelihood within an expectation-maximization algorithm.
As results show, the proposed strategies perform better than the classification based on observed data and allow to keep a high accuracy even when the missing data ratio increases.
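A rough sketch of the first strategy (KNN imputation followed by covariance estimation) is given below; the EM-based second strategy and the Riemannian classifier itself are not shown, and the parameter choices are illustrative.

```python
# Sketch of strategy (1): impute missing entries with a k-nearest-neighbors
# imputer, then estimate the channel covariance matrix fed to a Riemannian
# classifier. n_neighbors=5 is an illustrative choice.
import numpy as np
from sklearn.impute import KNNImputer


def covariance_from_incomplete(signal, n_neighbors=5):
    """signal: (n_times, n_channels) array with np.nan marking missing values."""
    imputed = KNNImputer(n_neighbors=n_neighbors).fit_transform(signal)
    return np.cov(imputed, rowvar=False)   # (n_channels, n_channels) SPD estimate
```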
arXiv Detail & Related papers (2021-10-19T14:24:50Z) - K-Splits: Improved K-Means Clustering Algorithm to Automatically Detect the Number of Clusters [0.12313056815753944]
This paper introduces k-splits, an improved hierarchical algorithm based on k-means to cluster data without prior knowledge of the number of clusters.
Accuracy and speed are two main advantages of the proposed method.
arXiv Detail & Related papers (2021-10-09T23:02:57Z) - Noise-robust Graph Learning by Estimating and Leveraging Pairwise
Interactions [123.07967420310796]
This paper bridges the gap by proposing a pairwise framework for noisy node classification on graphs.
PI-GNN relies on the pairwise interactions (PI) as a primary learning proxy in addition to pointwise learning from the noisy node class labels.
Our proposed framework PI-GNN contributes two novel components: (1) a confidence-aware PI estimation model that adaptively estimates the PI labels, and (2) a decoupled training approach that leverages the estimated PI labels.
arXiv Detail & Related papers (2021-06-14T14:23:08Z) - Making Affine Correspondences Work in Camera Geometry Computation [62.7633180470428]
Local features provide region-to-region rather than point-to-point correspondences.
We propose guidelines for effective use of region-to-region matches in the course of a full model estimation pipeline.
Experiments show that affine solvers can achieve accuracy comparable to point-based solvers at faster run-times.
arXiv Detail & Related papers (2020-07-20T12:07:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.