Robust Trimmed k-means
- URL: http://arxiv.org/abs/2108.07186v1
- Date: Mon, 16 Aug 2021 15:49:40 GMT
- Title: Robust Trimmed k-means
- Authors: Olga Dorabiala, J. Nathan Kutz, Aleksandr Aravkin
- Abstract summary: We propose Robust Trimmed k-means (RTKM) that simultaneously identifies outliers and clusters points.
We show RTKM performs competitively with other methods on single-membership data with outliers and on multi-membership data without outliers.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Clustering is a fundamental tool in unsupervised learning, used to group
objects by distinguishing between similar and dissimilar features of a given
data set. One of the most common clustering algorithms is k-means.
Unfortunately, when dealing with real-world data many traditional clustering
algorithms are compromised by lack of clear separation between groups, noisy
observations, and/or outlying data points. Thus, robust statistical algorithms
are required for successful data analytics. Current methods that robustify
k-means clustering are specialized for either single- or multi-membership
data, but do not perform competitively in both cases. We propose an extension
of the k-means algorithm, which we call Robust Trimmed k-means (RTKM), that
simultaneously identifies outliers and clusters points and can be applied to
either single- or multi-membership data. We test RTKM on various real-world
datasets and show that RTKM performs competitively with other methods on
single-membership data with outliers and on multi-membership data without
outliers. We also show that RTKM leverages its relative advantages to
outperform other methods on multi-membership data containing outliers.
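The abstract specifies what RTKM does (simultaneous outlier identification and clustering) but not its update rules. As a reading aid, below is a minimal sketch of generic trimmed k-means, the classical idea that RTKM extends: alternate between assigning points to their nearest center, trimming the fraction of points farthest from their centers as outliers, and recomputing centers from the inliers. This is not the authors' RTKM implementation; the function name, the outlier_frac parameter, and the fixed iteration count are illustrative assumptions, and the multi-membership case is not handled.

```python
import numpy as np

def trimmed_kmeans(X, k, outlier_frac=0.1, n_iter=50, seed=0):
    """Generic trimmed k-means sketch (illustrative; not the paper's RTKM code).

    Alternates three steps: assign points to their nearest centers, trim the
    outlier_frac fraction of points farthest from their assigned centers, and
    recompute centers from the remaining inliers.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    n_trim = int(outlier_frac * n)  # number of points flagged as outliers
    centers = X[rng.choice(n, size=k, replace=False)]
    labels = np.zeros(n, dtype=int)
    inlier = np.ones(n, dtype=bool)
    for _ in range(n_iter):
        # Squared distances from every point to every center: shape (n, k).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)
        nearest = d2[np.arange(n), labels]
        # Trim: the n_trim points farthest from their assigned center are
        # treated as outliers and excluded from the center update.
        inlier = np.ones(n, dtype=bool)
        if n_trim > 0:
            inlier[np.argsort(nearest)[-n_trim:]] = False
        for j in range(k):
            members = inlier & (labels == j)
            if members.any():  # guard against empty clusters
                centers[j] = X[members].mean(axis=0)
    return labels, inlier, centers
```

Calling `labels, inlier, centers = trimmed_kmeans(X, k=3, outlier_frac=0.05)` flags the 5% worst-fit points; rows with `inlier == False` are the detected outliers.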
Related papers
- Self-Supervised Graph Embedding Clustering (arXiv, 2024-09-24)
  K-means one-step dimensionality reduction clustering has made some progress in addressing the curse of dimensionality in clustering tasks. We propose a unified framework that integrates manifold learning with K-means, resulting in the self-supervised graph embedding framework.
- Fuzzy K-Means Clustering without Cluster Centroids (arXiv, 2024-04-07)
  Fuzzy K-Means clustering is a critical technique in unsupervised data analysis. This paper proposes a novel Fuzzy K-Means clustering algorithm that entirely eliminates the reliance on cluster centroids.
- A Weighted K-Center Algorithm for Data Subset Selection (arXiv, 2023-12-17)
  Subset selection is a fundamental problem that can play a key role in identifying smaller portions of the training data. We develop a novel factor-3 approximation algorithm to compute subsets based on the weighted sum of both k-center and uncertainty sampling objective functions.
- Comparative Analysis of Optimization Strategies for K-means Clustering in Big Data Contexts: A Review (arXiv, 2023-10-15)
  K-means is a widely used clustering algorithm, but it can suffer from scalability issues when dealing with large datasets. The paper explores different approaches to overcome these issues, including parallelization, approximation, and sampling methods.
- Rethinking k-means from manifold learning perspective (arXiv, 2023-05-12)
  We present a new clustering algorithm which directly detects clusters of data without mean estimation. Specifically, we construct the distance matrix between data points with a Butterworth filter. To exploit the complementary information embedded in different views, we leverage tensor Schatten p-norm regularization.
- A One-shot Framework for Distributed Clustered Learning in Heterogeneous Environments (arXiv, 2022-09-22)
  The paper proposes a family of communication-efficient methods for distributed learning in heterogeneous environments. A one-shot approach, based on local computations at the users and a clustering-based aggregation step at the server, is shown to provide strong learning guarantees. For strongly convex problems it is shown that, as long as the number of data points per user is above a threshold, the proposed approach achieves order-optimal mean-squared error rates in terms of the sample size.
- ck-means, a novel unsupervised learning method that combines fuzzy and crispy clustering methods to extract intersecting data (arXiv, 2022-06-17)
  This paper proposes a method to cluster data that share the same intersections between two or more features. The main idea of this novel method is to generate fuzzy clusters of data using a Fuzzy C-Means (FCM) algorithm. The algorithm is also able to find the optimal number of clusters for both FCM and k-means, according to the consistency of the clusters given by the Silhouette Index (SI); a sketch of this selection step appears after this list.
- Meta Clustering Learning for Large-scale Unsupervised Person Re-identification (arXiv, 2021-11-19)
  We propose a "small data for big task" paradigm dubbed Meta Clustering Learning (MCL). MCL pseudo-labels only a subset of the entire unlabeled data via clustering to save computation in the first-phase training. Our method significantly reduces computational cost while achieving comparable or even better performance than prior works.
- Too Much Information Kills Information: A Clustering Perspective (arXiv, 2020-09-16)
  We propose a simple but novel approach for variance-based k-clustering tasks, which include the widely known k-means clustering. The proposed approach picks a sampling subset from the given dataset and makes decisions based on the data information in the subset only. Under certain assumptions, the resulting clustering provably estimates the optimum of the variance-based objective with high probability.
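The ck-means entry above mentions choosing the number of clusters by the Silhouette Index. As an illustration of that selection step only (the paper's full procedure also involves Fuzzy C-Means, which this sketch omits), here is a common way to pick k for plain k-means with scikit-learn; the function name and the candidate range are assumptions.

```python
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def best_k_by_silhouette(X, k_candidates=range(2, 11), seed=0):
    """Pick the k whose k-means labeling maximizes the mean silhouette.

    Illustrative sketch of silhouette-based model selection, not the
    ck-means paper's own procedure (which also uses Fuzzy C-Means).
    """
    scores = {}
    for k in k_candidates:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        scores[k] = silhouette_score(X, labels)  # in [-1, 1], higher is better
    best = max(scores, key=scores.get)
    return best, scores
```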
This list is automatically generated from the titles and abstracts of the papers in this site.