A Computational Approach to Improving Fairness in K-means Clustering
- URL: http://arxiv.org/abs/2505.22984v2
- Date: Sat, 31 May 2025 12:44:03 GMT
- Title: A Computational Approach to Improving Fairness in K-means Clustering
- Authors: Guancheng Zhou, Haiping Xu, Hongkang Xu, Chenyu Li, Donghui Yan,
- Abstract summary: The popular K-means clustering algorithm potentially suffers from a major weakness for further analysis or interpretation.<n>This work attempts to improve the fairness of K-means clustering with a two-stage optimization formulation.<n> Experiments on benchmark datasets show substantial improvement on fairness with a minimal impact to clustering quality.
- Score: 8.001963712764569
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The popular K-means clustering algorithm potentially suffers from a major weakness for further analysis or interpretation. Some cluster may have disproportionately more (or fewer) points from one of the subpopulations in terms of some sensitive variable, e.g., gender or race. Such a fairness issue may cause bias and unexpected social consequences. This work attempts to improve the fairness of K-means clustering with a two-stage optimization formulation--clustering first and then adjust cluster membership of a small subset of selected data points. Two computationally efficient algorithms are proposed in identifying those data points that are expensive for fairness, with one focusing on nearest data points outside of a cluster and the other on highly 'mixed' data points. Experiments on benchmark datasets show substantial improvement on fairness with a minimal impact to clustering quality. The proposed algorithms can be easily extended to a broad class of clustering algorithms or fairness metrics.
Related papers
- K*-Means: A Parameter-free Clustering Algorithm [55.20132267309382]
k*-means is a novel clustering algorithm that eliminates the need to set k or any other parameters.<n>It uses the minimum description length principle to automatically determine the optimal number of clusters, k*, by splitting and merging clusters.<n>We prove that k*-means is guaranteed to converge and demonstrate experimentally that it significantly outperforms existing methods in scenarios where k is unknown.
arXiv Detail & Related papers (2025-05-17T08:41:07Z) - Fair Clustering via Alignment [3.5845787949988592]
Algorithmic fairness in clustering aims to balance proportions of instances assigned to each cluster with respect to a given sensitive attribute.<n>We propose a new fair clustering algorithm based on a novel decomposition of the fair $K$-means clustering objective function.
arXiv Detail & Related papers (2025-05-14T04:29:09Z) - Self-Supervised Graph Embedding Clustering [70.36328717683297]
K-means one-step dimensionality reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks.
We propose a unified framework that integrates manifold learning with K-means, resulting in the self-supervised graph embedding framework.
arXiv Detail & Related papers (2024-09-24T08:59:51Z) - Fuzzy K-Means Clustering without Cluster Centroids [21.256564324236333]
Fuzzy K-Means clustering is a critical technique in unsupervised data analysis.
This paper proposes a novel Fuzzy textitK-Means clustering algorithm that entirely eliminates the reliance on cluster centroids.
arXiv Detail & Related papers (2024-04-07T12:25:03Z) - Rethinking k-means from manifold learning perspective [122.38667613245151]
We present a new clustering algorithm which directly detects clusters of data without mean estimation.
Specifically, we construct distance matrix between data points by Butterworth filter.
To well exploit the complementary information embedded in different views, we leverage the tensor Schatten p-norm regularization.
arXiv Detail & Related papers (2023-05-12T03:01:41Z) - Cluster-level Group Representativity Fairness in $k$-means Clustering [3.420467786581458]
Clustering algorithms could generate clusters such that different groups are disadvantaged within different clusters.
We develop a clustering algorithm, building upon the centroid clustering paradigm pioneered by classical algorithms.
We show that our method is effective in enhancing cluster-level group representativity fairness significantly at low impact on cluster coherence.
arXiv Detail & Related papers (2022-12-29T22:02:28Z) - Socially Fair Center-based and Linear Subspace Clustering [8.355270405285909]
Center-based clustering and linear subspace clustering are popular techniques to partition real-world data into smaller clusters.
Different clustering cost per point for different sensitive groups can lead to fairness-related harms.
We propose a unified framework to solve socially fair center-based clustering and linear subspace clustering.
arXiv Detail & Related papers (2022-08-22T07:10:17Z) - Differentially-Private Clustering of Easy Instances [67.04951703461657]
In differentially private clustering, the goal is to identify $k$ cluster centers without disclosing information on individual data points.
We provide implementable differentially private clustering algorithms that provide utility when the data is "easy"
We propose a framework that allows us to apply non-private clustering algorithms to the easy instances and privately combine the results.
arXiv Detail & Related papers (2021-12-29T08:13:56Z) - Clustering of Big Data with Mixed Features [3.3504365823045044]
We develop a new clustering algorithm for large data of mixed type.
The algorithm is capable of detecting outliers and clusters of relatively lower density values.
We present experimental results to verify that our algorithm works well in practice.
arXiv Detail & Related papers (2020-11-11T19:54:38Z) - Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z) - Differentially Private Clustering: Tight Approximation Ratios [57.89473217052714]
We give efficient differentially private algorithms for basic clustering problems.
Our results imply an improved algorithm for the Sample and Aggregate privacy framework.
One of the tools used in our 1-Cluster algorithm can be employed to get a faster quantum algorithm for ClosestPair in a moderate number of dimensions.
arXiv Detail & Related papers (2020-08-18T16:22:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.