Socially Fair k-Means Clustering
- URL: http://arxiv.org/abs/2006.10085v2
- Date: Thu, 29 Oct 2020 16:03:50 GMT
- Title: Socially Fair k-Means Clustering
- Authors: Mehrdad Ghadiri, Samira Samadi, Santosh Vempala
- Abstract summary: We present a fair k-means objective and algorithm to choose cluster centers that provide equitable costs for different groups.
The algorithm, Fair-Lloyd, is a modification of Lloyd's heuristic for k-means, inheriting its simplicity, efficiency, and stability.
- Score: 3.3409719900340256
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We show that the popular k-means clustering algorithm (Lloyd's heuristic),
used for a variety of scientific data, can result in outcomes that are
unfavorable to subgroups of data (e.g., demographic groups). Such biased
clusterings can have deleterious implications for human-centric applications
such as resource allocation. We present a fair k-means objective and algorithm
to choose cluster centers that provide equitable costs for different groups.
The algorithm, Fair-Lloyd, is a modification of Lloyd's heuristic for k-means,
inheriting its simplicity, efficiency, and stability. In comparison with
standard Lloyd's, we find that on benchmark datasets, Fair-Lloyd exhibits
unbiased performance by ensuring that all groups have equal costs in the output
k-clustering, while incurring a negligible increase in running time, thus
making it a viable fair option wherever k-means is currently used.
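One way to read "equitable costs for different groups" is to score a set of centers by the average k-means cost of each group and then take the worst group. The sketch below assumes that reading. The function names `group_costs` and `fair_cost` are hypothetical, and the code only evaluates costs for given centers; it is not the Fair-Lloyd algorithm itself.

```python
import numpy as np


def group_costs(points, groups, centers):
    """Average squared distance to the nearest center, computed per group."""
    # Squared distances from every point to every center: shape (n, k).
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    # Each point pays the cost of its nearest center.
    nearest = d2.min(axis=1)
    return {g.item(): float(nearest[groups == g].mean()) for g in np.unique(groups)}


def fair_cost(points, groups, centers):
    """Assumed fairness score: the largest per-group average cost."""
    return max(group_costs(points, groups, centers).values())


# Toy data: two groups of two points each, and two fixed centers.
points = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 0.0], [10.0, 4.0]])
groups = np.array([0, 0, 1, 1])
centers = np.array([[1.0, 0.0], [10.0, 0.0]])

costs = group_costs(points, groups, centers)
worst = fair_cost(points, groups, centers)
print(costs, worst)
```

In this toy example group 0 has an average cost of 1 and group 1 an average cost of 8, so the two groups are served unequally even though every point is assigned to its nearest center.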
Related papers
- Self-Supervised Graph Embedding Clustering [70.36328717683297]
The K-means one-step dimensionality reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks.
We propose a unified framework that integrates manifold learning with K-means, resulting in the self-supervised graph embedding framework.
arXiv Detail & Related papers (2024-09-24T08:59:51Z)
- Fair Minimum Representation Clustering [0.0]
Clustering is an unsupervised learning task that aims to partition data into a set of clusters.
We show that the popular $k$-means algorithm, Lloyd's algorithm, can result in unfair outcomes.
We present a variant of Lloyd's algorithm, called MiniReL, that directly incorporates the fairness constraints.
arXiv Detail & Related papers (2023-02-06T23:16:38Z)
- A One-shot Framework for Distributed Clustered Learning in Heterogeneous Environments [54.172993875654015]
The paper proposes a family of communication-efficient methods for distributed learning in heterogeneous environments.
A one-shot approach, based on local computations at the users and a clustering-based aggregation step at the server, is shown to provide strong learning guarantees.
For strongly convex problems it is shown that, as long as the number of data points per user is above a threshold, the proposed approach achieves order-optimal mean-squared error rates in terms of the sample size.
arXiv Detail & Related papers (2022-09-22T09:04:10Z)
- Socially Fair Center-based and Linear Subspace Clustering [8.355270405285909]
Center-based clustering and linear subspace clustering are popular techniques to partition real-world data into smaller clusters.
Differences in per-point clustering cost across sensitive groups can lead to fairness-related harms.
We propose a unified framework to solve socially fair center-based clustering and linear subspace clustering.
arXiv Detail & Related papers (2022-08-22T07:10:17Z)
- Fair Labeled Clustering [28.297893914525517]
We consider the downstream application of clustering and how group fairness should be ensured for such a setting.
We provide algorithms for such problems and show that in contrast to their NP-hard counterparts in group fair clustering, they permit efficient solutions.
We also consider a well-motivated alternative setting where the decision-maker is free to assign labels to the clusters regardless of the centers' positions in the metric space.
arXiv Detail & Related papers (2022-05-28T07:07:12Z)
- Fair Group-Shared Representations with Normalizing Flows [68.29997072804537]
We develop a fair representation learning algorithm which is able to map individuals belonging to different groups into a single group.
We show experimentally that our methodology is competitive with other fair representation learning algorithms.
arXiv Detail & Related papers (2022-01-17T10:49:49Z)
- Robust Trimmed k-means [70.88503833248159]
We propose Robust Trimmed k-means (RTKM) that simultaneously identifies outliers and clusters points.
We show that RTKM performs competitively with other methods on single-membership data with outliers and multi-membership data without outliers.
arXiv Detail & Related papers (2021-08-16T15:49:40Z)
- Fair Clustering Under a Bounded Cost [33.50262066253557]
Clustering is a fundamental unsupervised learning problem where a dataset is partitioned into clusters that consist of nearby points in a metric space.
A recent variant, fair clustering, associates a color with each point representing its group membership and requires that each color has (approximately) equal representation in each cluster to satisfy group fairness.
We consider two fairness objectives: the group utilitarian objective and the group egalitarian objective, as well as the group leximin objective which generalizes the group egalitarian objective.
arXiv Detail & Related papers (2021-06-14T08:47:36Z)
- Fuzzy Clustering with Similarity Queries [56.96625809888241]
The fuzzy or soft objective is a popular generalization of the well-known $k$-means problem.
We show that by making few queries, the problem becomes easier to solve.
arXiv Detail & Related papers (2021-06-04T02:32:26Z)
- Decorrelated Clustering with Data Selection Bias [55.91842043124102]
We propose a novel Decorrelation regularized K-Means algorithm (DCKM) for clustering with data selection bias.
Our DCKM algorithm achieves significant performance gains, indicating the necessity of removing unexpected feature correlations induced by selection bias.
arXiv Detail & Related papers (2020-06-29T08:55:50Z)
- Fair Algorithms for Hierarchical Agglomerative Clustering [17.66340013352806]
Hierarchical Agglomerative Clustering (HAC) algorithms are extensively utilized in modern data science.
It is imperative to ensure that these algorithms are fair -- even if the dataset contains biases against certain protected groups.
We propose fair algorithms for performing HAC that enforce fairness constraints.
arXiv Detail & Related papers (2020-05-07T01:41:56Z)