Fair Minimum Representation Clustering
- URL: http://arxiv.org/abs/2302.03151v2
- Date: Wed, 8 Feb 2023 16:00:33 GMT
- Title: Fair Minimum Representation Clustering
- Authors: Connor Lawless, Oktay Gunluk
- Abstract summary: Clustering is an unsupervised learning task that aims to partition data into a set of clusters.
We show that the popular $k$-means algorithm, Lloyd's algorithm, can result in unfair outcomes.
We present a variant of Lloyd's algorithm, called MiniReL, that directly incorporates the fairness constraints.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Clustering is an unsupervised learning task that aims to partition data into
a set of clusters. In many applications, these clusters correspond to
real-world constructs (e.g. electoral districts) whose benefit can only be
attained by groups when they reach a minimum level of representation (e.g. 50\%
to elect their desired candidate). This paper considers the problem of
performing k-means clustering while ensuring groups (e.g. demographic groups)
have that minimum level of representation in a specified number of clusters. We
show that the popular $k$-means algorithm, Lloyd's algorithm, can result in
unfair outcomes where certain groups lack sufficient representation past the
minimum threshold in a proportional number of clusters. We formulate the
problem through a mixed-integer optimization framework and present a variant of
Lloyd's algorithm, called MiniReL, that directly incorporates the fairness
constraints. We show that incorporating the fairness criteria leads to a
NP-Hard sub-problem within Lloyd's algorithm, but we provide computational
approaches that make the problem tractable for even large datasets. Numerical
results show that the approach is able to create fairer clusters with
practically no increase in the k-means clustering cost across standard
benchmark datasets.
Related papers
- Fair Minimum Representation Clustering via Integer Programming [0.6906005491572401]
Clustering is an unsupervised learning task that aims to partition data into a set of clusters.
In this paper, we study the k-means and k-medians clustering problems with the additional constraint that each group must have a minimum level of representation.
We present an alternating minimization algorithm, called MiniReL, that directly incorporates the fairness constraints.
arXiv Detail & Related papers (2024-09-04T00:13:40Z) - Reinforcement Graph Clustering with Unknown Cluster Number [91.4861135742095]
We propose a new deep graph clustering method termed Reinforcement Graph Clustering.
In our proposed method, cluster number determination and unsupervised representation learning are unified into a uniform framework.
In order to conduct feedback actions, the clustering-oriented reward function is proposed to enhance the cohesion of the same clusters and separate the different clusters.
arXiv Detail & Related papers (2023-08-13T18:12:28Z) - Revisiting Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [69.15976031704687]
We propose IAC (Instance-Adaptive Clustering), the first algorithm whose performance matches the instance-specific lower bounds both in expectation and with high probability.
IAC maintains an overall computational complexity of $ mathcalO(n, textpolylog(n) $, making it scalable and practical for large-scale problems.
arXiv Detail & Related papers (2023-06-18T08:46:06Z) - Cluster-level Group Representativity Fairness in $k$-means Clustering [3.420467786581458]
Clustering algorithms could generate clusters such that different groups are disadvantaged within different clusters.
We develop a clustering algorithm, building upon the centroid clustering paradigm pioneered by classical algorithms.
We show that our method is effective in enhancing cluster-level group representativity fairness significantly at low impact on cluster coherence.
arXiv Detail & Related papers (2022-12-29T22:02:28Z) - Socially Fair Center-based and Linear Subspace Clustering [8.355270405285909]
Center-based clustering and linear subspace clustering are popular techniques to partition real-world data into smaller clusters.
Different clustering cost per point for different sensitive groups can lead to fairness-related harms.
We propose a unified framework to solve socially fair center-based clustering and linear subspace clustering.
arXiv Detail & Related papers (2022-08-22T07:10:17Z) - Fair Labeled Clustering [28.297893914525517]
We consider the downstream application of clustering and how group fairness should be ensured for such a setting.
We provide algorithms for such problems and show that in contrast to their NP-hard counterparts in group fair clustering, they permit efficient solutions.
We also consider a well-motivated alternative setting where the decision-maker is free to assign labels to the clusters regardless of the centers' positions in the metric space.
arXiv Detail & Related papers (2022-05-28T07:07:12Z) - Lattice-Based Methods Surpass Sum-of-Squares in Clustering [98.46302040220395]
Clustering is a fundamental primitive in unsupervised learning.
Recent work has established lower bounds against the class of low-degree methods.
We show that, perhaps surprisingly, this particular clustering model textitdoes not exhibit a statistical-to-computational gap.
arXiv Detail & Related papers (2021-12-07T18:50:17Z) - Fuzzy Clustering with Similarity Queries [56.96625809888241]
The fuzzy or soft objective is a popular generalization of the well-known $k$-means problem.
We show that by making few queries, the problem becomes easier to solve.
arXiv Detail & Related papers (2021-06-04T02:32:26Z) - Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z) - Socially Fair k-Means Clustering [3.3409719900340256]
We present a fair k-means objective and algorithm to choose cluster centers that provide equitable costs for different groups.
The algorithm, Fair-Lloyd, is a modification of Lloyd's for k-means, inheriting its simplicity, efficiency, and stability.
arXiv Detail & Related papers (2020-06-17T18:05:17Z) - Computationally efficient sparse clustering [67.95910835079825]
We provide a finite sample analysis of a new clustering algorithm based on PCA.
We show that it achieves the minimax optimal misclustering rate in the regime $|theta infty$.
arXiv Detail & Related papers (2020-05-21T17:51:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.