Fair Clustering Under a Bounded Cost
- URL: http://arxiv.org/abs/2106.07239v1
- Date: Mon, 14 Jun 2021 08:47:36 GMT
- Title: Fair Clustering Under a Bounded Cost
- Authors: Seyed A. Esmaeili, Brian Brubach, Aravind Srinivasan, John P. Dickerson
- Abstract summary: Clustering is a fundamental unsupervised learning problem where a dataset is partitioned into clusters that consist of nearby points in a metric space.
A recent variant, fair clustering, associates a color with each point representing its group membership and requires that each color has (approximately) equal representation in each cluster to satisfy group fairness.
We consider two fairness objectives: the group utilitarian objective and the group egalitarian objective, as well as the group leximin objective which generalizes the group egalitarian objective.
- Score: 33.50262066253557
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Clustering is a fundamental unsupervised learning problem where a dataset is
partitioned into clusters that consist of nearby points in a metric space. A
recent variant, fair clustering, associates a color with each point
representing its group membership and requires that each color has
(approximately) equal representation in each cluster to satisfy group fairness.
In this model, the cost of the clustering objective increases due to enforcing
fairness in the algorithm. The relative increase in the cost, the "price of
fairness," can indeed be unbounded. Therefore, in this paper we propose to
treat an upper bound on the clustering objective as a constraint on the
clustering problem, and to maximize equality of representation subject to it.
We consider two fairness objectives: the group utilitarian objective and the
group egalitarian objective, as well as the group leximin objective which
generalizes the group egalitarian objective. We derive fundamental lower bounds
on the approximation of the utilitarian and egalitarian objectives and
introduce algorithms with provable guarantees for them. For the leximin
objective we introduce an effective heuristic algorithm. We further derive
impossibility results for other natural fairness objectives. We conclude with
experimental results on real-world datasets that demonstrate the validity of
our algorithms.
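The following is one plausible formalization of the setup described in the abstract, included as a reading aid; the notation (groups $\mathcal{H}$, group utilities $U_h$, budget $B$) is ours and should not be read as the paper's exact definitions.

```latex
% A sketch of the bounded-cost fairness objectives (notation ours, not the paper's).
% \mathcal{C} ranges over k-clusterings, cost(\mathcal{C}) is the clustering
% objective (e.g. k-median), OPT its unconstrained optimum, and U_h(\mathcal{C})
% a representation utility for each color (group) h in \mathcal{H}.
\begin{align*}
\text{Price of fairness:}\quad
  & \mathrm{PoF} = \frac{\mathrm{cost}(\mathcal{C}_{\mathrm{fair}})}{\mathrm{OPT}}
    \quad\text{(can be unbounded)} \\[4pt]
\text{Group utilitarian:}\quad
  & \max_{\mathcal{C}} \; \sum_{h \in \mathcal{H}} U_h(\mathcal{C})
    \quad \text{s.t. } \mathrm{cost}(\mathcal{C}) \le B \\[4pt]
\text{Group egalitarian:}\quad
  & \max_{\mathcal{C}} \; \min_{h \in \mathcal{H}} U_h(\mathcal{C})
    \quad \text{s.t. } \mathrm{cost}(\mathcal{C}) \le B
\end{align*}
% Here B >= OPT is the user-specified upper bound on the clustering objective.
% The group leximin objective lexicographically maximizes the sorted vector of
% group utilities from the worst-off group upward, generalizing the egalitarian one.
```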
Related papers
- Self-Supervised Graph Embedding Clustering [70.36328717683297]
The K-means one-step dimensionality reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks.
We propose a unified framework that integrates manifold learning with K-means, resulting in the self-supervised graph embedding framework.
arXiv Detail & Related papers (2024-09-24T08:59:51Z)
- Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels.
We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z)
- Fair Labeled Clustering [28.297893914525517]
We consider the downstream application of clustering and how group fairness should be ensured for such a setting.
We provide algorithms for such problems and show that in contrast to their NP-hard counterparts in group fair clustering, they permit efficient solutions.
We also consider a well-motivated alternative setting where the decision-maker is free to assign labels to the clusters regardless of the centers' positions in the metric space.
arXiv Detail & Related papers (2022-05-28T07:07:12Z)
- Feature-based Individual Fairness in k-Clustering [14.847868952138795]
We consider the problem of clustering a set of points while ensuring fairness constraints.
We introduce a new notion of individual fairness in k-clustering based on features that are not necessarily used for clustering.
arXiv Detail & Related papers (2021-09-09T20:42:02Z)
- You Never Cluster Alone [150.94921340034688]
We extend the mainstream contrastive learning paradigm to a cluster-level scheme, where all the data assigned to the same cluster contribute to a unified representation.
We define a set of categorical variables as clustering assignment confidence, which links the instance-level learning track with the cluster-level one.
By reparametrizing the assignment variables, TCC is trained end-to-end, requiring no alternating steps.
arXiv Detail & Related papers (2021-06-03T14:59:59Z)
- Deep Fair Discriminative Clustering [24.237000220172906]
We study a general notion of group-level fairness for binary and multi-state protected status variables (PSVs).
We propose a refinement learning algorithm to combine the clustering goal with the fairness objective to learn fair clusters adaptively.
Our framework shows promising results for novel clustering tasks including flexible fairness constraints, multi-state PSVs and predictive clustering.
arXiv Detail & Related papers (2021-05-28T23:50:48Z)
- MultiFair: Multi-Group Fairness in Machine Learning [52.24956510371455]
We study multi-group fairness in machine learning (MultiFair).
We propose a generic end-to-end algorithmic framework to solve it.
Our proposed framework is generalizable to many different settings.
arXiv Detail & Related papers (2021-05-24T02:30:22Z)
- Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
- Probabilistic Fair Clustering [31.628993679745292]
Prior work in fair clustering assumes complete knowledge of group membership.
We present clustering algorithms in this more general setting with approximation ratio guarantees.
We also address the problem of "metric membership", where different groups have a notion of order and distance.
arXiv Detail & Related papers (2020-06-19T01:34:21Z)
- Fair Hierarchical Clustering [92.03780518164108]
We define a notion of fairness that mitigates over-representation in traditional clustering.
We show that our algorithms can find a fair hierarchical clustering, with only a negligible loss in the objective.
arXiv Detail & Related papers (2020-06-18T01:05:11Z)
- A Notion of Individual Fairness for Clustering [22.288902523866867]
A common distinction in fair machine learning, in particular in fair classification, is between group fairness and individual fairness.
In this paper, we propose a natural notion of individual fairness for clustering. Our notion asks that every data point, on average, is closer to the points in its own cluster than to the points in any other cluster.
arXiv Detail & Related papers (2020-06-08T21:41:39Z)
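The individual-fairness notion quoted in the last entry above is concrete enough to check directly: a point should be, on average, closer to its own cluster than to any other cluster. Below is a minimal Python sketch of such a check; the function name, the brute-force pairwise distances, and the toy data are ours, not taken from that paper.

```python
import numpy as np

def individual_fairness_violations(X, labels):
    """Return indices of points that are, on average, closer to some other
    cluster than to their own cluster (the notion quoted above).
    Illustrative sketch only; not the algorithm from the cited paper."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # Brute-force pairwise Euclidean distances (fine for small toy data).
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    violations = []
    for i in range(len(X)):
        own = labels == labels[i]
        own[i] = False                     # exclude the point itself
        if not own.any():                  # singleton cluster: vacuously fair
            continue
        own_avg = dists[i, own].mean()
        for c in np.unique(labels):
            if c == labels[i]:
                continue
            other = labels == c
            if other.any() and dists[i, other].mean() < own_avg:
                violations.append(i)
                break
    return violations

# Tiny usage example: two well-separated blobs, correctly labeled -> no violations.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (10, 2)), rng.normal(5.0, 0.1, (10, 2))])
labels = np.array([0] * 10 + [1] * 10)
print(individual_fairness_violations(X, labels))  # expected: []
```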
This list is automatically generated from the titles and abstracts of the papers on this site.