Fair Labeled Clustering
- URL: http://arxiv.org/abs/2205.14358v2
- Date: Sun, 4 Jun 2023 23:50:08 GMT
- Title: Fair Labeled Clustering
- Authors: Seyed A. Esmaeili, Sharmila Duppala, John P. Dickerson, Brian Brubach
- Abstract summary: We consider the downstream application of clustering and how group fairness should be ensured for such a setting.
We provide algorithms for such problems and show that in contrast to their NP-hard counterparts in group fair clustering, they permit efficient solutions.
We also consider a well-motivated alternative setting where the decision-maker is free to assign labels to the clusters regardless of the centers' positions in the metric space.
- Score: 28.297893914525517
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Numerous algorithms have been produced for the fundamental problem of
clustering under many different notions of fairness. Perhaps the most common
family of notions currently studied is group fairness, in which proportional
group representation is ensured in every cluster. We extend this direction by
considering the downstream application of clustering and how group fairness
should be ensured for such a setting. Specifically, we consider a common
setting in which a decision-maker runs a clustering algorithm, inspects the
center of each cluster, and decides an appropriate outcome (label) for its
corresponding cluster. In hiring for example, there could be two outcomes,
positive (hire) or negative (reject), and each cluster would be assigned one of
these two outcomes. To ensure group fairness in such a setting, we would desire
proportional group representation in every label but not necessarily in every
cluster as is done in group fair clustering. We provide algorithms for such
problems and show that in contrast to their NP-hard counterparts in group fair
clustering, they permit efficient solutions. We also consider a well-motivated
alternative setting where the decision-maker is free to assign labels to the
clusters regardless of the centers' positions in the metric space. We show that
this setting exhibits interesting transitions from computationally hard to easy
according to additional constraints on the problem. Moreover, when the
constraint parameters take on natural values we show a randomized algorithm for
this setting that always achieves an optimal clustering and satisfies the
fairness constraints in expectation. Finally, we run experiments on real world
datasets that validate the effectiveness of our algorithms.
Related papers
- ABCDE: Application-Based Cluster Diff Evals [49.1574468325115]
It aims to be practical: it allows items to have associated importance values that are application-specific, it is frugal in its use of human judgements when determining which clustering is better, and it can report metrics for arbitrary slices of items.
The approach to measuring the delta in the clustering quality is novel: instead of trying to construct an expensive ground truth up front and evaluating the each clustering with respect to that, ABCDE samples questions for judgement on the basis of the actual diffs between the clusterings.
arXiv Detail & Related papers (2024-07-31T08:29:35Z) - Robust Fair Clustering with Group Membership Uncertainty Sets [31.29773979737976]
We study the canonical fair clustering problem where each cluster is constrained to have close to population-level representation of each group.
We introduce a simple noise model that requires a small number of parameters to be given by the decision maker.
We present an algorithm for fair clustering with provable emphrobustness guarantees.
arXiv Detail & Related papers (2024-06-02T03:11:31Z) - Reinforcement Graph Clustering with Unknown Cluster Number [91.4861135742095]
We propose a new deep graph clustering method termed Reinforcement Graph Clustering.
In our proposed method, cluster number determination and unsupervised representation learning are unified into a uniform framework.
In order to conduct feedback actions, the clustering-oriented reward function is proposed to enhance the cohesion of the same clusters and separate the different clusters.
arXiv Detail & Related papers (2023-08-13T18:12:28Z) - Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels.
We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z) - Cluster-level Group Representativity Fairness in $k$-means Clustering [3.420467786581458]
Clustering algorithms could generate clusters such that different groups are disadvantaged within different clusters.
We develop a clustering algorithm, building upon the centroid clustering paradigm pioneered by classical algorithms.
We show that our method is effective in enhancing cluster-level group representativity fairness significantly at low impact on cluster coherence.
arXiv Detail & Related papers (2022-12-29T22:02:28Z) - Gradient Based Clustering [72.15857783681658]
We propose a general approach for distance based clustering, using the gradient of the cost function that measures clustering quality.
The approach is an iterative two step procedure (alternating between cluster assignment and cluster center updates) and is applicable to a wide range of functions.
arXiv Detail & Related papers (2022-02-01T19:31:15Z) - You Never Cluster Alone [150.94921340034688]
We extend the mainstream contrastive learning paradigm to a cluster-level scheme, where all the data subjected to the same cluster contribute to a unified representation.
We define a set of categorical variables as clustering assignment confidence, which links the instance-level learning track with the cluster-level one.
By reparametrizing the assignment variables, TCC is trained end-to-end, requiring no alternating steps.
arXiv Detail & Related papers (2021-06-03T14:59:59Z) - Fair-Capacitated Clustering [5.127121704630949]
We introduce the fair-capacitated clustering problem that partitions the data into clusters of similar instances.
We propose two approaches, namely hierarchical clustering and partitioning-based clustering, to obtain the fair-capacitated clustering.
Our experiments on four educational datasets show that our approaches deliver well-balanced clusters in terms of both fairness and cardinality.
arXiv Detail & Related papers (2021-04-25T09:39:39Z) - Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z) - Probabilistic Fair Clustering [31.628993679745292]
Prior work in fair clustering assumes complete knowledge of group membership.
We present clustering algorithms in this more general setting with approximation ratio guarantees.
We also address the problem of "metric membership", where different groups have a notion of order and distance.
arXiv Detail & Related papers (2020-06-19T01:34:21Z) - Fair Algorithms for Hierarchical Agglomerative Clustering [17.66340013352806]
Hierarchical Agglomerative Clustering (HAC) algorithms are extensively utilized in modern data science.
It is imperative to ensure that these algorithms are fair -- even if the dataset contains biases against certain protected groups.
We propose fair algorithms for performing HAC that enforce fairness constraints.
arXiv Detail & Related papers (2020-05-07T01:41:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.