Probabilistic Fair Clustering
- URL: http://arxiv.org/abs/2006.10916v3
- Date: Fri, 2 Jun 2023 20:04:45 GMT
- Title: Probabilistic Fair Clustering
- Authors: Seyed A. Esmaeili, Brian Brubach, Leonidas Tsepenekas, John P. Dickerson
- Abstract summary: Prior work in fair clustering assumes complete knowledge of group membership.
We present clustering algorithms in this more general setting with approximation ratio guarantees.
We also address the problem of "metric membership", where different groups have a notion of order and distance.
- Score: 31.628993679745292
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In clustering problems, a central decision-maker is given a complete metric
graph over vertices and must provide a clustering of vertices that minimizes
some objective function. In fair clustering problems, vertices are endowed with
a color (e.g., membership in a group), and the features of a valid clustering
might also include the representation of colors in that clustering. Prior work
in fair clustering assumes complete knowledge of group membership. In this
paper, we generalize prior work by assuming imperfect knowledge of group
membership through probabilistic assignments. We present clustering algorithms
in this more general setting with approximation ratio guarantees. We also
address the problem of "metric membership", where different groups have a
notion of order and distance. Experiments are conducted using our proposed
algorithms as well as baselines to validate our approach and also surface
nuanced concerns when group membership is not known deterministically.
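As a concrete illustration of this probabilistic setting, here is a minimal sketch (not the paper's algorithm; the function names and the bounds `alpha`/`beta` are illustrative assumptions) of checking whether a clustering is fair in expectation: every point carries a probability distribution over colors, and each cluster's expected color fraction must fall within chosen lower and upper bounds.

```python
import numpy as np

def expected_color_fractions(probs, assignment, n_clusters):
    """probs: (n_points, n_colors) color-membership probabilities.
    assignment: (n_points,) cluster index for each point.
    Returns the expected fraction of each color in each cluster."""
    n_colors = probs.shape[1]
    fractions = np.zeros((n_clusters, n_colors))
    for c in range(n_clusters):
        members = probs[assignment == c]
        if len(members) > 0:
            # Expected color counts divided by cluster size.
            fractions[c] = members.sum(axis=0) / len(members)
    return fractions

def is_probabilistically_fair(probs, assignment, n_clusters, beta, alpha):
    """Treat a clustering as fair here if every cluster's expected
    color fraction lies in [beta, alpha] (illustrative bounds)."""
    fr = expected_color_fractions(probs, assignment, n_clusters)
    return bool(np.all(fr >= beta) and np.all(fr <= alpha))

# Toy example: 4 points, 2 colors, 2 clusters.
probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]])
assignment = np.array([0, 0, 1, 1])
print(is_probabilistically_fair(probs, assignment, 2, beta=0.2, alpha=0.8))  # True
```

With deterministic memberships the probability rows become one-hot vectors and the check reduces to the usual color-proportion constraint of fair clustering.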
Related papers
- Robust Fair Clustering with Group Membership Uncertainty Sets [31.29773979737976]
We study the canonical fair clustering problem where each cluster is constrained to have close to population-level representation of each group.
We introduce a simple noise model that requires a small number of parameters to be given by the decision maker.
We present an algorithm for fair clustering with provable robustness guarantees.
arXiv Detail & Related papers (2024-06-02T03:11:31Z)
- Reinforcement Graph Clustering with Unknown Cluster Number [91.4861135742095]
We propose a new deep graph clustering method termed Reinforcement Graph Clustering.
In our proposed method, cluster number determination and unsupervised representation learning are unified in a single framework.
To provide feedback actions, a clustering-oriented reward function is proposed that enhances cohesion within the same cluster and separation between different clusters.
arXiv Detail & Related papers (2023-08-13T18:12:28Z)
- Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels.
We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z)
- A Computational Theory and Semi-Supervised Algorithm for Clustering [0.0]
A semi-supervised clustering algorithm is presented.
The kernel of the clustering method is Mohammad's anomaly detection algorithm.
Results are presented on synthetic and real-world data sets.
arXiv Detail & Related papers (2023-06-12T09:15:58Z)
- Cluster-level Group Representativity Fairness in $k$-means Clustering [3.420467786581458]
Clustering algorithms could generate clusters such that different groups are disadvantaged within different clusters.
We develop a clustering algorithm, building upon the centroid clustering paradigm pioneered by classical algorithms.
We show that our method significantly enhances cluster-level group representativity fairness with little impact on cluster coherence.
arXiv Detail & Related papers (2022-12-29T22:02:28Z)
- Fair Labeled Clustering [28.297893914525517]
We consider the downstream application of clustering and how group fairness should be ensured for such a setting.
We provide algorithms for such problems and show that in contrast to their NP-hard counterparts in group fair clustering, they permit efficient solutions.
We also consider a well-motivated alternative setting where the decision-maker is free to assign labels to the clusters regardless of the centers' positions in the metric space.
arXiv Detail & Related papers (2022-05-28T07:07:12Z)
- Gradient Based Clustering [72.15857783681658]
We propose a general approach for distance based clustering, using the gradient of the cost function that measures clustering quality.
The approach is an iterative two-step procedure (alternating between cluster assignment and cluster center updates) and is applicable to a wide range of functions; a minimal sketch of this alternation appears below this entry.
arXiv Detail & Related papers (2022-02-01T19:31:15Z)
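The sketch below assumes a squared-Euclidean cost (the paper covers a broader family of cost functions); the function name, step size, and initialization are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def gradient_clustering(X, k, lr=0.1, iters=100, seed=0):
    """Alternate between (1) assigning each point to its nearest center and
    (2) a gradient step on the squared-Euclidean clustering cost w.r.t. centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # Step 1: cluster assignment.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Step 2: gradient step on each center.
        for c in range(k):
            pts = X[labels == c]
            if len(pts) > 0:
                # d/dc sum_i ||x_i - c||^2 = 2 * (|S_c| * c - sum_i x_i)
                grad = 2.0 * (len(pts) * centers[c] - pts.sum(axis=0))
                centers[c] -= lr * grad / len(pts)  # size-normalized step
    return centers, labels

# Toy example on two well-separated 2-D blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
centers, labels = gradient_clustering(X, k=2)
```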
- Fair Clustering Under a Bounded Cost [33.50262066253557]
Clustering is a fundamental unsupervised learning problem where a dataset is partitioned into clusters that consist of nearby points in a metric space.
A recent variant, fair clustering, associates a color with each point representing its group membership and requires that each color has (approximately) equal representation in each cluster to satisfy group fairness.
We consider two fairness objectives, the group utilitarian and the group egalitarian objectives, along with the group leximin objective, which generalizes the group egalitarian one (an illustrative sketch of these objectives appears below this entry).
arXiv Detail & Related papers (2021-06-14T08:47:36Z)
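The paper gives the precise definitions; the snippet below only illustrates commonly used forms, assumed here for concreteness: the utilitarian objective sums per-group costs, the egalitarian objective takes the worst-off group's cost, and leximin compares the decreasingly sorted cost vectors lexicographically, which is why it generalizes the egalitarian objective.

```python
def group_utilitarian(costs):
    """Utilitarian (assumed form): the total fairness cost across groups."""
    return sum(costs)

def group_egalitarian(costs):
    """Egalitarian (assumed form): the cost of the worst-off group."""
    return max(costs)

def leximin_key(costs):
    """Leximin (assumed form): compare solutions by their per-group costs
    sorted in decreasing order; lexicographically smaller is better."""
    return tuple(sorted(costs, reverse=True))

# Per-group costs of two hypothetical clusterings.
a, b = [0.3, 0.3, 0.4], [0.1, 0.1, 0.5]
print(group_utilitarian(a), group_utilitarian(b))  # utilitarian prefers b (smaller total)
print(group_egalitarian(a), group_egalitarian(b))  # egalitarian prefers a (smaller maximum)
print(min([a, b], key=leximin_key))                # leximin also prefers a
```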
- You Never Cluster Alone [150.94921340034688]
We extend the mainstream contrastive learning paradigm to a cluster-level scheme, where all the data assigned to the same cluster contribute to a unified representation.
We define a set of categorical variables as clustering assignment confidence, which links the instance-level learning track with the cluster-level one.
By reparametrizing the assignment variables, TCC is trained end-to-end, requiring no alternating steps.
arXiv Detail & Related papers (2021-06-03T14:59:59Z)
- Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
- Learning to Cluster Faces via Confidence and Connectivity Estimation [136.5291151775236]
We propose a fully learnable clustering framework without requiring a large number of overlapped subgraphs.
Our method significantly improves clustering accuracy, and thus the performance of recognition models trained on top, while being an order of magnitude more efficient than existing supervised methods.
arXiv Detail & Related papers (2020-04-01T13:39:37Z)