Related papers: The Fairness-Quality Trade-off in Clustering

The Fairness-Quality Trade-off in Clustering

URL: http://arxiv.org/abs/2408.10002v1
Date: Mon, 19 Aug 2024 13:57:15 GMT
Title: The Fairness-Quality Trade-off in Clustering
Authors: Rashida Hakim, Ana-Andreea Stoica, Christos H. Papadimitriou, Mihalis Yannakakis,
Abstract summary: We introduce novel algorithms for tracing the complete trade-off curve between quality and fairness in clustering problems. We deal with all objectives for fairness and quality in two general classes encompassing most of special cases addressed in previous work. We also present a new most natural-time objective: minimizing the sum, over all clusters, of the imbalance between the two groups in each cluster.
Score: 2.797813058568041
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Fairness in clustering has been considered extensively in the past; however, the trade-off between the two objectives -- e.g., can we sacrifice just a little in the quality of the clustering to significantly increase fairness, or vice-versa? -- has rarely been addressed. We introduce novel algorithms for tracing the complete trade-off curve, or Pareto front, between quality and fairness in clustering problems; that is, computing all clusterings that are not dominated in both objectives by other clusterings. Unlike previous work that deals with specific objectives for quality and fairness, we deal with all objectives for fairness and quality in two general classes encompassing most of the special cases addressed in previous work. Our algorithm must take exponential time in the worst case as the Pareto front itself can be exponential. Even when the Pareto front is polynomial, our algorithm may take exponential time, and we prove that this is inevitable unless P = NP. However, we also present a new polynomial-time algorithm for computing the entire Pareto front when the cluster centers are fixed, and for perhaps the most natural fairness objective: minimizing the sum, over all clusters, of the imbalance between the two groups in each cluster.

Related papers

Towards Fair Representation: Clustering and Consensus [1.7243216387069678]
We find a consensus clustering that is not only representative but also fair with respect to specific protected attributes.<n>As part of our investigation, we examine how to minimally modify an existing clustering to enforce fairness.<n>We develop an optimal algorithm for datasets with equal group representation and near-linear time constant factor approximation algorithms.
arXiv Detail & Related papers (2025-06-10T10:33:21Z)
Accelerating Spectral Clustering under Fairness Constraints [56.865810822418744]
We present a new efficient method for fair spectral clustering (Fair SC) by casting the Fair SC problem within the difference of convex functions (DC) framework.<n>We show that each associated subproblem can be solved efficiently, resulting in higher computational efficiency compared to prior work.
arXiv Detail & Related papers (2025-06-09T18:46:27Z)
Doubly Constrained Fair Clustering [11.11449872883085]
Group Fairness (GF) and Diversity in Center Selection (DS) are two most prominent demographic representation fairness notions in clustering. We show that given a constant approximation algorithm for one constraint (GF or DS only) we can obtain a constant approximation solution that satisfies both constraints simultaneously. We show that both GF and DS are incompatible with a collection of other distance-based fairness notions.
arXiv Detail & Related papers (2023-05-31T01:04:55Z)
Proportionally Representative Clustering [17.5359577544947]
We propose a new axiom proportionally representative fairness'' (PRF) that is designed for clustering problems. Our fairness concept is not satisfied by existing fair clustering algorithms. Our algorithm for the unconstrained setting is also the first known-time approximation algorithm for the well-studied Proportional Fairness (PF) axiom.
arXiv Detail & Related papers (2023-04-27T02:01:24Z)
Lattice-Based Methods Surpass Sum-of-Squares in Clustering [98.46302040220395]
Clustering is a fundamental primitive in unsupervised learning. Recent work has established lower bounds against the class of low-degree methods. We show that, perhaps surprisingly, this particular clustering model textitdoes not exhibit a statistical-to-computational gap.
arXiv Detail & Related papers (2021-12-07T18:50:17Z)
Feature-based Individual Fairness in k-Clustering [14.847868952138795]
We consider the problem of clustering a set of points while ensuring fairness constraints. We introduce a new notion of individual fairness in k-clustering based on features that are not necessarily used for clustering.
arXiv Detail & Related papers (2021-09-09T20:42:02Z)
Efficient Algorithms For Fair Clustering with a New Fairness Notion [5.21410307583181]
We revisit the problem of fair clustering, first introduced by Chierichetti et al. Existing solutions to fair clustering are either not scalable or do not achieve an optimal trade-off between clustering objective and fairness. We propose a new notion of fairness, which we call $tau$-fair fairness, that strictly generalizes the balance property and enables a fine-grained efficiency vs. fairness trade-off.
arXiv Detail & Related papers (2021-09-02T04:52:49Z)
Correlation Clustering Reconstruction in Semi-Adversarial Models [70.11015369368272]
Correlation Clustering is an important clustering problem with many applications. We study the reconstruction version of this problem in which one is seeking to reconstruct a latent clustering corrupted by random noise and adversarial modifications.
arXiv Detail & Related papers (2021-08-10T14:46:17Z)
You Never Cluster Alone [150.94921340034688]
We extend the mainstream contrastive learning paradigm to a cluster-level scheme, where all the data subjected to the same cluster contribute to a unified representation. We define a set of categorical variables as clustering assignment confidence, which links the instance-level learning track with the cluster-level one. By reparametrizing the assignment variables, TCC is trained end-to-end, requiring no alternating steps.
arXiv Detail & Related papers (2021-06-03T14:59:59Z)
Differentially Private Clustering: Tight Approximation Ratios [57.89473217052714]
We give efficient differentially private algorithms for basic clustering problems. Our results imply an improved algorithm for the Sample and Aggregate privacy framework. One of the tools used in our 1-Cluster algorithm can be employed to get a faster quantum algorithm for ClosestPair in a moderate number of dimensions.
arXiv Detail & Related papers (2020-08-18T16:22:06Z)
A Pairwise Fair and Community-preserving Approach to k-Center Clustering [34.386585230600716]
Clustering is a foundational problem in machine learning with numerous applications. We define two new types of fairness in the clustering setting, pairwise fairness and community preservation.
arXiv Detail & Related papers (2020-07-14T22:32:27Z)
Fair Hierarchical Clustering [92.03780518164108]
We define a notion of fairness that mitigates over-representation in traditional clustering. We show that our algorithms can find a fair hierarchical clustering, with only a negligible loss in the objective.
arXiv Detail & Related papers (2020-06-18T01:05:11Z)
Learning to Cluster Faces via Confidence and Connectivity Estimation [136.5291151775236]
We propose a fully learnable clustering framework without requiring a large number of overlapped subgraphs. Our method significantly improves clustering accuracy and thus performance of the recognition models trained on top, yet it is an order of magnitude more efficient than existing supervised methods.
arXiv Detail & Related papers (2020-04-01T13:39:37Z)

This list is automatically generated from the titles and abstracts of the papers in this site.