Related papers: Fair Clustering Using Antidote Data

Fair Clustering Using Antidote Data

URL: http://arxiv.org/abs/2106.00600v1
Date: Tue, 1 Jun 2021 16:07:52 GMT
Title: Fair Clustering Using Antidote Data
Authors: Anshuman Chhabra, Adish Singla, Prasant Mohapatra
Abstract summary: We propose an alternate approach to fairness in clustering where we augment the original dataset with a small number of data points, called antidote data. Our algorithms achieve lower fairness costs and competitive clustering performance compared to other state-of-the-art fair clustering algorithms.
Score: 35.40427659749882
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Clustering algorithms are widely utilized for many modern data science applications. This motivates the need to make outputs of clustering algorithms fair. Traditionally, new fair algorithmic variants to clustering algorithms are developed for specific notions of fairness. However, depending on the application context, different definitions of fairness might need to be employed. As a result, new algorithms and analysis need to be proposed for each combination of clustering algorithm and fairness definition. Additionally, each new algorithm would need to be reimplemented for deployment in a real-world system. Hence, we propose an alternate approach to fairness in clustering where we augment the original dataset with a small number of data points, called antidote data. When clustering is undertaken on this new dataset, the output is fair, for the chosen clustering algorithm and fairness definition. We formulate this as a general bi-level optimization problem which can accommodate any center-based clustering algorithms and fairness notions. We then categorize approaches for solving this bi-level optimization for different problem settings. Extensive experiments on different clustering algorithms and fairness notions show that our algorithms can achieve desired levels of fairness on many real-world datasets with a very small percentage of antidote data added. We also find that our algorithms achieve lower fairness costs and competitive clustering performance compared to other state-of-the-art fair clustering algorithms.

Related papers

Fair Model-based Clustering [11.871560374559566]
We propose a new fair clustering algorithm based on a finite mixture model, called Fair Model-based Clustering (FMC)<n>A main advantage of FMC is that the number of learnable parameters is independent of the sample size and thus can be scaled up easily.<n>FMC can be applied to non-metric data as long as the likelihood is well-defined.
arXiv Detail & Related papers (2026-02-25T02:41:16Z)
Accelerating Spectral Clustering under Fairness Constraints [56.865810822418744]
We present a new efficient method for fair spectral clustering (Fair SC) by casting the Fair SC problem within the difference of convex functions (DC) framework.<n>We show that each associated subproblem can be solved efficiently, resulting in higher computational efficiency compared to prior work.
arXiv Detail & Related papers (2025-06-09T18:46:27Z)
Fair Clustering via Alignment [3.5845787949988592]
Algorithmic fairness in clustering aims to balance proportions of instances assigned to each cluster with respect to a given sensitive attribute.<n>We propose a new fair clustering algorithm based on a novel decomposition of the fair $K$-means clustering objective function.
arXiv Detail & Related papers (2025-05-14T04:29:09Z)
From Large to Small Datasets: Size Generalization for Clustering Algorithm Selection [12.993073967843292]
We study a problem in a semi-supervised setting, with an unknown ground-truth clustering. We introduce a notion of size generalization for clustering algorithm accuracy. We use a subsample of as little as 5% of the data to identify which algorithm is best on the full dataset.
arXiv Detail & Related papers (2024-02-22T06:53:35Z)
A Rapid Review of Clustering Algorithms [5.46715422237599]
Clustering algorithms aim to organize data into groups or clusters based on the inherent patterns and similarities within the data. They play an important role in today's life, such as in marketing and e-commerce, healthcare, data organization and analysis, and social media. We analyzed existing clustering algorithms and classify mainstream algorithms across five different dimensions.
arXiv Detail & Related papers (2024-01-14T23:19:53Z)
Differentially-Private Clustering of Easy Instances [67.04951703461657]
In differentially private clustering, the goal is to identify $k$ cluster centers without disclosing information on individual data points. We provide implementable differentially private clustering algorithms that provide utility when the data is "easy" We propose a framework that allows us to apply non-private clustering algorithms to the easy instances and privately combine the results.
arXiv Detail & Related papers (2021-12-29T08:13:56Z)
Fairness Degrading Adversarial Attacks Against Clustering Algorithms [35.40427659749882]
We propose a fairness degrading attack algorithm for k-median clustering. We find that the addition of the generated adversarial samples can lead to significantly lower fairness values.
arXiv Detail & Related papers (2021-10-22T19:10:27Z)
Efficient Algorithms For Fair Clustering with a New Fairness Notion [5.21410307583181]
We revisit the problem of fair clustering, first introduced by Chierichetti et al. Existing solutions to fair clustering are either not scalable or do not achieve an optimal trade-off between clustering objective and fairness. We propose a new notion of fairness, which we call $tau$-fair fairness, that strictly generalizes the balance property and enables a fine-grained efficiency vs. fairness trade-off.
arXiv Detail & Related papers (2021-09-02T04:52:49Z)
Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed. We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
Differentially Private Clustering: Tight Approximation Ratios [57.89473217052714]
We give efficient differentially private algorithms for basic clustering problems. Our results imply an improved algorithm for the Sample and Aggregate privacy framework. One of the tools used in our 1-Cluster algorithm can be employed to get a faster quantum algorithm for ClosestPair in a moderate number of dimensions.
arXiv Detail & Related papers (2020-08-18T16:22:06Z)
Fair Hierarchical Clustering [92.03780518164108]
We define a notion of fairness that mitigates over-representation in traditional clustering. We show that our algorithms can find a fair hierarchical clustering, with only a negligible loss in the objective.
arXiv Detail & Related papers (2020-06-18T01:05:11Z)
FedPD: A Federated Learning Framework with Optimal Rates and Adaptivity to Non-IID Data [59.50904660420082]
Federated Learning (FL) has become a popular paradigm for learning from distributed data. To effectively utilize data at different devices without moving them to the cloud, algorithms such as the Federated Averaging (FedAvg) have adopted a "computation then aggregation" (CTA) model.
arXiv Detail & Related papers (2020-05-22T23:07:42Z)
Fair Algorithms for Hierarchical Agglomerative Clustering [17.66340013352806]
Hierarchical Agglomerative Clustering (HAC) algorithms are extensively utilized in modern data science. It is imperative to ensure that these algorithms are fair -- even if the dataset contains biases against certain protected groups. We propose fair algorithms for performing HAC that enforce fairness constraints.
arXiv Detail & Related papers (2020-05-07T01:41:56Z)
Fair Correlation Clustering [92.15492066925977]
We obtain approximation algorithms for correlation clustering under several important types of fairness constraints. We show that fair solutions to correlation clustering can be obtained with limited increase in cost compared to the state-of-the-art (unfair) algorithms.
arXiv Detail & Related papers (2020-02-06T14:28:21Z)

This list is automatically generated from the titles and abstracts of the papers in this site.