Fair Clustering Using Antidote Data
- URL: http://arxiv.org/abs/2106.00600v1
- Date: Tue, 1 Jun 2021 16:07:52 GMT
- Title: Fair Clustering Using Antidote Data
- Authors: Anshuman Chhabra, Adish Singla, Prasant Mohapatra
- Abstract summary: We propose an alternate approach to fairness in clustering where we augment the original dataset with a small number of data points, called antidote data.
Our algorithms achieve lower fairness costs and competitive clustering performance compared to other state-of-the-art fair clustering algorithms.
- Score: 35.40427659749882
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Clustering algorithms are widely utilized for many modern data science
applications. This motivates the need to make outputs of clustering algorithms
fair. Traditionally, new fair algorithmic variants to clustering algorithms are
developed for specific notions of fairness. However, depending on the
application context, different definitions of fairness might need to be
employed. As a result, new algorithms and analysis need to be proposed for each
combination of clustering algorithm and fairness definition. Additionally, each
new algorithm would need to be reimplemented for deployment in a real-world
system. Hence, we propose an alternate approach to fairness in clustering where
we augment the original dataset with a small number of data points, called
antidote data. When clustering is undertaken on this new dataset, the output is
fair, for the chosen clustering algorithm and fairness definition. We formulate
this as a general bi-level optimization problem which can accommodate any
center-based clustering algorithms and fairness notions. We then categorize
approaches for solving this bi-level optimization for different problem
settings. Extensive experiments on different clustering algorithms and fairness
notions show that our algorithms can achieve desired levels of fairness on many
real-world datasets with a very small percentage of antidote data added. We
also find that our algorithms achieve lower fairness costs and competitive
clustering performance compared to other state-of-the-art fair clustering
algorithms.
Related papers
- From Large to Small Datasets: Size Generalization for Clustering
Algorithm Selection [12.993073967843292]
We study a problem in a semi-supervised setting, with an unknown ground-truth clustering.
We introduce a notion of size generalization for clustering algorithm accuracy.
We use a subsample of as little as 5% of the data to identify which algorithm is best on the full dataset.
arXiv Detail & Related papers (2024-02-22T06:53:35Z) - A Rapid Review of Clustering Algorithms [5.46715422237599]
Clustering algorithms aim to organize data into groups or clusters based on the inherent patterns and similarities within the data.
They play an important role in today's life, such as in marketing and e-commerce, healthcare, data organization and analysis, and social media.
We analyzed existing clustering algorithms and classify mainstream algorithms across five different dimensions.
arXiv Detail & Related papers (2024-01-14T23:19:53Z) - Differentially-Private Clustering of Easy Instances [67.04951703461657]
In differentially private clustering, the goal is to identify $k$ cluster centers without disclosing information on individual data points.
We provide implementable differentially private clustering algorithms that provide utility when the data is "easy"
We propose a framework that allows us to apply non-private clustering algorithms to the easy instances and privately combine the results.
arXiv Detail & Related papers (2021-12-29T08:13:56Z) - Fairness Degrading Adversarial Attacks Against Clustering Algorithms [35.40427659749882]
We propose a fairness degrading attack algorithm for k-median clustering.
We find that the addition of the generated adversarial samples can lead to significantly lower fairness values.
arXiv Detail & Related papers (2021-10-22T19:10:27Z) - Efficient Algorithms For Fair Clustering with a New Fairness Notion [5.21410307583181]
We revisit the problem of fair clustering, first introduced by Chierichetti et al.
Existing solutions to fair clustering are either not scalable or do not achieve an optimal trade-off between clustering objective and fairness.
We propose a new notion of fairness, which we call $tau$-fair fairness, that strictly generalizes the balance property and enables a fine-grained efficiency vs. fairness trade-off.
arXiv Detail & Related papers (2021-09-02T04:52:49Z) - Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z) - Differentially Private Clustering: Tight Approximation Ratios [57.89473217052714]
We give efficient differentially private algorithms for basic clustering problems.
Our results imply an improved algorithm for the Sample and Aggregate privacy framework.
One of the tools used in our 1-Cluster algorithm can be employed to get a faster quantum algorithm for ClosestPair in a moderate number of dimensions.
arXiv Detail & Related papers (2020-08-18T16:22:06Z) - Fair Hierarchical Clustering [92.03780518164108]
We define a notion of fairness that mitigates over-representation in traditional clustering.
We show that our algorithms can find a fair hierarchical clustering, with only a negligible loss in the objective.
arXiv Detail & Related papers (2020-06-18T01:05:11Z) - FedPD: A Federated Learning Framework with Optimal Rates and Adaptivity
to Non-IID Data [59.50904660420082]
Federated Learning (FL) has become a popular paradigm for learning from distributed data.
To effectively utilize data at different devices without moving them to the cloud, algorithms such as the Federated Averaging (FedAvg) have adopted a "computation then aggregation" (CTA) model.
arXiv Detail & Related papers (2020-05-22T23:07:42Z) - Fair Algorithms for Hierarchical Agglomerative Clustering [17.66340013352806]
Hierarchical Agglomerative Clustering (HAC) algorithms are extensively utilized in modern data science.
It is imperative to ensure that these algorithms are fair -- even if the dataset contains biases against certain protected groups.
We propose fair algorithms for performing HAC that enforce fairness constraints.
arXiv Detail & Related papers (2020-05-07T01:41:56Z) - Fair Correlation Clustering [92.15492066925977]
We obtain approximation algorithms for correlation clustering under several important types of fairness constraints.
We show that fair solutions to correlation clustering can be obtained with limited increase in cost compared to the state-of-the-art (unfair) algorithms.
arXiv Detail & Related papers (2020-02-06T14:28:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.