Fair Hierarchical Clustering
- URL: http://arxiv.org/abs/2006.10221v2
- Date: Fri, 19 Jun 2020 02:59:47 GMT
- Title: Fair Hierarchical Clustering
- Authors: Sara Ahmadian, Alessandro Epasto, Marina Knittel, Ravi Kumar, Mohammad Mahdian, Benjamin Moseley, Philip Pham, Sergei Vassilvitskii, Yuyan Wang
- Abstract summary: We define a notion of fairness that mitigates over-representation in traditional clustering.
We show that our algorithms can find a fair hierarchical clustering, with only a negligible loss in the objective.
- Score: 92.03780518164108
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As machine learning has become more prevalent, researchers have begun to
recognize the necessity of ensuring machine learning systems are fair.
Recently, there has been an interest in defining a notion of fairness that
mitigates over-representation in traditional clustering.
In this paper we extend this notion to hierarchical clustering, where the
goal is to recursively partition the data to optimize a specific objective. For
various natural objectives, we obtain simple, efficient algorithms to find a
provably good fair hierarchical clustering. Empirically, we show that our
algorithms can find a fair hierarchical clustering, with only a negligible loss
in the objective.
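As a concrete illustration of this fairness notion, the sketch below checks that no protected group is over-represented in any cluster of a hierarchy. It is a minimal sketch, not the paper's algorithm: the nested-dict tree encoding, the cap alpha on any group's share, and the helper names are illustrative assumptions.

```python
from collections import Counter

def group_fractions(points, colors):
    """Fraction of each protected group among the given point ids."""
    counts = Counter(colors[p] for p in points)
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()}

def is_fair_tree(node, colors, alpha):
    """True if no group exceeds an `alpha` fraction of any multi-point cluster in the tree."""
    points = node["points"]
    if len(points) > 1:  # singleton clusters are exempt here (illustrative choice)
        fractions = group_fractions(points, colors)
        if any(frac > alpha for frac in fractions.values()):
            return False
    return all(is_fair_tree(child, colors, alpha)
               for child in node.get("children", []))

# Two groups ("a", "b"); cap either group at 2/3 of any cluster.
colors = {0: "a", 1: "a", 2: "b", 3: "b"}
tree = {
    "points": [0, 1, 2, 3],
    "children": [
        {"points": [0, 2], "children": []},
        {"points": [1, 3], "children": []},
    ],
}
print(is_fair_tree(tree, colors, alpha=2 / 3))  # True: every cluster mixes both groups
```

Informally, a fair hierarchical clustering is then one whose tree passes a check of this kind while still scoring well on the chosen clustering objective.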
Related papers
- Fair Clustering: Critique, Caveats, and Future Directions [11.077625489695922]
Clustering is a fundamental problem in machine learning and operations research.
We take a critical view of fair clustering, identifying a collection of ignored issues.
arXiv Detail & Related papers (2024-06-22T23:34:53Z)
- GCC: Generative Calibration Clustering [55.44944397168619]
We propose a novel Generative Calibration Clustering (GCC) method to incorporate feature learning and augmentation into the clustering procedure.
First, we develop a discriminative feature alignment mechanism to discover intrinsic relationships between real and generated samples.
Second, we design a self-supervised metric learning scheme to generate more reliable cluster assignments.
arXiv Detail & Related papers (2024-04-14T01:51:11Z)
- Semi-supervised learning made simple with self-supervised clustering [65.98152950607707]
Self-supervised learning models have been shown to learn rich visual representations without requiring human annotations.
We propose a conceptually simple yet empirically powerful approach to turn clustering-based self-supervised methods into semi-supervised learners.
arXiv Detail & Related papers (2023-06-13T01:09:18Z)
- Fair Clustering via Hierarchical Fair-Dirichlet Process [8.85031165304586]
A popular notion of fairness in clustering mandates that clusters be balanced, i.e., each level of a protected attribute must be approximately equally represented in each cluster.
In this article, we offer a novel model-based formulation of fair clustering, complementing the existing literature which is almost exclusively based on optimizing appropriate objective functions.
arXiv Detail & Related papers (2023-05-27T19:16:55Z)
- Fair Clustering Under a Bounded Cost [33.50262066253557]
Clustering is a fundamental unsupervised learning problem where a dataset is partitioned into clusters that consist of nearby points in a metric space.
A recent variant, fair clustering, associates a color with each point representing its group membership and requires that each color has (approximately) equal representation in each cluster to satisfy group fairness; a short sketch of this balance check appears after this list.
We consider two fairness objectives: the group utilitarian objective and the group egalitarian objective, as well as the group leximin objective which generalizes the group egalitarian objective.
arXiv Detail & Related papers (2021-06-14T08:47:36Z)
- You Never Cluster Alone [150.94921340034688]
We extend the mainstream contrastive learning paradigm to a cluster-level scheme, where all the data assigned to the same cluster contribute to a unified representation.
We define a set of categorical variables as clustering assignment confidence, which links the instance-level learning track with the cluster-level one.
By reparametrizing the assignment variables, TCC is trained end-to-end, requiring no alternating steps.
arXiv Detail & Related papers (2021-06-03T14:59:59Z)
- Deep Fair Discriminative Clustering [24.237000220172906]
We study a general notion of group-level fairness for binary and multi-state protected status variables (PSVs).
We propose a refinement learning algorithm to combine the clustering goal with the fairness objective to learn fair clusters adaptively.
Our framework shows promising results for novel clustering tasks including flexible fairness constraints, multi-state PSVs and predictive clustering.
arXiv Detail & Related papers (2021-05-28T23:50:48Z)
- MultiFair: Multi-Group Fairness in Machine Learning [52.24956510371455]
We study multi-group fairness in machine learning (MultiFair).
We propose a generic end-to-end algorithmic framework to solve it.
Our proposed framework is generalizable to many different settings.
arXiv Detail & Related papers (2021-05-24T02:30:22Z)
- Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
- Whither Fair Clustering? [3.4925763160992402]
We argue that the state-of-the-art in fair clustering has been quite parochial in outlook.
We argue that widening the set of normative principles targeted, characterizing the shortfalls where a target cannot be fully achieved, and making use of knowledge of downstream processes can significantly broaden the scope of fair clustering research.
arXiv Detail & Related papers (2020-07-08T19:41:25Z)
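Several entries above (e.g., the Hierarchical Fair-Dirichlet Process and Bounded Cost papers) rely on the flat balance constraint: each color should appear in every cluster in roughly the same proportion as in the whole dataset. The sketch below is a minimal, assumed formulation of that check; the tolerance eps and the list-based data layout are illustrative, not taken from those papers.

```python
from collections import Counter, defaultdict

def is_balanced(assignment, colors, eps=0.1):
    """True if every cluster's color mix is within `eps` of the dataset-wide mix."""
    n = len(colors)
    target = {c: cnt / n for c, cnt in Counter(colors).items()}
    clusters = defaultdict(list)
    for idx, cluster_id in enumerate(assignment):
        clusters[cluster_id].append(idx)
    for members in clusters.values():
        local = Counter(colors[i] for i in members)
        if any(abs(local.get(c, 0) / len(members) - frac) > eps
               for c, frac in target.items()):
            return False
    return True

# Each cluster contains one "red" and one "blue" point -> balanced.
print(is_balanced([0, 0, 1, 1], ["red", "blue", "red", "blue"]))  # True
# Clusters split by color -> each color is over-represented somewhere.
print(is_balanced([0, 0, 1, 1], ["red", "red", "blue", "blue"]))  # False
```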
This list is automatically generated from the titles and abstracts of the papers in this site.