Fair-Capacitated Clustering
- URL: http://arxiv.org/abs/2104.12116v2
- Date: Wed, 28 Apr 2021 07:24:28 GMT
- Title: Fair-Capacitated Clustering
- Authors: Tai Le Quy, Arjun Roy, Gunnar Friege and Eirini Ntoutsi
- Abstract summary: We introduce the fair-capacitated clustering problem, which partitions the data into clusters of similar instances while ensuring cluster fairness and balanced cardinalities.
We propose two approaches, namely hierarchical clustering and partitioning-based clustering, to obtain the fair-capacitated clustering.
Our experiments on four educational datasets show that our approaches deliver well-balanced clusters in terms of both fairness and cardinality.
- Score: 5.127121704630949
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Traditionally, clustering algorithms focus on partitioning the data into
groups of similar instances. The similarity objective, however, is not
sufficient in applications where a fair representation of the groups, in terms
of protected attributes like gender or race, is required for each cluster.
Moreover, in many applications, to make the clusters useful for the end-user, a
balanced cardinality among the clusters is required. Our motivation comes from
the education domain, where studies indicate that students might learn better in
diverse student groups, and where groups of similar cardinality are more
practical, e.g., for group assignments. To this end, we introduce the
fair-capacitated clustering problem that partitions the data into clusters of
similar instances while ensuring cluster fairness and balancing cluster
cardinalities. We propose a two-step solution to the problem: i) we rely on
fairlets to generate minimal sets that satisfy the fair constraint and ii) we
propose two approaches, namely hierarchical clustering and partitioning-based
clustering, to obtain the fair-capacitated clustering. The hierarchical
approach embeds the additional cardinality requirements during the merging step
while the partitioning-based one alters the assignment step using a knapsack
problem formulation to satisfy the additional requirements. Our experiments on
four educational datasets show that our approaches deliver well-balanced
clusters in terms of both fairness and cardinality while maintaining a good
clustering quality.
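The two-step pipeline described above can be illustrated with a small sketch. The code below is a simplified, hypothetical illustration, not the paper's implementation: it builds (1,1)-fairlets by greedily pairing points from two protected groups, then assigns whole fairlets to the nearest center that still has capacity, a greedy stand-in for the paper's knapsack-based assignment step. All function names and the greedy pairing are assumptions for illustration only.

```python
from collections import defaultdict

def make_fairlets(points, groups, p=1, q=1):
    """Greedily pair points from the two protected groups into (1,1)-fairlets.
    Each fairlet holds one point from each group, so any cluster built from
    whole fairlets is balanced. (A sketch; the fairlet literature uses a
    min-cost matching rather than this greedy pairing.)"""
    a = [i for i, g in enumerate(groups) if g == 0]
    b = [i for i, g in enumerate(groups) if g == 1]
    # leftover unmatched points are simply dropped in this sketch
    return [[x, y] for x, y in zip(a, b)]

def capacitated_assign(fairlets, points, centers, capacity):
    """Assign whole fairlets to the nearest center with remaining capacity,
    a greedy stand-in for the knapsack-based assignment step."""
    def dist(i, c):
        return sum((points[i][d] - c[d]) ** 2 for d in range(len(c)))
    load = defaultdict(int)
    assignment = {}
    # process fairlets by their best-center distance so tight fits go first
    order = sorted(fairlets, key=lambda f: min(dist(f[0], c) for c in centers))
    for f in order:
        ranked = sorted(range(len(centers)), key=lambda k: dist(f[0], centers[k]))
        for k in ranked:
            if load[k] + len(f) <= capacity:
                load[k] += len(f)
                for i in f:
                    assignment[i] = k
                break
    return assignment
```

Because fairlets are moved as indivisible units, every resulting cluster inherits the fairlets' group balance, while the capacity check bounds each cluster's cardinality; the hierarchical variant in the paper instead enforces the capacity bound during cluster merging.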
Related papers
- Self-Supervised Graph Embedding Clustering [70.36328717683297]
K-means one-step dimensionality reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks.
We propose a unified framework that integrates manifold learning with K-means, resulting in the self-supervised graph embedding framework.
arXiv Detail & Related papers (2024-09-24T08:59:51Z)
- Fair Minimum Representation Clustering via Integer Programming [0.6906005491572401]
Clustering is an unsupervised learning task that aims to partition data into a set of clusters.
In this paper, we study the k-means and k-medians clustering problems with the additional constraint that each group must have a minimum level of representation.
We present an alternating minimization algorithm, called MiniReL, that directly incorporates the fairness constraints.
arXiv Detail & Related papers (2024-09-04T00:13:40Z)
- A3S: A General Active Clustering Method with Pairwise Constraints [66.74627463101837]
A3S features strategic active clustering adjustment on the initial cluster result, which is obtained by an adaptive clustering algorithm.
In extensive experiments across diverse real-world datasets, A3S achieves desired results with significantly fewer human queries.
arXiv Detail & Related papers (2024-07-14T13:37:03Z)
- Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels.
We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z)
- Cluster-level Group Representativity Fairness in $k$-means Clustering [3.420467786581458]
Clustering algorithms could generate clusters such that different groups are disadvantaged within different clusters.
We develop a clustering algorithm, building upon the centroid clustering paradigm pioneered by classical algorithms.
We show that our method is effective in enhancing cluster-level group representativity fairness significantly at low impact on cluster coherence.
arXiv Detail & Related papers (2022-12-29T22:02:28Z)
- Socially Fair Center-based and Linear Subspace Clustering [8.355270405285909]
Center-based clustering and linear subspace clustering are popular techniques to partition real-world data into smaller clusters.
Different clustering cost per point for different sensitive groups can lead to fairness-related harms.
We propose a unified framework to solve socially fair center-based clustering and linear subspace clustering.
arXiv Detail & Related papers (2022-08-22T07:10:17Z)
- Fair Labeled Clustering [28.297893914525517]
We consider the downstream application of clustering and how group fairness should be ensured for such a setting.
We provide algorithms for such problems and show that in contrast to their NP-hard counterparts in group fair clustering, they permit efficient solutions.
We also consider a well-motivated alternative setting where the decision-maker is free to assign labels to the clusters regardless of the centers' positions in the metric space.
arXiv Detail & Related papers (2022-05-28T07:07:12Z)
- Clustering to the Fewest Clusters Under Intra-Cluster Dissimilarity Constraints [0.0]
Equiwide clustering relies neither on density nor on a predefined number of expected classes, but on a dissimilarity threshold.
We review and evaluate suitable clustering algorithms to identify trade-offs between the various practical solutions for this clustering problem.
arXiv Detail & Related papers (2021-09-28T12:02:18Z)
- You Never Cluster Alone [150.94921340034688]
We extend the mainstream contrastive learning paradigm to a cluster-level scheme, where all the data subjected to the same cluster contribute to a unified representation.
We define a set of categorical variables as clustering assignment confidence, which links the instance-level learning track with the cluster-level one.
By reparametrizing the assignment variables, TCC is trained end-to-end, requiring no alternating steps.
arXiv Detail & Related papers (2021-06-03T14:59:59Z)
- Graph Contrastive Clustering [131.67881457114316]
We propose a novel graph contrastive learning framework, which is then applied to the clustering task, yielding the Graph Contrastive Clustering (GCC) method.
Specifically, on the one hand, the graph Laplacian based contrastive loss is proposed to learn more discriminative and clustering-friendly features.
On the other hand, a novel graph-based contrastive learning strategy is proposed to learn more compact clustering assignments.
arXiv Detail & Related papers (2021-04-03T15:32:49Z)
- Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.