Related papers: Fair Minimum Representation Clustering via Integer Programming

Fair Minimum Representation Clustering via Integer Programming

URL: http://arxiv.org/abs/2409.02963v1
Date: Wed, 4 Sep 2024 00:13:40 GMT
Title: Fair Minimum Representation Clustering via Integer Programming
Authors: Connor Lawless, Oktay Gunluk,
Abstract summary: Clustering is an unsupervised learning task that aims to partition data into a set of clusters. In this paper, we study the k-means and k-medians clustering problems with the additional constraint that each group must have a minimum level of representation. We present an alternating minimization algorithm, called MiniReL, that directly incorporates the fairness constraints.
Score: 0.6906005491572401
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Clustering is an unsupervised learning task that aims to partition data into a set of clusters. In many applications, these clusters correspond to real-world constructs (e.g., electoral districts, playlists, TV channels) whose benefit can only be attained by groups when they reach a minimum level of representation (e.g., 50\% to elect their desired candidate). In this paper, we study the k-means and k-medians clustering problems with the additional constraint that each group (e.g., demographic group) must have a minimum level of representation in at least a given number of clusters. We formulate the problem through a mixed-integer optimization framework and present an alternating minimization algorithm, called MiniReL, that directly incorporates the fairness constraints. While incorporating the fairness criteria leads to an NP-Hard assignment problem within the algorithm, we provide computational approaches that make the algorithm practical even for large datasets. Numerical results show that the approach is able to create fairer clusters with practically no increase in the clustering cost across standard benchmark datasets.

Related papers

Fair Model-based Clustering [11.871560374559566]
We propose a new fair clustering algorithm based on a finite mixture model, called Fair Model-based Clustering (FMC)<n>A main advantage of FMC is that the number of learnable parameters is independent of the sample size and thus can be scaled up easily.<n>FMC can be applied to non-metric data as long as the likelihood is well-defined.
arXiv Detail & Related papers (2026-02-25T02:41:16Z)
Towards Fair Representation: Clustering and Consensus [1.7243216387069678]
We find a consensus clustering that is not only representative but also fair with respect to specific protected attributes.<n>As part of our investigation, we examine how to minimally modify an existing clustering to enforce fairness.<n>We develop an optimal algorithm for datasets with equal group representation and near-linear time constant factor approximation algorithms.
arXiv Detail & Related papers (2025-06-10T10:33:21Z)
A Computational Approach to Improving Fairness in K-means Clustering [8.001963712764569]
The popular K-means clustering algorithm potentially suffers from a major weakness for further analysis or interpretation.<n>This work attempts to improve the fairness of K-means clustering with a two-stage optimization formulation.<n> Experiments on benchmark datasets show substantial improvement on fairness with a minimal impact to clustering quality.
arXiv Detail & Related papers (2025-05-29T01:48:12Z)
From Large to Small Datasets: Size Generalization for Clustering Algorithm Selection [12.993073967843292]
We study a problem in a semi-supervised setting, with an unknown ground-truth clustering. We introduce a notion of size generalization for clustering algorithm accuracy. We use a subsample of as little as 5% of the data to identify which algorithm is best on the full dataset.
arXiv Detail & Related papers (2024-02-22T06:53:35Z)
CueGCL: Cluster-aware Personalized Self-Training for Unsupervised Graph Contrastive Learning [49.88192702588169]
We propose a Cluster-aware Graph Contrastive Learning Framework (CueGCL) to jointly learn clustering results and node representations.<n>Specifically, we design a personalized self-training (PeST) strategy for unsupervised scenarios, which enables our model to capture precise cluster-level personalized information.<n>We theoretically demonstrate the effectiveness of our model, showing it yields an embedding space with a significantly discernible cluster structure.
arXiv Detail & Related papers (2023-11-18T13:45:21Z)
Reinforcement Graph Clustering with Unknown Cluster Number [91.4861135742095]
We propose a new deep graph clustering method termed Reinforcement Graph Clustering. In our proposed method, cluster number determination and unsupervised representation learning are unified into a uniform framework. In order to conduct feedback actions, the clustering-oriented reward function is proposed to enhance the cohesion of the same clusters and separate the different clusters.
arXiv Detail & Related papers (2023-08-13T18:12:28Z)
Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels. We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z)
Fair Minimum Representation Clustering [0.0]
Clustering is an unsupervised learning task that aims to partition data into a set of clusters. We show that the popular $k$-means algorithm, Lloyd's algorithm, can result in unfair outcomes. We present a variant of Lloyd's algorithm, called MiniReL, that directly incorporates the fairness constraints.
arXiv Detail & Related papers (2023-02-06T23:16:38Z)
Socially Fair Center-based and Linear Subspace Clustering [8.355270405285909]
Center-based clustering and linear subspace clustering are popular techniques to partition real-world data into smaller clusters. Different clustering cost per point for different sensitive groups can lead to fairness-related harms. We propose a unified framework to solve socially fair center-based clustering and linear subspace clustering.
arXiv Detail & Related papers (2022-08-22T07:10:17Z)
Fair Labeled Clustering [28.297893914525517]
We consider the downstream application of clustering and how group fairness should be ensured for such a setting. We provide algorithms for such problems and show that in contrast to their NP-hard counterparts in group fair clustering, they permit efficient solutions. We also consider a well-motivated alternative setting where the decision-maker is free to assign labels to the clusters regardless of the centers' positions in the metric space.
arXiv Detail & Related papers (2022-05-28T07:07:12Z)
Differentially-Private Clustering of Easy Instances [67.04951703461657]
In differentially private clustering, the goal is to identify $k$ cluster centers without disclosing information on individual data points. We provide implementable differentially private clustering algorithms that provide utility when the data is "easy" We propose a framework that allows us to apply non-private clustering algorithms to the easy instances and privately combine the results.
arXiv Detail & Related papers (2021-12-29T08:13:56Z)
Lattice-Based Methods Surpass Sum-of-Squares in Clustering [98.46302040220395]
Clustering is a fundamental primitive in unsupervised learning. Recent work has established lower bounds against the class of low-degree methods. We show that, perhaps surprisingly, this particular clustering model textitdoes not exhibit a statistical-to-computational gap.
arXiv Detail & Related papers (2021-12-07T18:50:17Z)
Clustering to the Fewest Clusters Under Intra-Cluster Dissimilarity Constraints [0.0]
equiwide clustering relies neither on density nor on a predefined number of expected classes, but on a dissimilarity threshold. We review and evaluate suitable clustering algorithms to identify trade-offs between the various practical solutions for this clustering problem.
arXiv Detail & Related papers (2021-09-28T12:02:18Z)
Fuzzy Clustering with Similarity Queries [56.96625809888241]
The fuzzy or soft objective is a popular generalization of the well-known $k$-means problem. We show that by making few queries, the problem becomes easier to solve.
arXiv Detail & Related papers (2021-06-04T02:32:26Z)
You Never Cluster Alone [150.94921340034688]
We extend the mainstream contrastive learning paradigm to a cluster-level scheme, where all the data subjected to the same cluster contribute to a unified representation. We define a set of categorical variables as clustering assignment confidence, which links the instance-level learning track with the cluster-level one. By reparametrizing the assignment variables, TCC is trained end-to-end, requiring no alternating steps.
arXiv Detail & Related papers (2021-06-03T14:59:59Z)
Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed. We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
Computationally efficient sparse clustering [67.95910835079825]
We provide a finite sample analysis of a new clustering algorithm based on PCA. We show that it achieves the minimax optimal misclustering rate in the regime $|theta infty$.
arXiv Detail & Related papers (2020-05-21T17:51:30Z)

This list is automatically generated from the titles and abstracts of the papers in this site.