Fair Model-based Clustering
- URL: http://arxiv.org/abs/2602.21509v1
- Date: Wed, 25 Feb 2026 02:41:16 GMT
- Title: Fair Model-based Clustering
- Authors: Jinwon Park, Kunwoong Kim, Jihu Lee, Yongdai Kim
- Abstract summary: We propose a new fair clustering algorithm based on a finite mixture model, called Fair Model-based Clustering (FMC). A main advantage of FMC is that the number of learnable parameters is independent of the sample size, so the algorithm can be scaled up easily. FMC can be applied to non-metric data as long as the likelihood is well-defined.
- Score: 11.871560374559566
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The goal of fair clustering is to find clusters such that the proportion of sensitive attributes (e.g., gender, race, etc.) in each cluster is similar to that of the entire dataset. Various fair clustering algorithms have been proposed that modify standard K-means clustering to satisfy a given fairness constraint. A critical limitation of several existing fair clustering algorithms is that the number of parameters to be learned is proportional to the sample size, because the cluster assignment of each datum must be optimized simultaneously with the cluster centers, which makes the algorithms difficult to scale up. In this paper, we propose a new fair clustering algorithm based on a finite mixture model, called Fair Model-based Clustering (FMC). A main advantage of FMC is that the number of learnable parameters is independent of the sample size, so the algorithm can be scaled up easily. In particular, mini-batch learning can be used to obtain clusters that are approximately fair. Moreover, FMC can be applied to non-metric data (e.g., categorical data) as long as the likelihood is well-defined. Theoretical and empirical justifications for the superiority of the proposed algorithm are provided.
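As an illustration of the points in the abstract (a fixed number of learnable parameters, mini-batch learning, and fairness imposed through the mixture model), here is a minimal sketch that fits a diagonal Gaussian mixture with a quadratic penalty pushing each cluster's expected sensitive-group proportions towards the overall proportions. The penalty form, the Gaussian components, the optimizer, and all names (e.g., `fair_mixture_loss`, `lam`) are assumptions chosen for illustration and are not taken from the paper.

```python
# Hedged sketch of a fairness-penalized finite mixture trained with mini-batches.
# NOT the paper's actual objective or estimator; a generic illustration only.
import math
import torch

def fair_mixture_loss(x, s, mu, log_sigma, logit_pi, target_props, lam=1.0):
    """Negative log-likelihood of a diagonal Gaussian mixture plus a quadratic
    fairness penalty on expected within-cluster sensitive-group proportions."""
    log_pi = torch.log_softmax(logit_pi, dim=0)                        # (K,)
    diff = x.unsqueeze(1) - mu.unsqueeze(0)                            # (batch, K, dim)
    log_comp = (-0.5 * (diff / log_sigma.exp()) ** 2
                - log_sigma
                - 0.5 * math.log(2 * math.pi)).sum(dim=-1)             # (batch, K)
    log_joint = log_pi + log_comp
    nll = -torch.logsumexp(log_joint, dim=1).mean()

    # Soft cluster responsibilities are implied by the parameters, so nothing
    # per-datum has to be stored or optimized.
    resp = torch.softmax(log_joint, dim=1)                             # (batch, K)
    onehot_s = torch.nn.functional.one_hot(s, num_classes=target_props.numel()).float()
    group_mass = resp.t() @ onehot_s                                   # (K, num_groups)
    cluster_props = group_mass / group_mass.sum(dim=1, keepdim=True).clamp_min(1e-8)
    fairness_penalty = ((cluster_props - target_props) ** 2).sum()
    return nll + lam * fairness_penalty

if __name__ == "__main__":
    # Toy mini-batch training loop; all sizes and hyper-parameters are arbitrary.
    torch.manual_seed(0)
    X = torch.randn(1000, 2)
    S = torch.randint(0, 2, (1000,))
    target = torch.bincount(S).float() / len(S)
    K, D = 4, 2
    mu = torch.randn(K, D, requires_grad=True)
    log_sigma = torch.zeros(K, D, requires_grad=True)
    logit_pi = torch.zeros(K, requires_grad=True)
    opt = torch.optim.Adam([mu, log_sigma, logit_pi], lr=0.05)
    for _ in range(200):
        idx = torch.randint(0, len(X), (128,))                         # mini-batch
        loss = fair_mixture_loss(X[idx], S[idx], mu, log_sigma, logit_pi, target)
        opt.zero_grad()
        loss.backward()
        opt.step()
```

The only learnable tensors are `mu`, `log_sigma`, and `logit_pi`, whose sizes depend on the number of clusters and the data dimension rather than the sample size, which is what makes mini-batch training over arbitrarily large datasets straightforward in this sketch.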
Related papers
- Fair Bayesian Model-Based Clustering [3.1911375902105386]
Group fairness ensures that the proportions of each sensitive group are similar in all clusters. Most existing group-fair clustering methods are based on $K$-means clustering. We propose a fair Bayesian model-based clustering method called Fair Bayesian Clustering.
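To make this fairness notion concrete, the following small sketch reports the largest deviation, over clusters and sensitive groups, between a group's within-cluster proportion and its overall proportion. The specific statistic is an assumption chosen for illustration; it is not a metric defined in either paper, and related quantities such as "balance" are also common in the fair clustering literature.

```python
# Hedged sketch: quantify how far a clustering is from group-proportional fairness.
import numpy as np

def group_fairness_gap(cluster_labels, sensitive_labels):
    """Largest deviation, over clusters and groups, between the within-cluster
    group proportion and the overall group proportion (0 means perfectly fair)."""
    cluster_labels = np.asarray(cluster_labels)
    sensitive_labels = np.asarray(sensitive_labels)
    groups = np.unique(sensitive_labels)
    overall = np.array([(sensitive_labels == g).mean() for g in groups])
    gap = 0.0
    for c in np.unique(cluster_labels):
        in_c = sensitive_labels[cluster_labels == c]
        props = np.array([(in_c == g).mean() for g in groups])
        gap = max(gap, float(np.abs(props - overall).max()))
    return gap

# Example: each cluster contains both groups in the overall 50/50 ratio, so the gap is 0.
print(group_fairness_gap([0, 0, 1, 1], [0, 1, 0, 1]))  # 0.0
```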
arXiv Detail & Related papers (2025-06-15T13:16:32Z)
- A Computational Approach to Improving Fairness in K-means Clustering [8.001963712764569]
The popular K-means clustering algorithm potentially suffers from a major weakness that hinders further analysis or interpretation. This work attempts to improve the fairness of K-means clustering with a two-stage optimization formulation. Experiments on benchmark datasets show substantial improvement in fairness with minimal impact on clustering quality.
arXiv Detail & Related papers (2025-05-29T01:48:12Z)
- K*-Means: A Parameter-free Clustering Algorithm [55.20132267309382]
k*-means is a novel clustering algorithm that eliminates the need to set k or any other parameters. It uses the minimum description length principle to automatically determine the optimal number of clusters, k*, by splitting and merging clusters. We prove that k*-means is guaranteed to converge and demonstrate experimentally that it significantly outperforms existing methods in scenarios where k is unknown.
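The split-and-merge procedure is not reproduced here, but the flavour of MDL-based selection of the number of clusters can be illustrated with a generic sketch: score candidate clusterings by an approximate description length (data cost under per-cluster spherical Gaussians plus a parameter cost) and keep the value of k with the smallest score. The scoring function, the use of scikit-learn's KMeans, and all names are assumptions for illustration, not the k*-means algorithm from the paper.

```python
# Hedged sketch: pick the number of clusters by minimizing an MDL-style score.
# This refits K-means for each candidate k instead of splitting/merging clusters.
import numpy as np
from sklearn.cluster import KMeans

def mdl_score(X, labels, centers):
    """Approximate description length: per-cluster spherical-Gaussian data cost,
    plus the cost of encoding cluster labels, plus a BIC-like parameter cost."""
    n, d = X.shape
    cost = 0.0
    for k, c in enumerate(centers):
        pts = X[labels == k]
        if len(pts) == 0:
            continue
        var = np.mean(np.sum((pts - c) ** 2, axis=1)) / d + 1e-12
        cost += 0.5 * len(pts) * d * (np.log(2 * np.pi * var) + 1)   # data cost (nats)
        cost += len(pts) * np.log(n / len(pts))                      # label cost
    cost += 0.5 * len(centers) * (d + 1) * np.log(n)                 # parameter cost
    return cost

def choose_k(X, k_max=10):
    scores = {}
    for k in range(1, k_max + 1):
        km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
        scores[k] = mdl_score(X, km.labels_, km.cluster_centers_)
    return min(scores, key=scores.get)
```

On well-separated data the selected k tends to match the number of generating components, although this generic score is only a stand-in for the criterion developed in the paper.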
arXiv Detail & Related papers (2025-05-17T08:41:07Z)
- Fair Clustering via Alignment [12.12426896501947]
Algorithmic fairness in clustering aims to balance proportions of instances assigned to each cluster with respect to a given sensitive attribute. We propose a new fair clustering algorithm based on a novel decomposition of the fair $K$-means clustering objective function.
arXiv Detail & Related papers (2025-05-14T04:29:09Z)
- Fair Clustering with Clusterlets [5.8010446129208155]
Given a set of small and fair clusters, a trivial centroid-based clustering algorithm yields a fair clustering. Finding a suitable starting clustering can be computationally expensive, rather complex, or arbitrary. We propose a set of simple clusterlet-based fuzzy clustering algorithms that match single-class clusters, optimizing fair clustering.
arXiv Detail & Related papers (2025-05-03T17:00:54Z)
- Self-Supervised Graph Embedding Clustering [70.36328717683297]
The K-means one-step dimensionality reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks.
We propose a unified framework that integrates manifold learning with K-means, resulting in a self-supervised graph embedding framework.
arXiv Detail & Related papers (2024-09-24T08:59:51Z)
- Revisiting Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [85.51611950757643]
We propose IAC (Instance-Adaptive Clustering), the first algorithm whose performance matches the instance-specific lower bounds both in expectation and with high probability. IAC maintains an overall computational complexity of $\mathcal{O}(n \, \text{polylog}(n))$, making it scalable and practical for large-scale problems.
arXiv Detail & Related papers (2023-06-18T08:46:06Z)
- A One-shot Framework for Distributed Clustered Learning in Heterogeneous Environments [54.172993875654015]
The paper proposes a family of communication-efficient methods for distributed learning in heterogeneous environments.
A one-shot approach, based on local computations at the users and a clustering-based aggregation step at the server, is shown to provide strong learning guarantees.
For strongly convex problems, it is shown that, as long as the number of data points per user is above a threshold, the proposed approach achieves order-optimal mean-squared error rates in terms of the sample size.
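A heavily simplified sketch of this pattern of local computation followed by clustering-based aggregation is given below: each user sends a single locally computed statistic (here, its data mean) to the server once, the server clusters users by these statistics, and users in the same cluster share a size-weighted average. The choice of statistic, the use of K-means, and all names are assumptions made for illustration, not the paper's estimator or its guarantees.

```python
# Hedged sketch: one-shot clustered aggregation of per-user local statistics.
import numpy as np
from sklearn.cluster import KMeans

def one_shot_clustered_aggregation(user_datasets, num_clusters):
    """user_datasets: list of (n_i, d) arrays, one per user.
    Returns per-user cluster labels and one aggregated model per cluster."""
    # Local step: each user computes and sends one statistic (its mean) once.
    local_stats = np.stack([data.mean(axis=0) for data in user_datasets])

    # Server step: cluster users by their local statistics.
    km = KMeans(n_clusters=num_clusters, n_init=10, random_state=0).fit(local_stats)

    # Aggregation: users in a cluster share an average weighted by local sample size.
    sizes = np.array([len(d) for d in user_datasets], dtype=float)
    cluster_models = []
    for k in range(num_clusters):
        mask = km.labels_ == k
        weights = sizes[mask] / sizes[mask].sum()
        cluster_models.append((weights[:, None] * local_stats[mask]).sum(axis=0))
    return km.labels_, np.stack(cluster_models)
```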
arXiv Detail & Related papers (2022-09-22T09:04:10Z)
- You Never Cluster Alone [150.94921340034688]
We extend the mainstream contrastive learning paradigm to a cluster-level scheme, where all the data assigned to the same cluster contribute to a unified representation.
We define a set of categorical variables as clustering assignment confidence, which links the instance-level learning track with the cluster-level one.
By reparametrizing the assignment variables, TCC is trained end-to-end, requiring no alternating steps.
arXiv Detail & Related papers (2021-06-03T14:59:59Z)
- Deep Fair Discriminative Clustering [24.237000220172906]
We study a general notion of group-level fairness for binary and multi-state protected status variables (PSVs).
We propose a refinement learning algorithm to combine the clustering goal with the fairness objective to learn fair clusters adaptively.
Our framework shows promising results for novel clustering tasks including flexible fairness constraints, multi-state PSVs and predictive clustering.
arXiv Detail & Related papers (2021-05-28T23:50:48Z)
- Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.