A General Anchor-Based Framework for Scalable Fair Clustering
- URL: http://arxiv.org/abs/2511.09889v1
- Date: Fri, 14 Nov 2025 01:16:50 GMT
- Title: A General Anchor-Based Framework for Scalable Fair Clustering
- Authors: Shengfei Wei, Suyuan Liu, Jun Wang, Ke Liang, Miaomiao Li, Lei Luo,
- Abstract summary: We introduce the Anchor-based Fair Clustering Framework (AFCF)<n>AFCF empowers arbitrary fair clustering algorithms with linear-time scalability.<n>We prove theoretically that the fairness of the final clustering on the entire dataset matches that of the anchor clustering.
- Score: 20.252573532319875
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fair clustering is crucial for mitigating bias in unsupervised learning, yet existing algorithms often suffer from quadratic or super-quadratic computational complexity, rendering them impractical for large-scale datasets. To bridge this gap, we introduce the Anchor-based Fair Clustering Framework (AFCF), a novel, general, and plug-and-play framework that empowers arbitrary fair clustering algorithms with linear-time scalability. Our approach first selects a small but representative set of anchors using a novel fair sampling strategy. Then, any off-the-shelf fair clustering algorithm can be applied to this small anchor set. The core of our framework lies in a novel anchor graph construction module, where we formulate an optimization problem to propagate labels while preserving fairness. This is achieved through a carefully designed group-label joint constraint, which we prove theoretically ensures that the fairness of the final clustering on the entire dataset matches that of the anchor clustering. We solve this optimization efficiently using an ADMM-based algorithm. Extensive experiments on multiple large-scale benchmarks demonstrate that AFCF drastically accelerates state-of-the-art methods, which reduces computational time by orders of magnitude while maintaining strong clustering performance and fairness guarantees.
Related papers
- Fair Model-based Clustering [11.871560374559566]
We propose a new fair clustering algorithm based on a finite mixture model, called Fair Model-based Clustering (FMC)<n>A main advantage of FMC is that the number of learnable parameters is independent of the sample size and thus can be scaled up easily.<n>FMC can be applied to non-metric data as long as the likelihood is well-defined.
arXiv Detail & Related papers (2026-02-25T02:41:16Z) - A Generic Framework for Fair Consensus Clustering in Streams [1.6398837165722515]
We introduce a new generic algorithmic framework that integrates closest fair clustering with cluster fitting.<n>We extend our methods to the more general k-median consensus clustering problem.
arXiv Detail & Related papers (2026-02-12T02:52:07Z) - Self-Enhanced Image Clustering with Cross-Modal Semantic Consistency [57.961869351897384]
We propose a framework based on cross-modal semantic consistency for efficient image clustering.<n>Our framework first builds a strong foundation via Cross-Modal Semantic Consistency.<n>In the first stage, we train lightweight clustering heads to align with the rich semantics of the pre-trained model.<n>In the second stage, we introduce a Self-Enhanced fine-tuning strategy.
arXiv Detail & Related papers (2025-08-02T08:12:57Z) - Accelerating Spectral Clustering under Fairness Constraints [56.865810822418744]
We present a new efficient method for fair spectral clustering (Fair SC) by casting the Fair SC problem within the difference of convex functions (DC) framework.<n>We show that each associated subproblem can be solved efficiently, resulting in higher computational efficiency compared to prior work.
arXiv Detail & Related papers (2025-06-09T18:46:27Z) - Fair Clustering via Alignment [12.12426896501947]
Algorithmic fairness in clustering aims to balance proportions of instances assigned to each cluster with respect to a given sensitive attribute.<n>We propose a new fair clustering algorithm based on a novel decomposition of the fair $K$-means clustering objective function.
arXiv Detail & Related papers (2025-05-14T04:29:09Z) - Towards Learnable Anchor for Deep Multi-View Clustering [49.767879678193005]
In this paper, we propose the Deep Multi-view Anchor Clustering (DMAC) model that performs clustering in linear time.<n>With the optimal anchors, the full sample graph is calculated to derive a discriminative embedding for clustering.<n>Experiments on several datasets demonstrate superior performance and efficiency of DMAC compared to state-of-the-art competitors.
arXiv Detail & Related papers (2025-03-16T09:38:11Z) - Self-Supervised Graph Embedding Clustering [70.36328717683297]
K-means one-step dimensionality reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks.
We propose a unified framework that integrates manifold learning with K-means, resulting in the self-supervised graph embedding framework.
arXiv Detail & Related papers (2024-09-24T08:59:51Z) - Revisiting Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [85.51611950757643]
We propose IAC (Instance-Adaptive Clustering), the first algorithm whose performance matches the instance-specific lower bounds both in expectation and with high probability.<n>IAC maintains an overall computational complexity of $ mathcalO(n, textpolylog(n) $, making it scalable and practical for large-scale problems.
arXiv Detail & Related papers (2023-06-18T08:46:06Z) - Proportionally Representative Clustering [17.5359577544947]
We propose a new axiom proportionally representative fairness'' (PRF) that is designed for clustering problems.
Our fairness concept is not satisfied by existing fair clustering algorithms.
Our algorithm for the unconstrained setting is also the first known-time approximation algorithm for the well-studied Proportional Fairness (PF) axiom.
arXiv Detail & Related papers (2023-04-27T02:01:24Z) - Rethinking Clustering-Based Pseudo-Labeling for Unsupervised
Meta-Learning [146.11600461034746]
Method for unsupervised meta-learning, CACTUs, is a clustering-based approach with pseudo-labeling.
This approach is model-agnostic and can be combined with supervised algorithms to learn from unlabeled data.
We prove that the core reason for this is lack of a clustering-friendly property in the embedding space.
arXiv Detail & Related papers (2022-09-27T19:04:36Z) - Deep Fair Discriminative Clustering [24.237000220172906]
We study a general notion of group-level fairness for binary and multi-state protected status variables (PSVs)
We propose a refinement learning algorithm to combine the clustering goal with the fairness objective to learn fair clusters adaptively.
Our framework shows promising results for novel clustering tasks including flexible fairness constraints, multi-state PSVs and predictive clustering.
arXiv Detail & Related papers (2021-05-28T23:50:48Z) - Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.