A Generic Framework for Fair Consensus Clustering in Streams
- URL: http://arxiv.org/abs/2602.11500v1
- Date: Thu, 12 Feb 2026 02:52:07 GMT
- Title: A Generic Framework for Fair Consensus Clustering in Streams
- Authors: Diptarka Chakraborty, Kushagra Chatterjee, Debarati Das, Tien-Long Nguyen,
- Abstract summary: We introduce a new generic algorithmic framework that integrates closest fair clustering with cluster fitting.<n>We extend our methods to the more general k-median consensus clustering problem.
- Score: 1.6398837165722515
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Consensus clustering seeks to combine multiple clusterings of the same dataset, potentially derived by considering various non-sensitive attributes by different agents in a multi-agent environment, into a single partitioning that best reflects the overall structure of the underlying dataset. Recent work by Chakraborty et al, introduced a fair variant under proportionate fairness and obtained a constant-factor approximation by naively selecting the best closest fair input clustering; however, their offline approach requires storing all input clusterings, which is prohibitively expensive for most large-scale applications. In this paper, we initiate the study of fair consensus clustering in the streaming model, where input clusterings arrive sequentially and memory is limited. We design the first constant-factor algorithm that processes the stream while storing only a logarithmic number of inputs. En route, we introduce a new generic algorithmic framework that integrates closest fair clustering with cluster fitting, yielding improved approximation guarantees not only in the streaming setting but also when revisited offline. Furthermore, the framework is fairness-agnostic: it applies to any fairness definition for which an approximately close fair clustering can be computed efficiently. Finally, we extend our methods to the more general k-median consensus clustering problem.
Related papers
- One-Shot Hierarchical Federated Clustering [51.490181220883905]
This paper introduces an efficient one-shot hierarchical Federated Clustering framework.<n>It performs client-end distribution exploration and server-end distribution aggregation.<n>It turns out that the complex cluster distributions across clients can be efficiently explored.
arXiv Detail & Related papers (2026-01-10T02:58:33Z) - Federated Multi-Task Clustering [44.73672172790804]
This paper proposes a novel framework named Federated Multi-Task Clustering (i.e.,FMTC)<n>It is composed of two main components: client-side personalized clustering module and server-side tensorial correlation module.<n>We derive an efficient, privacy-preserving distributed algorithm based on the Alternating Direction Method of Multipliers.
arXiv Detail & Related papers (2025-12-28T12:02:32Z) - A General Anchor-Based Framework for Scalable Fair Clustering [20.252573532319875]
We introduce the Anchor-based Fair Clustering Framework (AFCF)<n>AFCF empowers arbitrary fair clustering algorithms with linear-time scalability.<n>We prove theoretically that the fairness of the final clustering on the entire dataset matches that of the anchor clustering.
arXiv Detail & Related papers (2025-11-13T02:50:06Z) - Parameter-Free Clustering via Self-Supervised Consensus Maximization (Extended Version) [50.41628860536753]
We propose a novel and fully parameter-free clustering framework via Self-supervised Consensus Maximization, named SCMax.<n>Our framework performs hierarchical agglomerative clustering and cluster evaluation in a single, integrated process.
arXiv Detail & Related papers (2025-11-12T11:17:17Z) - Towards Fair Representation: Clustering and Consensus [1.7243216387069678]
We find a consensus clustering that is not only representative but also fair with respect to specific protected attributes.<n>As part of our investigation, we examine how to minimally modify an existing clustering to enforce fairness.<n>We develop an optimal algorithm for datasets with equal group representation and near-linear time constant factor approximation algorithms.
arXiv Detail & Related papers (2025-06-10T10:33:21Z) - Fair Clustering via Alignment [12.12426896501947]
Algorithmic fairness in clustering aims to balance proportions of instances assigned to each cluster with respect to a given sensitive attribute.<n>We propose a new fair clustering algorithm based on a novel decomposition of the fair $K$-means clustering objective function.
arXiv Detail & Related papers (2025-05-14T04:29:09Z) - Revisiting Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [85.51611950757643]
We propose IAC (Instance-Adaptive Clustering), the first algorithm whose performance matches the instance-specific lower bounds both in expectation and with high probability.<n>IAC maintains an overall computational complexity of $ mathcalO(n, textpolylog(n) $, making it scalable and practical for large-scale problems.
arXiv Detail & Related papers (2023-06-18T08:46:06Z) - Hard Regularization to Prevent Deep Online Clustering Collapse without
Data Augmentation [65.268245109828]
Online deep clustering refers to the joint use of a feature extraction network and a clustering model to assign cluster labels to each new data point or batch as it is processed.
While faster and more versatile than offline methods, online clustering can easily reach the collapsed solution where the encoder maps all inputs to the same point and all are put into a single cluster.
We propose a method that does not require data augmentation, and that, differently from existing methods, regularizes the hard assignments.
arXiv Detail & Related papers (2023-03-29T08:23:26Z) - Cluster-level Group Representativity Fairness in $k$-means Clustering [3.420467786581458]
Clustering algorithms could generate clusters such that different groups are disadvantaged within different clusters.
We develop a clustering algorithm, building upon the centroid clustering paradigm pioneered by classical algorithms.
We show that our method is effective in enhancing cluster-level group representativity fairness significantly at low impact on cluster coherence.
arXiv Detail & Related papers (2022-12-29T22:02:28Z) - Gradient Based Clustering [72.15857783681658]
We propose a general approach for distance based clustering, using the gradient of the cost function that measures clustering quality.
The approach is an iterative two step procedure (alternating between cluster assignment and cluster center updates) and is applicable to a wide range of functions.
arXiv Detail & Related papers (2022-02-01T19:31:15Z) - You Never Cluster Alone [150.94921340034688]
We extend the mainstream contrastive learning paradigm to a cluster-level scheme, where all the data subjected to the same cluster contribute to a unified representation.
We define a set of categorical variables as clustering assignment confidence, which links the instance-level learning track with the cluster-level one.
By reparametrizing the assignment variables, TCC is trained end-to-end, requiring no alternating steps.
arXiv Detail & Related papers (2021-06-03T14:59:59Z) - Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.