Related papers: Relax and Merge: A Simple Yet Effective Framework for Solving Fair $k$-Means and $k$-sparse Wasserstein Barycenter Problems

Relax and Merge: A Simple Yet Effective Framework for Solving Fair $k$-Means and $k$-sparse Wasserstein Barycenter Problems

URL: http://arxiv.org/abs/2411.01115v1
Date: Sat, 02 Nov 2024 02:50:12 GMT
Title: Relax and Merge: A Simple Yet Effective Framework for Solving Fair $k$-Means and $k$-sparse Wasserstein Barycenter Problems
Authors: Shihong Song, Guanlin Mo, Qingyuan Yang, Hu Ding,
Abstract summary: Given a dataset comprising several groups, the fairness constraint requires that each cluster should contain a proportion of points from each group. We propose a novel Relax and Merge'' framework, where $rho$ is the approximate ratio of an off-the-shelf vanilla $k$-means algorithm. If equipped with a PTAS of $k$-means, our solution can achieve an approximation ratio of $(5+O(epsilon))$ with only a slight violation of the fairness constraints.
Score: 8.74967598360817
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The fairness of clustering algorithms has gained widespread attention across various areas, including machine learning, In this paper, we study fair $k$-means clustering in Euclidean space. Given a dataset comprising several groups, the fairness constraint requires that each cluster should contain a proportion of points from each group within specified lower and upper bounds. Due to these fairness constraints, determining the optimal locations of $k$ centers is a quite challenging task. We propose a novel ``Relax and Merge'' framework that returns a $(1+4\rho + O(\epsilon))$-approximate solution, where $\rho$ is the approximate ratio of an off-the-shelf vanilla $k$-means algorithm and $O(\epsilon)$ can be an arbitrarily small positive number. If equipped with a PTAS of $k$-means, our solution can achieve an approximation ratio of $(5+O(\epsilon))$ with only a slight violation of the fairness constraints, which improves the current state-of-the-art approximation guarantee. Furthermore, using our framework, we can also obtain a $(1+4\rho +O(\epsilon))$-approximate solution for the $k$-sparse Wasserstein Barycenter problem, which is a fundamental optimization problem in the field of optimal transport, and a $(2+6\rho)$-approximate solution for the strictly fair $k$-means clustering with no violation, both of which are better than the current state-of-the-art methods. In addition, the empirical results demonstrate that our proposed algorithm can significantly outperform baseline approaches in terms of clustering cost.

Related papers

Near-Optimal Clustering in Mixture of Markov Chains [74.3828414695655]
We study the problem of clustering $T$ trajectories of length $H$, each generated by one of $K$ unknown ergodic Markov chains over a finite state space of size $S$.<n>We derive an instance-dependent, high-probability lower bound on the clustering error rate, governed by the weighted KL divergence between the transition kernels of the chains.<n>We then present a novel two-stage clustering algorithm.
arXiv Detail & Related papers (2025-06-02T05:10:40Z)
Fair Clustering for Data Summarization: Improved Approximation Algorithms and Complexity Insights [16.120911591795295]
In some applications all data points can be chosen as centers, in the general setting, centers must be chosen from a subset of points, referred as facilities or suppliers. In this work, we focus on fair data summarization modeled as the fair $k$-supplier problem, where data consists of several groups, and a minimum number of centers must be selected from each group.
arXiv Detail & Related papers (2024-10-16T18:00:19Z)
A Scalable Algorithm for Individually Fair K-means Clustering [77.93955971520549]
We present a scalable algorithm for the individually fair ($p$, $k$)-clustering problem introduced by Jung et al. and Mahabadi et al. A clustering is then called individually fair if it has centers within distance $delta(x)$ of $x$ for each $xin P$. We show empirically that not only is our algorithm much faster than prior work, but it also produces lower-cost solutions.
arXiv Detail & Related papers (2024-02-09T19:01:48Z)
Diversity-aware clustering: Computational Complexity and Approximation Algorithms [19.67390261007849]
We study diversity-aware clustering problems where the data points are associated with multiple attributes resulting in intersecting groups. A clustering solution need to ensure that a minimum number of cluster centers are chosen from each group. We present parameterized approximation algorithms with approximation ratios $1+ frac2e$, $1+frac8e$ and $3$.
arXiv Detail & Related papers (2024-01-10T19:01:05Z)
Sketching Algorithms for Sparse Dictionary Learning: PTAS and Turnstile Streaming [48.18845814885398]
We develop new techniques to extend the applicability of sketching-based approaches to sparse dictionary learning and the Euclidean $k$-means clustering problems. On the fast algorithms front, we obtain a new approach for designing PTAS's for the $k$-means clustering problem. On the streaming algorithms front, we obtain new upper bounds and lower bounds for dictionary learning and $k$-means clustering.
arXiv Detail & Related papers (2023-10-29T16:46:26Z)
Replicable Clustering [57.19013971737493]
We propose algorithms for the statistical $k$-medians, statistical $k$-means, and statistical $k$-centers problems by utilizing approximation routines for their counterparts in a black-box manner. We also provide experiments on synthetic distributions in 2D using the $k$-means++ implementation from sklearn as a black-box that validate our theoretical results.
arXiv Detail & Related papers (2023-02-20T23:29:43Z)
Scalable Differentially Private Clustering via Hierarchically Separated Trees [82.69664595378869]
We show that our method computes a solution with cost at most $O(d3/2log n)cdot OPT + O(k d2 log2 n / epsilon2)$, where $epsilon$ is the privacy guarantee. Although the worst-case guarantee is worse than that of state of the art private clustering methods, the algorithm we propose is practical.
arXiv Detail & Related papers (2022-06-17T09:24:41Z)
Minimax Optimal Algorithms with Fixed-$k$-Nearest Neighbors [13.231906521852718]
We consider a distributed learning scenario in which a massive dataset is split into smaller groups. We propose emphoptimal rules to aggregate the fixed-$k$-NN information for classification, regression, and density estimation. We show that the distributed algorithm with a fixed $k$ over a sufficiently large number of groups attains a minimax optimal error rate up to a multiplicative logarithmic factor.
arXiv Detail & Related papers (2022-02-05T01:59:09Z)
Improved Approximation Algorithms for Individually Fair Clustering [9.914246432182873]
We present an improved $(16p +varepsilon,3)$-bicriteria approximation for the fair $k$-clustering with $ell_p$-norm cost. Our approach suggests a reduction from our individually fair clustering to a clustering with a group fairness requirement proposed by Kleindessner et al.
arXiv Detail & Related papers (2021-06-26T15:22:52Z)
Clustering Mixture Models in Almost-Linear Time via List-Decodable Mean Estimation [58.24280149662003]
We study the problem of list-decodable mean estimation, where an adversary can corrupt a majority of the dataset. We develop new algorithms for list-decodable mean estimation, achieving nearly-optimal statistical guarantees.
arXiv Detail & Related papers (2021-06-16T03:34:14Z)
Fuzzy Clustering with Similarity Queries [56.96625809888241]
The fuzzy or soft objective is a popular generalization of the well-known $k$-means problem. We show that by making few queries, the problem becomes easier to solve.
arXiv Detail & Related papers (2021-06-04T02:32:26Z)
Computationally efficient sparse clustering [67.95910835079825]
We provide a finite sample analysis of a new clustering algorithm based on PCA. We show that it achieves the minimax optimal misclustering rate in the regime $|theta infty$.
arXiv Detail & Related papers (2020-05-21T17:51:30Z)
Individual Fairness for $k$-Clustering [16.988236398003068]
Local search based algorithm for $k$-median and $k$-means. We show how to get a bicriteria approximation for fair $k$-clustering.
arXiv Detail & Related papers (2020-02-17T02:31:13Z)
Structures of Spurious Local Minima in $k$-means [20.155509538529568]
We investigate the structures of spurious local solutions to the $k$-means problem. We prove that this is essentially the only type of spurious local minima under a separation condition.
arXiv Detail & Related papers (2020-02-16T22:15:03Z)

This list is automatically generated from the titles and abstracts of the papers in this site.