Diversity-aware clustering: Computational Complexity and Approximation
Algorithms
- URL: http://arxiv.org/abs/2401.05502v1
- Date: Wed, 10 Jan 2024 19:01:05 GMT
- Title: Diversity-aware clustering: Computational Complexity and Approximation
Algorithms
- Authors: Suhas Thejaswi, Ameet Gadekar, Bruno Ordozgoiti, Aristides Gionis
- Abstract summary: We study diversity-aware clustering problems where the data points are associated with multiple attributes resulting in intersecting groups.
A clustering solution needs to ensure that a minimum number of cluster centers are chosen from each group.
We present parameterized approximation algorithms with approximation ratios $1+\frac{2}{e}$, $1+\frac{8}{e}$ and $3$.
- Score: 19.67390261007849
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work, we study diversity-aware clustering problems where the data
points are associated with multiple attributes resulting in intersecting
groups. A clustering solution needs to ensure that a minimum number of cluster
centers are chosen from each group while simultaneously minimizing the
clustering objective, which can be either $k$-median, $k$-means or
$k$-supplier. We present parameterized approximation algorithms with
approximation ratios $1+ \frac{2}{e}$, $1+\frac{8}{e}$ and $3$ for
diversity-aware $k$-median, diversity-aware $k$-means and diversity-aware
$k$-supplier, respectively. The approximation ratios are tight assuming Gap-ETH
and FPT $\neq$ W[2]. For fair $k$-median and fair $k$-means with disjoint
facility groups, we present parameterized approximation algorithms with
approximation ratios $1+\frac{2}{e}$ and $1+\frac{8}{e}$, respectively. For
fair $k$-supplier with disjoint facility groups, we present a polynomial-time
approximation algorithm with factor $3$, improving the previous best known
approximation ratio of factor $5$.
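The feasibility constraint in the abstract, requiring at least a given number of centers from each group, can be made concrete with a brute-force sketch on a toy 1-D instance. This is not the paper's parameterized algorithm; the function name, the instance, and the group structure are all invented for illustration of the diversity-aware $k$-median objective:

```python
from itertools import combinations

def diversity_kmedian_bruteforce(points, facilities, groups, lower_bounds, k):
    """Brute-force diversity-aware k-median on a tiny 1-D instance.

    points: data points (numbers)
    facilities: candidate center locations
    groups: group name -> set of facility indices in that group
    lower_bounds: group name -> minimum number of centers from that group
    """
    best_cost, best_centers = float("inf"), None
    for centers in combinations(range(len(facilities)), k):
        chosen = set(centers)
        # Feasibility: every group must contribute at least its lower bound.
        if any(len(chosen & members) < lower_bounds[g]
               for g, members in groups.items()):
            continue
        # k-median objective: sum of distances to the nearest chosen center.
        cost = sum(min(abs(p - facilities[c]) for c in chosen) for p in points)
        if cost < best_cost:
            best_cost, best_centers = cost, chosen
    return best_cost, best_centers
```

On an instance where the unconstrained optimum violates a group's lower bound, the search skips it and returns the cheapest feasible set instead.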
Related papers
- Relax and Merge: A Simple Yet Effective Framework for Solving Fair $k$-Means and $k$-sparse Wasserstein Barycenter Problems [8.74967598360817]
Given a dataset comprising several groups, the fairness constraint requires that each cluster contain a proportion of points from each group.
We propose a novel "Relax and Merge" framework, where $\rho$ is the approximation ratio of an off-the-shelf vanilla $k$-means algorithm.
If equipped with a PTAS for $k$-means, our solution can achieve an approximation ratio of $(5+O(\epsilon))$ with only a slight violation of the fairness constraints.
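The group-proportion constraint described above can be checked directly for a given assignment. The sketch below uses per-group interval bounds, a common formalization of fair clustering, though the exact bounds vary by paper; all names are illustrative:

```python
from collections import defaultdict

def satisfies_proportional_fairness(assignment, group_of, alpha, beta):
    """Check that, in every cluster, each group's share of the cluster's
    points lies in [alpha[g], beta[g]] (a common formalization; the exact
    bounds differ between papers)."""
    clusters = defaultdict(list)
    for i, c in enumerate(assignment):
        clusters[c].append(group_of[i])
    for members in clusters.values():
        for g in set(group_of):
            share = members.count(g) / len(members)
            if not (alpha[g] <= share <= beta[g]):
                return False
    return True
```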
arXiv Detail & Related papers (2024-11-02T02:50:12Z) - Fair Clustering for Data Summarization: Improved Approximation Algorithms and Complexity Insights [16.120911591795295]
In some applications, all data points can be chosen as centers; in the general setting, centers must be chosen from a subset of the points, referred to as facilities or suppliers.
In this work, we focus on fair data summarization modeled as the fair $k$-supplier problem, where data consists of several groups, and a minimum number of centers must be selected from each group.
arXiv Detail & Related papers (2024-10-16T18:00:19Z) - A Scalable Algorithm for Individually Fair K-means Clustering [77.93955971520549]
We present a scalable algorithm for the individually fair ($p$, $k$)-clustering problem introduced by Jung et al. and Mahabadi et al.
A clustering is then called individually fair if it has centers within distance $\delta(x)$ of $x$ for each $x \in P$.
We show empirically that not only is our algorithm much faster than prior work, but it also produces lower-cost solutions.
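The individual-fairness condition quoted above is straightforward to verify for a given set of centers. The sketch below assumes the common reading of $\delta(x)$ as the distance from $x$ to its $\lceil n/k \rceil$-th nearest neighbor; both function names are invented:

```python
import math

def fairness_radius(points, k):
    """delta(x): distance from x to its ceil(n/k)-th nearest other point
    (one common reading of the Jung et al. notion; an assumption here)."""
    n = len(points)
    r = math.ceil(n / k)
    radii = []
    for i, x in enumerate(points):
        dists = sorted(abs(x - y) for j, y in enumerate(points) if j != i)
        radii.append(dists[r - 1])
    return radii

def is_individually_fair(points, centers, k):
    """True if every point x has some center within distance delta(x)."""
    radii = fairness_radius(points, k)
    return all(min(abs(x - c) for c in centers) <= d
               for x, d in zip(points, radii))
```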
arXiv Detail & Related papers (2024-02-09T19:01:48Z) - Are Easy Data Easy (for K-Means) [0.0]
This paper investigates the capability of various variants of the $k$-means algorithm to correctly recover well-separated clusters.
A new algorithm is proposed that is a variation of $k$-means++ via repeated subsampling when choosing a seed.
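The summary does not pin down the exact subsampling scheme, but a $k$-means++-style $D^2$ seeding in which each new seed is drawn from a random subsample might look as follows; the function name and parameters are illustrative, not the paper's algorithm:

```python
import random

def dsq_seed_with_subsampling(points, k, sample_frac=0.5, rng=None):
    """k-means++-style D^2 seeding where each new seed is chosen from a
    random subsample (a hypothetical reading of 'repeated subsampling')."""
    rng = rng or random.Random(0)
    centers = [rng.choice(points)]
    while len(centers) < k:
        m = max(1, int(sample_frac * len(points)))
        sample = rng.sample(points, m)
        # D^2 weights: squared distance to the nearest existing center.
        weights = [min((p - c) ** 2 for c in centers) for p in sample]
        if sum(weights) == 0:
            centers.append(rng.choice(sample))
        else:
            centers.append(rng.choices(sample, weights=weights)[0])
    return centers
```

Subsampling reduces the per-seed work from scanning all points to scanning the sample, at the cost of some seeding quality.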
arXiv Detail & Related papers (2023-08-02T09:40:19Z) - Approximation Algorithms for Fair Range Clustering [14.380145034918158]
This paper studies the fair range clustering problem in which the data points are from different demographic groups.
The goal is to pick $k$ centers with the minimum clustering cost such that each group is at least minimally represented in the centers set.
In particular, the fair range $\ell_p$-clustering captures fair range $k$-center, $k$-median and $k$-means as its special cases.
arXiv Detail & Related papers (2023-06-11T21:18:40Z) - Replicable Clustering [57.19013971737493]
We propose algorithms for the statistical $k$-medians, statistical $k$-means, and statistical $k$-centers problems by utilizing approximation routines for their counterparts in a black-box manner.
We also provide experiments on synthetic distributions in 2D using the $k$-means++ implementation from sklearn as a black-box that validate our theoretical results.
arXiv Detail & Related papers (2023-02-20T23:29:43Z) - Constant-Factor Approximation Algorithms for Socially Fair
$k$-Clustering [12.545909976356528]
We study approximation algorithms for the socially fair $(\ell_p, k)$-clustering problem with $m$ groups.
We present (1) a polynomial-time $(5+2\sqrt{6})^p$-approximation with at most $k+m$ centers, (2) a $(5+2\sqrt{6})^p$-approximation with $k$ centers in time $n^{2^{O(p)} \cdot m^2}$, and (3) a $(15+6\sqrt{6})^p$-approximation with $k$ centers.
arXiv Detail & Related papers (2022-06-22T16:57:17Z) - TURF: A Two-factor, Universal, Robust, Fast Distribution Learning
Algorithm [64.13217062232874]
One of its most powerful and successful modalities approximates every distribution to an $\ell_1$ distance essentially at most a constant times larger than its closest $t$-piece degree-$d$ approximation.
We provide a method that estimates this number near-optimally, hence helps approach the best possible approximation.
arXiv Detail & Related papers (2022-02-15T03:49:28Z) - Clustering Mixture Models in Almost-Linear Time via List-Decodable Mean
Estimation [58.24280149662003]
We study the problem of list-decodable mean estimation, where an adversary can corrupt a majority of the dataset.
We develop new algorithms for list-decodable mean estimation, achieving nearly-optimal statistical guarantees.
arXiv Detail & Related papers (2021-06-16T03:34:14Z) - Fuzzy Clustering with Similarity Queries [56.96625809888241]
The fuzzy or soft objective is a popular generalization of the well-known $k$-means problem.
We show that by making few queries, the problem becomes easier to solve.
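The "fuzzy or soft objective" mentioned above is classically instantiated by fuzzy $c$-means, whose alternating membership/center updates can be sketched in 1-D. This is the standard Bezdek-style iteration, not the query-based algorithm of the paper:

```python
def fuzzy_cmeans_1d(points, k, m=2.0, iters=50):
    """Classic fuzzy c-means in 1-D: a soft generalization of k-means.
    u[i][j] is the degree to which point i belongs to cluster j.
    Initialization from the first k points is a simplification."""
    centers = list(points[:k])
    for _ in range(iters):
        # Membership update: inverse-distance weights with fuzzifier m.
        u = []
        for p in points:
            d = [abs(p - c) or 1e-12 for c in centers]
            row = [1.0 / sum((d[j] / d[l]) ** (2 / (m - 1)) for l in range(k))
                   for j in range(k)]
            u.append(row)
        # Center update: membership-weighted mean of all points.
        centers = [sum(u[i][j] ** m * points[i] for i in range(len(points))) /
                   sum(u[i][j] ** m for i in range(len(points)))
                   for j in range(k)]
    return centers, u
```

With $m \to 1$ the memberships harden and the iteration approaches Lloyd's $k$-means.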
arXiv Detail & Related papers (2021-06-04T02:32:26Z) - Robustly Learning any Clusterable Mixture of Gaussians [55.41573600814391]
We study the efficient learnability of high-dimensional Gaussian mixtures in the adversarial-robust setting.
We provide an algorithm that learns the components of an $\epsilon$-corrupted $k$-mixture to information-theoretically near-optimal error of $\tilde{O}(\epsilon)$.
Our main technical contribution is a new robust identifiability proof for clusters of a Gaussian mixture, which can be captured by the constant-degree Sum of Squares proof system.
arXiv Detail & Related papers (2020-05-13T16:44:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.