Parameterized Approximation Schemes for Clustering with General Norm Objectives
- URL: http://arxiv.org/abs/2304.03146v1
- Date: Thu, 6 Apr 2023 15:31:37 GMT
- Title: Parameterized Approximation Schemes for Clustering with General Norm Objectives
- Authors: Fateme Abbasi, Sandip Banerjee, Jarosław Byrka, Parinya Chalermsook, Ameet Gadekar, Kamyar Khodamoradi, Dániel Marx, Roohani Sharma, and Joachim Spoerhase
- Abstract summary: This paper considers the well-studied regime of designing a $(1+\epsilon)$-approximation algorithm for a $k$-clustering problem.
Our main contribution is a clean and simple EPAS that settles more than ten clustering problems.
- Score: 0.6956677722498498
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper considers the well-studied algorithmic regime of designing a
$(1+\epsilon)$-approximation algorithm for a $k$-clustering problem that runs
in time $f(k,\epsilon)\cdot\mathrm{poly}(n)$ (sometimes called an efficient parameterized
approximation scheme, or EPAS for short). Notable results of this kind include
EPASes in the high-dimensional Euclidean setting for $k$-center [Bădoiu,
Har-Peled, Indyk; STOC'02] as well as for $k$-median and $k$-means [Kumar,
Sabharwal, Sen; J. ACM 2010]. However, existing EPASes handle only basic
objectives (such as $k$-center, $k$-median, and $k$-means) and are tailored to
the specific objective and metric space.
Our main contribution is a clean and simple EPAS that settles more than ten
clustering problems (across multiple well-studied objectives as well as metric
spaces) and unifies well-known EPASes. Our algorithm gives EPASes for a large
variety of clustering objectives (for example, $k$-means, $k$-center,
$k$-median, priority $k$-center, $\ell$-centrum, ordered $k$-median, socially
fair $k$-median aka robust $k$-median, or more generally monotone norm
$k$-clustering) and metric spaces (for example, continuous high-dimensional
Euclidean spaces, metrics of bounded doubling dimension, bounded treewidth
metrics, and planar metrics).
Key to our approach is a new concept that we call bounded $\epsilon$-scatter
dimension: an intrinsic complexity measure of a metric space that is a
relaxation of the standard notion of bounded doubling dimension. Our main
technical result shows that two conditions are essentially sufficient for our
algorithm to yield an EPAS on the input metric $M$ for any clustering
objective: (i) The objective is described by a monotone (not necessarily
symmetric!) norm, and (ii) the $\epsilon$-scatter dimension of $M$ is upper
bounded by a function of $\epsilon$.
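To make the unification concrete, the sketch below evaluates several of the objectives above as monotone norms of the distance vector $d$ (one entry per point: its distance to the closest chosen center). This illustrates only the objectives, not the paper's algorithm; the sample vector and group structure are hypothetical.

```python
import numpy as np

# d[i] = distance of point i to its closest chosen center.
d = np.array([3.0, 1.0, 4.0, 1.0, 5.0])   # hypothetical distances

def k_center(d):                # ell_infty norm of d
    return np.max(d)

def k_median(d):                # ell_1 norm of d
    return np.sum(d)

def k_means(d):                 # ell_2 norm; the usual objective is its
    return np.linalg.norm(d)    # square, so approximation factors square

def l_centrum(d, ell):          # sum of the ell largest distances
    return np.sort(d)[::-1][:ell].sum()

def ordered_k_median(d, w):     # inner product of the non-increasingly
    return np.dot(np.sort(w)[::-1], np.sort(d)[::-1])  # sorted w and d

def socially_fair_k_median(d, groups):
    # max over groups of that group's k-median cost: monotone but
    # NOT symmetric (permuting d across groups changes the value)
    return max(d[g].sum() for g in groups)

print(k_center(d), k_median(d), l_centrum(d, ell=2))          # 5.0 14.0 9.0
print(socially_fair_k_median(d, groups=[[0, 1], [2, 3, 4]]))  # 10.0
```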
Related papers
- Relax and Merge: A Simple Yet Effective Framework for Solving Fair $k$-Means and $k$-sparse Wasserstein Barycenter Problems [8.74967598360817]
Given a dataset comprising several groups, the fairness constraint requires that each cluster should contain a prescribed proportion of points from each group.
We propose a novel "Relax and Merge" framework whose guarantee depends on $\rho$, the approximation ratio of an off-the-shelf vanilla $k$-means algorithm.
If equipped with a PTAS for $k$-means, our solution can achieve an approximation ratio of $(5+O(\epsilon))$ with only a slight violation of the fairness constraints.
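For intuition, a group-proportionality constraint of this kind can be checked directly. Below is a minimal sketch; the tolerance parameter `tol` and the array encoding are illustrative assumptions, not from the paper.

```python
import numpy as np

def is_proportionally_fair(labels, groups, tol=0.1):
    """Check that every cluster's group proportions stay within `tol`
    of each group's proportion in the whole dataset.

    labels[i] = cluster index of point i; groups[i] = group of point i.
    """
    labels, groups = np.asarray(labels), np.asarray(groups)
    overall = {g: np.mean(groups == g) for g in np.unique(groups)}
    for c in np.unique(labels):
        in_c = groups[labels == c]
        for g, p in overall.items():
            if abs(np.mean(in_c == g) - p) > tol:
                return False
    return True

# Example: two groups split evenly across two clusters
labels = [0, 0, 1, 1]
groups = [0, 1, 0, 1]
print(is_proportionally_fair(labels, groups))  # True: each cluster is 50/50
```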
arXiv Detail & Related papers (2024-11-02T02:50:12Z)
- Clustering to Minimize Cluster-Aware Norm Objectives [0.3481985817302898]
We seek to partition a given set $P$ of data points into $k$ clusters by finding a set $X$ of $k$ centers.
The cost of a cluster, represented by a center $x \in X$, is a monotone, symmetric norm $f$ (inner norm) of the vector of distances of points assigned to $x$.
The goal is to minimize a norm $g$ (outer norm) of the vector of cluster costs.
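A minimal sketch of this nested objective; the particular choice of $f=\ell_1$ (inner) and $g=\ell_\infty$ (outer) below is just one illustrative instantiation.

```python
import numpy as np

def cluster_aware_cost(D, assign, inner, outer):
    """D[i, j] = distance from point i to center j; assign[i] = center of point i.
    Cost = outer norm g of the vector of per-cluster inner-norm f costs."""
    centers = np.unique(assign)
    per_cluster = np.array([inner(D[assign == c, c]) for c in centers])
    return outer(per_cluster)

# One illustrative instantiation: inner f = ell_1 (k-median within a
# cluster), outer g = ell_infty (minimize the most expensive cluster).
D = np.array([[1.0, 4.0],
              [2.0, 3.0],
              [5.0, 1.0]])
assign = np.array([0, 0, 1])
print(cluster_aware_cost(D, assign, inner=np.sum, outer=np.max))  # 3.0
```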
arXiv Detail & Related papers (2024-10-31T16:33:40Z)
- A Scalable Algorithm for Individually Fair K-means Clustering [77.93955971520549]
We present a scalable algorithm for the individually fair $(p,k)$-clustering problem introduced by Jung et al. and Mahabadi et al.
A clustering is then called individually fair if it has a center within distance $\delta(x)$ of each point $x \in P$, where $\delta(x)$ is the radius of the smallest ball around $x$ containing at least $n/k$ points.
We show empirically that not only is our algorithm much faster than prior work, but it also produces lower-cost solutions.
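To make the guarantee concrete, the brute-force sketch below computes $\delta(x)$ as the radius of the smallest ball around $x$ containing at least $n/k$ points (the definition of Jung et al.) and checks individual fairness of a given center set; the data and centers are hypothetical.

```python
import numpy as np
from math import ceil

def individually_fair(P, centers):
    """Check that each point x in P has a center within delta(x),
    where delta(x) = distance from x to its ceil(n/k)-th nearest
    point in P (smallest ball around x holding >= n/k points)."""
    P, C = np.asarray(P, dtype=float), np.asarray(centers, dtype=float)
    n, k = len(P), len(C)
    m = ceil(n / k)
    for x in P:
        delta = np.sort(np.linalg.norm(P - x, axis=1))[m - 1]  # x counts itself
        if np.linalg.norm(C - x, axis=1).min() > delta:
            return False
    return True

P = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
print(individually_fair(P, centers=[[0.05, 0.0], [5.05, 5.0]]))  # True
```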
arXiv Detail & Related papers (2024-02-09T19:01:48Z)
- Diversity-aware clustering: Computational Complexity and Approximation Algorithms [19.67390261007849]
We study diversity-aware clustering problems where the data points are associated with multiple attributes, resulting in intersecting groups.
A clustering solution needs to ensure that a minimum number of cluster centers are chosen from each group.
We present parameterized approximation algorithms with approximation ratios $1+\frac{2}{e}$, $1+\frac{8}{e}$, and $3$.
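The per-group coverage requirement is easy to state in code. A minimal sketch, where the group encoding and lower bounds are illustrative assumptions:

```python
def satisfies_diversity(chosen, center_groups, lower_bounds):
    """chosen        : set of selected center ids
    center_groups : group -> set of candidate centers carrying that
                    attribute (groups may intersect)
    lower_bounds  : group -> minimum number of chosen centers from it."""
    return all(len(set(chosen) & center_groups[g]) >= lb
               for g, lb in lower_bounds.items())

# Centers 1 and 2 share attribute A; centers 2 and 3 share attribute B.
center_groups = {"A": {1, 2}, "B": {2, 3}}
lower_bounds = {"A": 1, "B": 1}
print(satisfies_diversity({2}, center_groups, lower_bounds))  # True: 2 covers both
print(satisfies_diversity({1}, center_groups, lower_bounds))  # False: B uncovered
```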
arXiv Detail & Related papers (2024-01-10T19:01:05Z)
- Sketching Algorithms for Sparse Dictionary Learning: PTAS and Turnstile Streaming [48.18845814885398]
We develop new techniques to extend the applicability of sketching-based approaches to sparse dictionary learning and the Euclidean $k$-means clustering problems.
On the fast algorithms front, we obtain a new approach for designing PTASes for the $k$-means clustering problem.
On the streaming algorithms front, we obtain new upper bounds and lower bounds for dictionary learning and $k$-means clustering.
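As a generic illustration of the sketching idea, the snippet below applies a plain Johnson-Lindenstrauss-style random projection before running $k$-means; this is not the paper's construction, and the dimensions chosen are arbitrary.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n, d, k, m = 500, 100, 5, 20               # m = sketch dimension (arbitrary here)

X = rng.normal(size=(n, d))                # hypothetical data
S = rng.normal(size=(d, m)) / np.sqrt(m)   # JL-style random projection matrix

cost_full = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
cost_sketch = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X @ S).inertia_

# For a suitable sketch dimension, the k-means cost is preserved up to
# a (1 +/- eps) factor; here we only compare the two values empirically.
print(cost_full, cost_sketch)
```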
arXiv Detail & Related papers (2023-10-29T16:46:26Z)
- Parameterized Approximation for Robust Clustering in Discrete Geometric Spaces [2.687607197645453]
We show that even the special case of $k$-Center in dimension $\Theta(\log n)$ is $(\sqrt{3}/2 - o(1))$-hard to approximate for FPT algorithms.
arXiv Detail & Related papers (2023-05-12T08:43:28Z)
- Replicable Clustering [57.19013971737493]
We propose algorithms for the statistical $k$-medians, statistical $k$-means, and statistical $k$-centers problems by utilizing approximation routines for their counterparts in a black-box manner.
We also provide experiments on synthetic distributions in 2D, using the $k$-means++ implementation from sklearn as a black box, that validate our theoretical results.
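In its simplest form, the black-box usage described here looks as follows: a minimal sketch calling sklearn's $k$-means++ on a synthetic 2D mixture (this is only the black box, not the authors' replicability wrapper).

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
# Synthetic 2D mixture: three well-separated Gaussian blobs
means = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([rng.normal(mu, 0.5, size=(100, 2)) for mu in means])

# sklearn's k-means++ used as the black-box approximation routine
km = KMeans(n_clusters=3, init="k-means++", n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)   # estimated centers, close to the three true means
```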
arXiv Detail & Related papers (2023-02-20T23:29:43Z)
- Pseudonorm Approachability and Applications to Regret Minimization [73.54127663296906]
We convert high-dimensional $\ell_\infty$-approachability problems to low-dimensional pseudonorm approachability problems.
We develop an algorithmic theory of pseudonorm approachability, analogous to previous work on approachability for $\ell_2$ and other norms.
arXiv Detail & Related papers (2023-02-03T03:19:14Z)
- Scalable Differentially Private Clustering via Hierarchically Separated Trees [82.69664595378869]
We show that our method computes a solution with cost at most $O(d^{3/2}\log n)\cdot OPT + O(k d^2 \log^2 n / \epsilon^2)$, where $\epsilon$ is the privacy guarantee.
Although the worst-case guarantee is worse than that of state-of-the-art private clustering methods, the algorithm we propose is practical.
arXiv Detail & Related papers (2022-06-17T09:24:41Z)
- Distributed k-Means with Outliers in General Metrics [0.6117371161379208]
We present a distributed coreset-based 3-round approximation algorithm for $k$-means with $z$ outliers in general metric spaces.
An important feature of our algorithm is that it obliviously adapts to the intrinsic complexity of the dataset, captured by the doubling dimension $D$ of the metric space.
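The objective with outliers simply discounts the $z$ most expensive points. A minimal, non-distributed sketch of evaluating it (the data and centers are hypothetical):

```python
import numpy as np

def kmeans_cost_with_outliers(X, centers, z):
    """k-means cost after discarding the z points farthest from their
    closest centers (the 'k-means with z outliers' objective)."""
    X, C = np.asarray(X, dtype=float), np.asarray(centers, dtype=float)
    # squared distance from every point to its nearest center
    sq = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1).min(axis=1)
    return np.sort(sq)[:len(X) - z].sum()   # drop the z largest terms

X = [[0, 0], [1, 0], [0, 1], [100, 100]]    # one obvious outlier
print(kmeans_cost_with_outliers(X, centers=[[0, 0]], z=1))  # 2.0
```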
arXiv Detail & Related papers (2022-02-16T16:24:31Z)
- Clustering Mixture Models in Almost-Linear Time via List-Decodable Mean Estimation [58.24280149662003]
We study the problem of list-decodable mean estimation, where an adversary can corrupt a majority of the dataset.
We develop new algorithms for list-decodable mean estimation, achieving nearly-optimal statistical guarantees.
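For intuition, here is a naive clustering-based baseline for list-decodable mean estimation: output one candidate mean per cluster, so that some candidate lands near the true mean. This is only a toy baseline, far from the paper's nearly-optimal, almost-linear-time algorithm.

```python
import numpy as np
from sklearn.cluster import KMeans

def list_decode_means(X, alpha, random_state=0):
    """Naive baseline: when only an alpha-fraction of X is clean,
    cluster into ceil(1/alpha) groups and output every centroid as a
    candidate mean; one candidate should land near the true mean."""
    k = int(np.ceil(1 / alpha))
    km = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(X)
    return km.cluster_centers_

rng = np.random.default_rng(1)
inliers = rng.normal([0.0, 0.0], 1.0, size=(300, 2))     # true mean (0, 0)
outliers = rng.normal([50.0, 50.0], 1.0, size=(700, 2))  # corrupted majority
candidates = list_decode_means(np.vstack([inliers, outliers]), alpha=0.3)
print(candidates)   # some row should be close to (0, 0)
```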
arXiv Detail & Related papers (2021-06-16T03:34:14Z)