Coreset for Robust Geometric Median: Eliminating Size Dependency on Outliers
- URL: http://arxiv.org/abs/2510.24621v1
- Date: Tue, 28 Oct 2025 16:49:03 GMT
- Title: Coreset for Robust Geometric Median: Eliminating Size Dependency on Outliers
- Authors: Ziyi Fang, Lingxiao Huang, Runkai Yang
- Abstract summary: We construct a coreset of size $\tilde{O}(\varepsilon^{-2} \cdot \min\{\varepsilon^{-2}, d\})$ when $n \geq 4m$. For the special case of $d = 1$, we achieve an optimal coreset size of $\tilde{\Theta}(\varepsilon^{-1/2} + \frac{m}{n} \varepsilon^{-1})$. Our results extend to robust $(k,z)$-clustering.
- Score: 9.19267739465056
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the robust geometric median problem in Euclidean space $\mathbb{R}^d$, with a focus on coreset construction. A coreset is a compact summary of a dataset $P$ of size $n$ that approximates the robust cost for all centers $c$ within a multiplicative error $\varepsilon$. Given an outlier count $m$, we construct a coreset of size $\tilde{O}(\varepsilon^{-2} \cdot \min\{\varepsilon^{-2}, d\})$ when $n \geq 4m$, eliminating the $O(m)$ dependency present in prior work [Huang et al., 2022 & 2023]. For the special case of $d = 1$, we achieve an optimal coreset size of $\tilde{\Theta}(\varepsilon^{-1/2} + \frac{m}{n} \varepsilon^{-1})$, revealing a clear separation from the vanilla case studied in [Huang et al., 2023; Afshani and Chris, 2024]. Our results further extend to robust $(k,z)$-clustering in various metric spaces, eliminating the $m$-dependence under mild data assumptions. The key technical contribution is a novel non-component-wise error analysis, enabling substantial reduction of outlier influence, unlike prior methods that retain them. Empirically, our algorithms consistently outperform existing baselines in terms of size-accuracy tradeoffs and runtime, even when data assumptions are violated across a wide range of datasets.
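To make the objective in the abstract concrete (this illustrates the robust cost itself, not the paper's coreset construction): with outlier count $m$, the robust cost of a center $c$ sums only the $n - m$ smallest distances, discarding the $m$ farthest points as outliers. A minimal NumPy sketch:

```python
import numpy as np

def robust_cost(P, c, m):
    """Robust geometric median cost of center c for point set P:
    the sum of the n - m smallest Euclidean distances, i.e. the m
    farthest points are treated as outliers and discarded."""
    dists = np.linalg.norm(P - c, axis=1)
    kept = np.sort(dists)[: len(P) - m]
    return float(kept.sum())

# Toy data: a tight cluster around the origin plus one far outlier.
P = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
              [-1.0, 0.0], [0.0, -1.0], [100.0, 0.0]])
c = np.zeros(2)
print(robust_cost(P, c, m=1))  # outlier excluded: 0+1+1+1+1 = 4.0
print(robust_cost(P, c, m=0))  # outlier included: 104.0
```

A coreset in this setting is a small weighted subset $Q$ whose weighted robust cost stays within a $(1 \pm \varepsilon)$ factor of `robust_cost(P, c, m)` for every center $c$ simultaneously; the paper's contribution is achieving this with a size independent of $m$ when $n \geq 4m$.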
Related papers
- Computational Hardness of Private Coreset [84.99100741615423]
For a given input set of points, a coreset is another set of points such that the $k$-means objective for any candidate solution is preserved up to a multiplicative $(1 \pm 1/n^{O(1)})$ factor (and some additive factor). We show that no polynomial-time $(\varepsilon, 1/n^{O(1)})$-DP algorithm can compute a coreset for $k$-means in the $\ell_\infty$-metric for some constant $\varepsilon > 0$ (and some constant additive factor).
arXiv Detail & Related papers (2026-02-19T15:58:49Z) - Linear time small coresets for k-mean clustering of segments with applications [4.759823735082844]
We study the $k$-means problem for a set $\mathcal{S} \subseteq \mathbb{R}^d$ of $n$ segments. For any $\varepsilon > 0$, an $\varepsilon$-coreset is a weighted subset $C \subseteq \mathbb{R}^d$ that approximates $D(\mathcal{S}, X)$ within a factor of $1 \pm \varepsilon$ for any set of $k$ centers.
arXiv Detail & Related papers (2025-11-16T11:48:55Z) - Sublinear Time Quantum Sensitivity Sampling [57.356528942341534]
We present a unified framework for quantum sensitivity sampling, extending the advantages of quantum computing to a broad class of classical approximation problems. Our framework provides a streamlined approach for constructing coresets and offers significant runtime improvements in applications such as clustering, regression, and low-rank approximation.
arXiv Detail & Related papers (2025-09-20T20:18:49Z) - Guessing Efficiently for Constrained Subspace Approximation [49.83981776254246]
We introduce a general framework for constrained subspace approximation. We show it provides new algorithms for partition-constrained subspace approximation with applications to $k$-means clustering, and projected non-negative matrix factorization.
arXiv Detail & Related papers (2025-04-29T15:56:48Z) - Ridge Leverage Score Sampling for $\ell_p$ Subspace Approximation [47.790126028106734]
A popular approach to cope with the NP-hardness is to compute a strong coreset. We obtain an algorithm for constructing a strong coreset for $\ell_p$ subspace approximation of size $\tilde{O}(k\epsilon^{-4/p})$ for $p < 2$ and $\tilde{O}(k^{p/2}\epsilon^{-p})$ for $p > 2$.
arXiv Detail & Related papers (2024-07-03T16:49:28Z) - A Unified Framework for Uniform Signal Recovery in Nonlinear Generative Compressed Sensing [68.80803866919123]
Under nonlinear measurements, most prior results are non-uniform, i.e., they hold with high probability for a fixed $\mathbf{x}^*$ rather than for all $\mathbf{x}^*$ simultaneously.
Our framework accommodates GCS with 1-bit/uniformly quantized observations and single index models as canonical examples.
We also develop a concentration inequality that produces tighter bounds for product processes whose index sets have low metric entropy.
arXiv Detail & Related papers (2023-09-25T17:54:19Z) - Data Structures for Density Estimation [66.36971978162461]
Given a sublinear (in $n$) number of samples from $p$, our main result is the first data structure that identifies $v_i$ in time sublinear in $k$.
We also give an improved version of the algorithm of Acharya et al. that reports $v_i$ in time linear in $k$.
arXiv Detail & Related papers (2023-06-20T06:13:56Z) - Parameterized Approximation for Robust Clustering in Discrete Geometric Spaces [2.687607197645453]
We show that even the special case of $k$-Center in dimension $\Theta(\log n)$ is $(\sqrt{3}/2 - o(1))$-hard to approximate for FPT algorithms.
arXiv Detail & Related papers (2023-05-12T08:43:28Z) - New Coresets for Projective Clustering and Applications [34.82221047030618]
Given a set of points $P$ in $\mathbb{R}^d$, the goal is to find $k$ flats of dimension $j$, i.e., affine subspaces, that best fit $P$ under a given distance measure.
We show that our construction provides efficient coreset constructions for Cauchy, Welsch, Huber, Geman-McClure, Tukey, $L_\infty$, and Fair regression.
arXiv Detail & Related papers (2022-03-08T19:50:27Z) - Towards Optimal Lower Bounds for k-median and k-means Coresets [25.713987341159918]
Given a set of points in a metric space, the $(k,z)$-clustering problem consists of finding a set of $k$ points called centers.
We show that any coreset for $(k,z)$-clustering must consist of at least $\Omega(k \varepsilon^{-2} \log n)$ and $\Omega(k \varepsilon^{-2} D)$ points.
arXiv Detail & Related papers (2022-02-25T16:13:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.