Related papers: Private Geometric Median in Nearly-Linear Time

Private Geometric Median in Nearly-Linear Time

URL: http://arxiv.org/abs/2505.20189v1
Date: Mon, 26 May 2025 16:32:49 GMT
Title: Private Geometric Median in Nearly-Linear Time
Authors: Syamantak Kumar, Daogao Liu, Kevin Tian, Chutong Yang,
Abstract summary: Estimating the geometric median of a dataset is a fundamental problem in computational geometry.<n> [HSU24] gave an $alpha$-multiplicative approximation to the geometric median objective.<n>We give an improved algorithm that obtains the same approximation quality.
Score: 11.537965585323304
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Estimating the geometric median of a dataset is a robust counterpart to mean estimation, and is a fundamental problem in computational geometry. Recently, [HSU24] gave an $(\varepsilon, \delta)$-differentially private algorithm obtaining an $\alpha$-multiplicative approximation to the geometric median objective, $\frac 1 n \sum_{i \in [n]} \|\cdot - \mathbf{x}_i\|$, given a dataset $\mathcal{D} := \{\mathbf{x}_i\}_{i \in [n]} \subset \mathbb{R}^d$. Their algorithm requires $n \gtrsim \sqrt d \cdot \frac 1 {\alpha\varepsilon}$ samples, which they prove is information-theoretically optimal. This result is surprising because its error scales with the \emph{effective radius} of $\mathcal{D}$ (i.e., of a ball capturing most points), rather than the worst-case radius. We give an improved algorithm that obtains the same approximation quality, also using $n \gtrsim \sqrt d \cdot \frac 1 {\alpha\epsilon}$ samples, but in time $\widetilde{O}(nd + \frac d {\alpha^2})$. Our runtime is nearly-linear, plus the cost of the cheapest non-private first-order method due to [CLM+16]. To achieve our results, we use subsampling and geometric aggregation tools inspired by FriendlyCore [TCK+22] to speed up the "warm start" component of the [HSU24] algorithm, combined with a careful custom analysis of DP-SGD's sensitivity for the geometric median objective.

Related papers

Nonparametric MLE for Gaussian Location Mixtures: Certified Computation and Generic Behavior [28.71736321665378]
We study the nonparametric maximum likelihood estimator $widehatpi$ for Gaussian location mixtures in one dimension.<n>We provide an algorithm which for small enough $varepsilon>0$ computes an $varepsilon$-approximation of $widehatpi$ in Wasserstein distance in time.<n>We also show the distribution of $widehatpi$ conditioned to be $k$-atomic admits a density on the associated $2k-1$ dimensional parameter space.
arXiv Detail & Related papers (2025-03-26T03:36:36Z)
Dimension-free Private Mean Estimation for Anisotropic Distributions [55.86374912608193]
Previous private estimators on distributions over $mathRd suffer from a curse of dimensionality. We present an algorithm whose sample complexity has improved dependence on dimension.
arXiv Detail & Related papers (2024-11-01T17:59:53Z)
Optimal Sketching for Residual Error Estimation for Matrix and Vector Norms [50.15964512954274]
We study the problem of residual error estimation for matrix and vector norms using a linear sketch. We demonstrate that this gives a substantial advantage empirically, for roughly the same sketch size and accuracy as in previous work. We also show an $Omega(k2/pn1-2/p)$ lower bound for the sparse recovery problem, which is tight up to a $mathrmpoly(log n)$ factor.
arXiv Detail & Related papers (2024-08-16T02:33:07Z)
Metric Embeddings Beyond Bi-Lipschitz Distortion via Sherali-Adams [34.7582575446942]
We give the first approximation algorithm for MDS with quasi-polynomial dependency on $Delta.<n>Our algorithms are based on a novel geometry-aware analysis of a conditional rounding of the Sherali-Adams LP.
arXiv Detail & Related papers (2023-11-29T17:42:05Z)
Fast $(1+\varepsilon)$-Approximation Algorithms for Binary Matrix Factorization [54.29685789885059]
We introduce efficient $(1+varepsilon)$-approximation algorithms for the binary matrix factorization (BMF) problem. The goal is to approximate $mathbfA$ as a product of low-rank factors. Our techniques generalize to other common variants of the BMF problem.
arXiv Detail & Related papers (2023-06-02T18:55:27Z)
Differentially Private Stochastic Gradient Descent with Low-Noise [49.981789906200035]
Modern machine learning algorithms aim to extract fine-grained information from data to provide accurate predictions, which often conflicts with the goal of privacy protection. This paper addresses the practical and theoretical importance of developing privacy-preserving machine learning algorithms that ensure good performance while preserving privacy.
arXiv Detail & Related papers (2022-09-09T08:54:13Z)
A spectral least-squares-type method for heavy-tailed corrupted regression with unknown covariance \& heterogeneous noise [2.019622939313173]
We revisit heavy-tailed corrupted least-squares linear regression assuming to have a corrupted $n$-sized label-feature sample of at most $epsilon n$ arbitrary outliers. We propose a near-optimal computationally tractable estimator, based on the power method, assuming no knowledge on $(Sigma,Xi) nor the operator norm of $Xi$.
arXiv Detail & Related papers (2022-09-06T23:37:31Z)
Sketching Algorithms and Lower Bounds for Ridge Regression [65.0720777731368]
We give a sketching-based iterative algorithm that computes $1+varepsilon$ approximate solutions for the ridge regression problem. We also show that this algorithm can be used to give faster algorithms for kernel ridge regression.
arXiv Detail & Related papers (2022-04-13T22:18:47Z)
Private High-Dimensional Hypothesis Testing [4.133655523622441]
We provide improved differentially private algorithms for identity testing of high-dimensional distributions. Specifically, we can test whether the distribution comes from $mathcalN(mu*, Sigma)$ for some fixed $mu*$ or from some $mathcalN(mu*, Sigma)$ with total variation distance at least $alpha$.
arXiv Detail & Related papers (2022-03-03T06:25:48Z)
Random Graph Matching in Geometric Models: the Case of Complete Graphs [21.689343447798677]
This paper studies the problem of matching two complete graphs with edge weights correlated through latent geometries. We derive an approximate maximum likelihood estimator, which provably achieves, with high probability, perfect recovery of $pi*$. As a side discovery, we show that the celebrated spectral algorithm of [Ume88] emerges as a further approximation to the maximum likelihood in the geometric model.
arXiv Detail & Related papers (2022-02-22T04:14:45Z)
Fast Graph Sampling for Short Video Summarization using Gershgorin Disc Alignment [52.577757919003844]
We study the problem of efficiently summarizing a short video into several paragraphs, leveraging recent progress in fast graph sampling. Experimental results show that our algorithm achieves comparable video summarization as state-of-the-art methods, at a substantially reduced complexity.
arXiv Detail & Related papers (2021-10-21T18:43:00Z)
Non-Parametric Estimation of Manifolds from Noisy Data [1.0152838128195467]
We consider the problem of estimating a $d$ dimensional sub-manifold of $mathbbRD$ from a finite set of noisy samples. We show that the estimation yields rates of convergence of $n-frack2k + d$ for the point estimation and $n-frack-12k + d$ for the estimation of tangent space.
arXiv Detail & Related papers (2021-05-11T02:29:33Z)
Optimal Mean Estimation without a Variance [103.26777953032537]
We study the problem of heavy-tailed mean estimation in settings where the variance of the data-generating distribution does not exist. We design an estimator which attains the smallest possible confidence interval as a function of $n,d,delta$.
arXiv Detail & Related papers (2020-11-24T22:39:21Z)
Maximizing Determinants under Matroid Constraints [69.25768526213689]
We study the problem of finding a basis $S$ of $M$ such that $det(sum_i in Sv_i v_i v_itop)$ is maximized. This problem appears in a diverse set of areas such as experimental design, fair allocation of goods, network design, and machine learning.
arXiv Detail & Related papers (2020-04-16T19:16:38Z)

This list is automatically generated from the titles and abstracts of the papers in this site.