Minimax optimal differentially private synthetic data for smooth queries
- URL: http://arxiv.org/abs/2602.01607v2
- Date: Thu, 05 Feb 2026 16:22:04 GMT
- Title: Minimax optimal differentially private synthetic data for smooth queries
- Authors: Rundong Ding, Yiyun He, Yizhe Zhu,
- Abstract summary: We study the problem of generating $(varepsilon,)$-differentially private synthetic data from a dataset of size $n$ supported on the hypercube.<n>We propose a minimax error rate of $n-min 1, frackd$, up to a $log(n)$ factor. This characterization uncovers a phase transition at $k=d$.
- Score: 6.093338631816647
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Differentially private synthetic data enables the sharing and analysis of sensitive datasets while providing rigorous privacy guarantees for individual contributors. A central challenge is to achieve strong utility guarantees for meaningful downstream analysis. Many existing methods ensure uniform accuracy over broad query classes, such as all Lipschitz functions, but this level of generality often leads to suboptimal rates for statistics of practical interest. Since many common data analysis queries exhibit smoothness beyond what worst-case Lipschitz bounds capture, we ask whether exploiting this additional structure can yield improved utility. We study the problem of generating $(\varepsilon,δ)$-differentially private synthetic data from a dataset of size $n$ supported on the hypercube $[-1,1]^d$, with utility guarantees uniformly for all smooth queries having bounded derivatives up to order $k$. We propose a polynomial-time algorithm that achieves a minimax error rate of $n^{-\min \{1, \frac{k}{d}\}}$, up to a $\log(n)$ factor. This characterization uncovers a phase transition at $k=d$. Our results generalize the Chebyshev moment matching framework of (Musco et al., 2025; Wang et al., 2016) and strictly improve the error rates for $k$-smooth queries established in (Wang et al., 2016). Moreover, we establish the first minimax lower bound for the utility of $(\varepsilon,δ)$-differentially private synthetic data with respect to $k$-smooth queries, extending the Wasserstein lower bound for $\varepsilon$-differential privacy in (Boedihardjo et al., 2024).
Related papers
- Nearly Optimal Differentially Private ReLU Regression [18.599299269974498]
We investigate one of the most fundamental non learning problems, ReLU regression, in the Differential Privacy (DP) model.<n>We relax the requirement of $epsilon$ and public data by proposing and analyzing a one-pass mini-batch Generalized Model Perceptron algorithm.
arXiv Detail & Related papers (2025-03-08T02:09:47Z) - Optimized Tradeoffs for Private Prediction with Majority Ensembling [59.99331405291337]
We introduce the Data-dependent Randomized Response Majority (DaRRM) algorithm.<n>DaRRM is parameterized by a data-dependent noise function $gamma$, and enables efficient utility optimization over the class of all private algorithms.<n>We show that DaRRM provably enjoys a privacy gain of a factor of 2 over common baselines, with fixed utility.
arXiv Detail & Related papers (2024-11-27T00:48:48Z) - Private Stochastic Convex Optimization with Heavy Tails: Near-Optimality from Simple Reductions [19.008521454738425]
We study the problem of differentially private convex optimization (DP-SCO) with heavy-tailed gradients, where we assume a $ktextth$-moment bound on the Lipschitz constants of sample functions rather than a uniform bound.
We propose a new reduction-based approach that enables us to obtain the first optimal rates (up to logarithmic factors) in the heavy-tailed setting, achieving error $Gcdot frac 1 sqrt n + G_k cdot (fracsqrt dnepsilon)1
arXiv Detail & Related papers (2024-06-04T21:26:29Z) - Smooth Sensitivity for Geo-Privacy [17.835910182900985]
Local model of differential privacy (LDP) is the predominant model under which the problem has been studied.
Geo-Privacy (GP) stipulates that the level of distinguishability be proportional to $mathrmdist(x_i, x_i')$.
We generalize the smooth sensitivity framework from Differential Privacy to Geo-Privacy, which allows us to add noise tailored to the hardness of the given instance.
arXiv Detail & Related papers (2024-05-10T08:32:07Z) - Online Differentially Private Synthetic Data Generation [10.177542186664503]
We develop an online algorithm that generates a differentially private synthetic dataset at each time $t$.
This algorithm achieves a near-optimal accuracy bound of $O(log(t)t-1/d)$ for $dgeq 2$ and $O(log4.5(t)t-1)$ for $d=1$ in the 1-Wasserstein distance.
arXiv Detail & Related papers (2024-02-12T19:21:14Z) - Best Policy Identification in Linear MDPs [70.57916977441262]
We investigate the problem of best identification in discounted linear Markov+Delta Decision in the fixed confidence setting under a generative model.
The lower bound as the solution of an intricate non- optimization program can be used as the starting point to devise such algorithms.
arXiv Detail & Related papers (2022-08-11T04:12:50Z) - Sharper Rates and Flexible Framework for Nonconvex SGD with Client and
Data Sampling [64.31011847952006]
We revisit the problem of finding an approximately stationary point of the average of $n$ smooth and possibly non-color functions.
We generalize the $smallsfcolorgreen$ so that it can provably work with virtually any sampling mechanism.
We provide the most general and most accurate analysis of optimal bound in the smooth non-color regime.
arXiv Detail & Related papers (2022-06-05T21:32:33Z) - DP-PCA: Statistically Optimal and Differentially Private PCA [44.22319983246645]
DP-PCA is a single-pass algorithm that overcomes both limitations.
For sub-Gaussian data, we provide nearly optimal statistical error rates even for $n=tilde O(d)$.
arXiv Detail & Related papers (2022-05-27T02:02:17Z) - A Law of Robustness beyond Isoperimetry [84.33752026418045]
We prove a Lipschitzness lower bound $Omega(sqrtn/p)$ of robustness of interpolating neural network parameters on arbitrary distributions.
We then show the potential benefit of overparametrization for smooth data when $n=mathrmpoly(d)$.
We disprove the potential existence of an $O(1)$-Lipschitz robust interpolating function when $n=exp(omega(d))$.
arXiv Detail & Related papers (2022-02-23T16:10:23Z) - Private Stochastic Convex Optimization: Optimal Rates in $\ell_1$
Geometry [69.24618367447101]
Up to logarithmic factors the optimal excess population loss of any $(varepsilon,delta)$-differently private is $sqrtlog(d)/n + sqrtd/varepsilon n.$
We show that when the loss functions satisfy additional smoothness assumptions, the excess loss is upper bounded (up to logarithmic factors) by $sqrtlog(d)/n + (log(d)/varepsilon n)2/3.
arXiv Detail & Related papers (2021-03-02T06:53:44Z) - Learning with User-Level Privacy [61.62978104304273]
We analyze algorithms to solve a range of learning tasks under user-level differential privacy constraints.
Rather than guaranteeing only the privacy of individual samples, user-level DP protects a user's entire contribution.
We derive an algorithm that privately answers a sequence of $K$ adaptively chosen queries with privacy cost proportional to $tau$, and apply it to solve the learning tasks we consider.
arXiv Detail & Related papers (2021-02-23T18:25:13Z) - Hiding Among the Clones: A Simple and Nearly Optimal Analysis of Privacy
Amplification by Shuffling [49.43288037509783]
We show that random shuffling amplifies differential privacy guarantees of locally randomized data.
Our result is based on a new approach that is simpler than previous work and extends to approximate differential privacy with nearly the same guarantees.
arXiv Detail & Related papers (2020-12-23T17:07:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.