Private Evolution Converges
- URL: http://arxiv.org/abs/2506.08312v1
- Date: Tue, 10 Jun 2025 00:47:09 GMT
- Title: Private Evolution Converges
- Authors: Tomás González, Giulia Fanti, Aaditya Ramdas
- Abstract summary: Private Evolution (PE) is a training-free method for differentially private (DP) synthetic data generation. We develop a new theoretical framework to explain PE's practical behavior and identify sufficient conditions for its convergence. We show that PE produces an $(\epsilon, \delta)$-DP synthetic dataset with expected 1-Wasserstein distance of order $\tilde{O}(d(n\epsilon)^{-1/d})$ from the original.
- Score: 35.2826599294184
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Private Evolution (PE) is a promising training-free method for differentially private (DP) synthetic data generation. While it achieves strong performance in some domains (e.g., images and text), its behavior in others (e.g., tabular data) is less consistent. To date, the only theoretical analysis of the convergence of PE depends on unrealistic assumptions about both the algorithm's behavior and the structure of the sensitive dataset. In this work, we develop a new theoretical framework to explain PE's practical behavior and identify sufficient conditions for its convergence. For $d$-dimensional sensitive datasets with $n$ data points from a bounded domain, we prove that PE produces an $(\epsilon, \delta)$-DP synthetic dataset with expected 1-Wasserstein distance of order $\tilde{O}(d(n\epsilon)^{-1/d})$ from the original, establishing worst-case convergence of the algorithm as $n \to \infty$. Our analysis extends to general Banach spaces as well. We also connect PE to the Private Signed Measure Mechanism, a method for DP synthetic data generation that has thus far not seen much practical adoption. We demonstrate the practical relevance of our theoretical findings in simulations.
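For intuition about the procedure the abstract refers to, here is a minimal sketch of the PE loop in Python. The generator's sampling and variation APIs are stubbed with Gaussian draws and perturbations, and the noise scale is not calibrated to a specific $(\epsilon, \delta)$ budget, so this is a schematic of the iteration rather than the paper's algorithm.

```python
# Minimal sketch of the Private Evolution (PE) loop. The generator's
# RANDOM_API / VARIATION_API are stubbed with Gaussian sampling and
# perturbation, and the noise scale is illustrative (not calibrated to
# a specific (epsilon, delta) budget).
import numpy as np

def random_api(n_syn, d, rng):
    # Placeholder for the generator's unconditional sampling API.
    return rng.standard_normal((n_syn, d))

def variation_api(samples, rng, scale=0.1):
    # Placeholder for the generator's variation API: small perturbations.
    return samples + scale * rng.standard_normal(samples.shape)

def private_evolution(private_data, n_syn=200, n_iters=10, sigma=5.0, seed=0):
    rng = np.random.default_rng(seed)
    n, d = private_data.shape
    synthetic = random_api(n_syn, d, rng)
    for _ in range(n_iters):
        # Each private point votes for its nearest synthetic candidate.
        dists = np.linalg.norm(private_data[:, None, :] - synthetic[None, :, :], axis=-1)
        votes = np.bincount(dists.argmin(axis=1), minlength=n_syn).astype(float)
        # The only use of private data is this vote histogram; Gaussian noise
        # is what makes the released information differentially private.
        noisy = np.clip(votes + sigma * rng.standard_normal(n_syn), 0, None)
        probs = noisy / noisy.sum() if noisy.sum() > 0 else np.full(n_syn, 1.0 / n_syn)
        # Resample candidates according to the noisy histogram, then vary them.
        idx = rng.choice(n_syn, size=n_syn, p=probs)
        synthetic = variation_api(synthetic[idx], rng)
    return synthetic

# Example: 2-D private data from a shifted Gaussian; the synthetic mean
# should drift toward the private mean as iterations proceed.
X = np.random.default_rng(1).standard_normal((1000, 2)) + 3.0
print(private_evolution(X).mean(axis=0))
```

Since the private data only enters through the noised vote histogram, the resampling and variation steps are post-processing and consume no additional privacy budget.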
Related papers
- Distributed Learning over Arbitrary Topology: Linear Speed-Up with Polynomial Transient Time [3.1789549088190414]
We study a distributed learning problem in which $n$ agents, each with potentially heterogeneous local data, collaboratively minimize the sum of their local cost functions via peer-to-peer communication. We propose a novel algorithm, Spanning Tree Push-Pull (STPP), which employs two spanning trees extracted from a general communication graph to distribute both model parameters and gradients.
arXiv Detail & Related papers (2025-03-20T13:11:44Z) - Efficiently Computing Similarities to Private Datasets [19.99000806126529]
Many methods in differentially private model training rely on computing the similarity between a query point (such as public or synthetic data) and private data.
We study the following fundamental algorithmic problem: Given a similarity function $f$ and a large high-dimensional private dataset $X \subset \mathbb{R}^d$, output a differentially private (DP) data structure which approximates $\sum_{x \in X} f(x,y)$ for any query $y$.
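To make the query model concrete, here is a naive Laplace-histogram baseline; it illustrates the problem setup only and is not the data structure proposed in that paper, and the grid discretization is an assumption of this sketch.

```python
# A naive DP baseline for the query model above (not the paper's data
# structure): release a Laplace-noised histogram of X over a fixed grid,
# then answer sum_{x in X} f(x, y) by post-processing the noisy counts.
import numpy as np

def build_dp_histogram(X, grid, epsilon, rng):
    # Snap each private point to its nearest grid cell; adding or removing
    # one point changes one count by 1, so the L1 sensitivity is 1.
    idx = np.argmin(np.linalg.norm(X[:, None, :] - grid[None, :, :], axis=-1), axis=1)
    counts = np.bincount(idx, minlength=len(grid)).astype(float)
    return counts + rng.laplace(scale=1.0 / epsilon, size=len(grid))

def query_similarity(noisy_counts, grid, f, y):
    # Post-processing: approximate sum_x f(x, y) using the noisy cell counts.
    return sum(c * f(g, y) for c, g in zip(noisy_counts, grid))

# Example with a Gaussian kernel on 1-D data in [0, 1].
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(500, 1))
grid = np.linspace(0.0, 1.0, 50)[:, None]
counts = build_dp_histogram(X, grid, epsilon=1.0, rng=rng)
kernel = lambda x, y: np.exp(-np.sum((x - y) ** 2) / 0.02)
print(query_similarity(counts, grid, kernel, y=np.array([0.5])))
```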
arXiv Detail & Related papers (2024-03-13T19:19:19Z) - Online non-parametric likelihood-ratio estimation by Pearson-divergence
functional minimization [55.98760097296213]
We introduce a new framework for online non-parametric LRE (OLRE) for the setting where pairs of iid observations $(x_t \sim p, x'_t \sim q)$ are observed over time.
We provide theoretical guarantees for the performance of the OLRE method along with empirical validation in synthetic experiments.
arXiv Detail & Related papers (2023-11-03T13:20:11Z) - Towards Faster Non-Asymptotic Convergence for Diffusion-Based Generative
Models [49.81937966106691]
We develop a suite of non-asymptotic theory towards understanding the data generation process of diffusion models.
In contrast to prior works, our theory is developed based on an elementary yet versatile non-asymptotic approach.
arXiv Detail & Related papers (2023-06-15T16:30:08Z) - Differentially Private Topological Data Analysis [6.983833229285599]
This paper is the first to attempt differentially private (DP) topological data analysis (TDA).
We show that the commonly used Čech complex has sensitivity that does not decrease as the sample size $n$ increases.
We propose using the exponential mechanism whose utility function is defined in terms of the bottleneck distance of the $L^1$-DTM persistence diagrams.
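Schematically, the exponential mechanism referenced here samples a candidate output $D$ with probability $\Pr[D] \propto \exp\left(\frac{\epsilon\, u(X, D)}{2\, \Delta u}\right)$, where the utility $u(X, D)$ is (per the summary) the negative bottleneck distance between $D$ and the $L^1$-DTM persistence diagram of the sensitive data $X$, and $\Delta u$ is the utility's sensitivity; the exact candidate set and calibration are as specified in that paper.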
arXiv Detail & Related papers (2023-05-05T15:15:04Z) - Analyzing Privacy Leakage in Machine Learning via Multiple Hypothesis
Testing: A Lesson From Fano [83.5933307263932]
We study data reconstruction attacks for discrete data and analyze them under the framework of hypothesis testing.
We show that if the underlying private data takes values from a set of size $M$, then the target privacy parameter $\epsilon$ can be $O(\log M)$ before the adversary gains significant inferential power.
arXiv Detail & Related papers (2022-10-24T23:50:12Z) - Pessimism in the Face of Confounders: Provably Efficient Offline Reinforcement Learning in Partially Observable Markov Decision Processes [99.26864533035454]
We study offline reinforcement learning (RL) in partially observable Markov decision processes.
We propose the Proxy variable Pessimistic Policy Optimization (P3O) algorithm.
P3O is the first provably efficient offline RL algorithm for POMDPs with a confounded dataset.
arXiv Detail & Related papers (2022-05-26T19:13:55Z) - The Fundamental Price of Secure Aggregation in Differentially Private
Federated Learning [34.630300910399036]
We characterize the fundamental communication cost required to obtain the best accuracy under $\varepsilon$ central DP.
Our results show that $\tilde{O}\left(\min(n^2\varepsilon^2, d)\right)$ bits per client are both sufficient and necessary.
This provides a significant improvement relative to state-of-the-art SecAgg distributed DP schemes.
arXiv Detail & Related papers (2022-03-07T22:56:09Z) - Faster Convergence of Local SGD for Over-Parameterized Models [1.5504102675587357]
Modern machine learning architectures are often highly expressive.
We analyze the convergence of Local SGD (or FedAvg) for such over-parameterized functions in the heterogeneous data setting.
For general convex loss functions, we establish an error bound of $O(1/T)$ under a mild data-similarity assumption and an error bound of $O(K/T)$ otherwise, where $K$ is the number of local steps and $T$ is the total number of iterations.
For non-convex loss functions, we prove an error bound of $O(K/T)$ in both cases.
We complete our results by providing problem instances in which our established convergence rates are tight to a constant factor with a reasonably small stepsize.
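For readers unfamiliar with the setup, the sketch below shows what $K$ local steps out of $T$ total iterations mean in Local SGD/FedAvg; the two-client least-squares construction is an illustrative assumption, not that paper's experimental setup.

```python
# Minimal Local SGD (FedAvg) sketch to make the roles of K (local steps per
# round) and T (total iterations) concrete. The two-client least-squares
# setup below is an illustrative assumption, not the paper's experiments.
import numpy as np

def local_sgd(client_data, T=2000, K=5, lr=0.01, seed=0):
    rng = np.random.default_rng(seed)
    d = client_data[0][0].shape[1]
    w = np.zeros(d)
    for _ in range(T // K):              # T iterations split into rounds of K steps
        local_models = []
        for A, b in client_data:         # each client holds its own (A, b)
            w_local = w.copy()
            for _ in range(K):           # K local SGD steps on (a_i . w - b_i)^2
                i = rng.integers(len(A))
                w_local -= lr * 2 * A[i] * (A[i] @ w_local - b[i])
            local_models.append(w_local)
        w = np.mean(local_models, axis=0)  # server averages after each round
    return w

# Two clients with heterogeneous (differently shifted) regression data that
# share the same ground-truth model, mimicking the interpolation regime.
rng = np.random.default_rng(1)
w_star = np.array([1.0, -2.0])
clients = []
for shift in (0.0, 3.0):
    A = rng.standard_normal((200, 2)) + shift
    clients.append((A, A @ w_star + 0.01 * rng.standard_normal(200)))
print(local_sgd(clients))  # approaches w_star as T grows
```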
arXiv Detail & Related papers (2022-01-30T04:05:56Z) - Outlier-Robust Optimal Transport: Duality, Structure, and Statistical
Applications [25.410110072480187]
Wasserstein distances are sensitive to outliers in the considered distributions.
We propose a new outlier-robust Wasserstein distance $\mathsf{W}_p^\varepsilon$ which allows for $\varepsilon$ outlier mass to be removed from each contaminated distribution.
arXiv Detail & Related papers (2021-11-02T04:05:45Z) - Brain Image Synthesis with Unsupervised Multivariate Canonical
CSC$\ell_4$Net [122.8907826672382]
We propose to learn dedicated features that cross both inter- and intra-modal variations using a novel CSC$\ell_4$Net.
arXiv Detail & Related papers (2021-03-22T05:19:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.