Generating Private Synthetic Data with Genetic Algorithms
- URL: http://arxiv.org/abs/2306.03257v1
- Date: Mon, 5 Jun 2023 21:19:37 GMT
- Title: Generating Private Synthetic Data with Genetic Algorithms
- Authors: Terrance Liu, Jingwu Tang, Giuseppe Vietri, Zhiwei Steven Wu
- Abstract summary: We study the problem of efficiently generating differentially private synthetic data that approximates the statistical properties of an underlying sensitive dataset.
We propose Private-GSD, a private genetic algorithm based on zeroth-order optimization heuristics that do not require modifying the original objective.
We show that Private-GSD outperforms the state-of-the-art methods on non-differentiable queries while matching accuracy in approximating differentiable ones.
- Score: 29.756119782419955
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the problem of efficiently generating differentially private
synthetic data that approximate the statistical properties of an underlying
sensitive dataset. In recent years, there has been a growing line of work that
approaches this problem using first-order optimization techniques. However,
such techniques are restricted to optimizing differentiable objectives only,
severely limiting the types of analyses that can be conducted. For example,
first-order mechanisms have been primarily successful in approximating
statistical queries only in the form of marginals for discrete data domains. In
some cases, one can circumvent such issues by relaxing the task's objective to
maintain differentiability. However, even when possible, these approaches
impose a fundamental limitation in which modifications to the minimization
problem become additional sources of error. Therefore, we propose Private-GSD,
a private genetic algorithm based on zeroth-order optimization heuristics that
do not require modifying the original objective. As a result, it avoids the
aforementioned limitations of first-order optimization. We empirically evaluate
Private-GSD against baseline algorithms on data derived from the American
Community Survey across a variety of statistics--otherwise known as statistical
queries--both for discrete and real-valued attributes. We show that Private-GSD
outperforms the state-of-the-art methods on non-differentiable queries while
matching accuracy in approximating differentiable ones.
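To make the zeroth-order approach concrete, the following is a minimal Python sketch of the general recipe the abstract describes, assuming binary attributes and a fixed set of 1- and 2-way marginal queries: privatize the query answers once with the Gaussian mechanism, then run a genetic search over candidate synthetic datasets as pure post-processing. The query class, mutation scheme, and hyperparameters are illustrative assumptions, not the exact Private-GSD operators.

```python
# Illustrative sketch only: one-shot private measurement, then zeroth-order search.
import numpy as np

rng = np.random.default_rng(0)

def marginal_queries(data):
    """1-way means and 2-way co-occurrence rates of a binary dataset (n x d)."""
    d = data.shape[1]
    one_way = data.mean(axis=0)
    two_way = (data.T @ data) / len(data)
    return np.concatenate([one_way, two_way[np.triu_indices(d, k=1)]])

def gaussian_mechanism(answers, n, epsilon, delta):
    """One-shot Gaussian mechanism; each query is a mean, so the answer vector
    has L2 sensitivity at most sqrt(m)/n when one record is added or removed."""
    sigma = (np.sqrt(len(answers)) / n) * np.sqrt(2 * np.log(1.25 / delta)) / epsilon
    return answers + rng.normal(0.0, sigma, size=answers.shape)

def private_genetic_synthesis(data, epsilon=1.0, delta=1e-6,
                              n_synth=200, pop_size=40, generations=300):
    n, d = data.shape
    noisy = gaussian_mechanism(marginal_queries(data), n, epsilon, delta)

    # Everything below reads only `noisy`, so it is private by post-processing.
    def fitness(candidate):
        return -np.linalg.norm(marginal_queries(candidate) - noisy)

    population = [rng.integers(0, 2, size=(n_synth, d)) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)     # best candidates first
        elites = population[: pop_size // 4]
        children = []
        while len(elites) + len(children) < pop_size:
            child = elites[rng.integers(len(elites))].copy()
            # Zeroth-order mutation: flip a handful of random cells, no gradients.
            idx = (rng.integers(n_synth, size=5), rng.integers(d, size=5))
            child[idx] ^= 1
            children.append(child)
        population = elites + children
    return max(population, key=fitness)

# Toy usage on synthetic "sensitive" binary data:
sensitive = (rng.random((1000, 6)) < 0.3).astype(int)
synthetic = private_genetic_synthesis(sensitive)
```

Because the evolutionary loop reads only the noisy answers, any amount of additional search leaves the privacy guarantee unchanged.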
Related papers
- Instance-Specific Asymmetric Sensitivity in Differential Privacy [2.855485723554975]
We build upon previous work that gives a paradigm for selecting an output through the exponential mechanism.
Our framework slightly modifies the closeness metric and instead gives a simple and efficient application of the sparse vector technique.
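For reference, below is a minimal sketch of the sparse vector technique (the classic AboveThreshold primitive) that the summary invokes, assuming sensitivity-1 queries; the paper's instance-specific, asymmetric-sensitivity variant is not reproduced here.

```python
import numpy as np

def above_threshold(dataset, queries, threshold, epsilon, rng=np.random.default_rng(0)):
    """Return the index of the first query whose noisy answer clears the noisy
    threshold (or None). Each query is assumed to have sensitivity 1; the whole
    interaction satisfies epsilon-differential privacy."""
    noisy_threshold = threshold + rng.laplace(scale=2.0 / epsilon)
    for i, query in enumerate(queries):
        if query(dataset) + rng.laplace(scale=4.0 / epsilon) >= noisy_threshold:
            return i
    return None

# Example with sensitivity-1 counting queries on a 1-D array of records:
# above_threshold(records, [lambda d: (d > 10).sum(), lambda d: (d < 0).sum()], 50, 0.5)
```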
arXiv Detail & Related papers (2023-11-02T05:01:45Z) - Differentially Private Linear Regression with Linked Data [3.9325957466009203]
Differential privacy, a mathematical notion from computer science, is a rising tool offering robust privacy guarantees.
Recent work focuses on developing differentially private versions of individual statistical and machine learning tasks.
We present two differentially private algorithms for linear regression with linked data.
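The summary does not spell out the paper's two algorithms, so purely as generic context, here is a standard sufficient-statistics-perturbation sketch of differentially private linear regression (not the linked-data algorithms of the paper), assuming rows are clipped to unit L2 norm and responses to [-1, 1].

```python
import numpy as np

def dp_linear_regression(X, y, epsilon, delta, ridge=1.0, rng=np.random.default_rng(0)):
    """Release noisy sufficient statistics X^T X and X^T y with the Gaussian
    mechanism, then solve the regularized normal equations as post-processing.
    Assumes add/remove adjacency; rows are clipped to unit L2 norm and |y| <= 1,
    so each statistic changes by at most 1 when one record is removed."""
    n, d = X.shape
    X = X / np.maximum(1.0, np.linalg.norm(X, axis=1, keepdims=True))
    y = np.clip(y, -1.0, 1.0)
    sigma = np.sqrt(2 * np.log(1.25 / (delta / 2))) / (epsilon / 2)  # half budget each
    noisy_xtx = X.T @ X + rng.normal(0.0, sigma, size=(d, d))
    noisy_xtx = (noisy_xtx + noisy_xtx.T) / 2          # symmetrize (post-processing)
    noisy_xty = X.T @ y + rng.normal(0.0, sigma, size=d)
    return np.linalg.solve(noisy_xtx + ridge * np.eye(d), noisy_xty)
```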
arXiv Detail & Related papers (2023-08-01T21:00:19Z) - Differentially Private Domain Adaptation with Theoretical Guarantees [46.37771025567305]
In many applications, the labeled data at the learner's disposal is subject to privacy constraints and is relatively limited.
This is the modern problem of supervised domain adaptation from a public source to a private target domain.
We make use of a general learner to benefit from favorable theoretical learning guarantees.
arXiv Detail & Related papers (2023-06-15T04:03:06Z) - Differential Privacy via Distributionally Robust Optimization [8.409434654561789]
We develop a class of mechanisms that enjoy non-asymptotic and unconditional optimality guarantees.
Our upper (primal) bounds correspond to implementable perturbations whose suboptimality can be bounded by our lower (dual) bounds.
Our numerical experiments demonstrate that our perturbations can outperform the previously best results from the literature on artificial as well as standard benchmark problems.
arXiv Detail & Related papers (2023-04-25T09:31:47Z) - Private Domain Adaptation from a Public Source [48.83724068578305]
We design differentially private discrepancy-based algorithms for adaptation from a source domain with public labeled data to a target domain with unlabeled private data.
Our solutions are based on private variants of Frank-Wolfe and Mirror-Descent algorithms.
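As a rough illustration of a private Frank-Wolfe variant (not the paper's discrepancy-based objective), the sketch below runs Frank-Wolfe over the probability simplex, selects each vertex with a report-noisy-min step, and splits the privacy budget across iterations by basic composition; the gradient-bound and budget-splitting assumptions are stated in the docstring.

```python
import numpy as np

def dp_frank_wolfe(avg_grad, d, n, epsilon, iterations=50, rng=np.random.default_rng(0)):
    """Minimize a smooth empirical loss over the probability simplex.
    avg_grad(w) must return the gradient averaged over the n private examples,
    with every per-example gradient coordinate assumed to lie in [-1, 1]."""
    eps_step = epsilon / iterations                 # basic composition over steps
    w = np.full(d, 1.0 / d)                         # start at the simplex centre
    for t in range(iterations):
        g = avg_grad(w)
        # Report-noisy-min over the d vertices; each coordinate of the average
        # gradient has sensitivity 2/n, so the Laplace noise is calibrated to that.
        noisy = g + rng.laplace(scale=4.0 / (n * eps_step), size=d)
        vertex = np.zeros(d)
        vertex[np.argmin(noisy)] = 1.0
        gamma = 2.0 / (t + 2)                       # standard Frank-Wolfe step size
        w = (1 - gamma) * w + gamma * vertex
    return w
```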
arXiv Detail & Related papers (2022-08-12T06:52:55Z) - Optimal Algorithms for Mean Estimation under Local Differential Privacy [55.32262879188817]
We show that PrivUnit achieves the optimal variance among a large family of locally private randomizers.
We also develop a new variant of PrivUnit based on the Gaussian distribution which is more amenable to mathematical analysis and enjoys the same optimality guarantees.
arXiv Detail & Related papers (2022-05-05T06:43:46Z) - Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence and predicts target accuracy as the fraction of unlabeled examples whose confidence exceeds that threshold.
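A minimal sketch of the ATC estimator as summarized above: calibrate the confidence threshold on labeled source data, then report the exceedance rate on the unlabeled target data. The choice of confidence score (for example, maximum softmax probability or negative entropy) is left to the caller.

```python
import numpy as np

def atc_estimate(source_conf, source_correct, target_conf):
    """source_conf: confidence scores on held-out labeled source data.
    source_correct: 0/1 correctness of the model on those same points.
    target_conf: confidence scores on unlabeled target data."""
    source_acc = source_correct.mean()
    # Pick the threshold whose exceedance rate on source equals source accuracy ...
    threshold = np.quantile(source_conf, 1.0 - source_acc)
    # ... and predict target accuracy as the exceedance rate on target.
    return float((target_conf >= threshold).mean())
```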
arXiv Detail & Related papers (2022-01-11T23:01:12Z) - Differentially Private Query Release Through Adaptive Projection [19.449593001368193]
We propose, implement, and evaluate a new algorithm for releasing answers to very large numbers of statistical queries like $k$-way marginals.
Our algorithm makes adaptive use of a continuous relaxation of the Projection Mechanism, which answers queries on the private dataset using simple perturbation.
We find that our method outperforms existing algorithms in many cases, especially when the privacy budget is small or the query class is large.
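A simplified, non-adaptive sketch of the relaxed projection idea: perturb 1- and 2-way marginal answers once, then run gradient descent on a continuous synthetic dataset in [0, 1] so that its differentiable query answers match the noisy ones. The real algorithm selects queries adaptively; the uniform query set, step size, and rounding rule here are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def soft_marginals(D):
    """Differentiable surrogate for 1- and 2-way marginals of a [0,1]-valued dataset."""
    d = D.shape[1]
    iu = np.triu_indices(d, k=1)
    return np.concatenate([D.mean(axis=0), ((D.T @ D) / len(D))[iu]])

def relaxed_projection(data, epsilon=1.0, delta=1e-6, n_synth=200, steps=2000, lr=0.05):
    n, d = data.shape
    m = d + d * (d - 1) // 2
    sigma = (np.sqrt(m) / n) * np.sqrt(2 * np.log(1.25 / delta)) / epsilon
    noisy = soft_marginals(data) + rng.normal(0.0, sigma, size=m)

    D = rng.random((n_synth, d))                 # continuous relaxation of a dataset
    iu = np.triu_indices(d, k=1)
    for _ in range(steps):                       # post-processing: no extra privacy cost
        resid_1 = D.mean(axis=0) - noisy[:d]
        pair = np.zeros((d, d))
        pair[iu] = ((D.T @ D) / n_synth)[iu] - noisy[d:]
        pair = pair + pair.T
        # Gradient direction of the squared query error w.r.t. D (constant factors
        # are absorbed into the learning rate).
        grad = np.tile(resid_1, (n_synth, 1)) + D @ pair
        D = np.clip(D - lr * grad, 0.0, 1.0)
    return (D > 0.5).astype(int)                 # round back to a discrete dataset
```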
arXiv Detail & Related papers (2021-03-11T12:43:18Z) - Offline Model-Based Optimization via Normalized Maximum Likelihood
Estimation [101.22379613810881]
We consider data-driven optimization problems where one must maximize a function given only queries at a fixed set of points.
This problem setting emerges in many domains where function evaluation is a complex and expensive process.
We propose a tractable approximation that allows us to scale our method to high-capacity neural network models.
arXiv Detail & Related papers (2021-02-16T06:04:27Z) - A Discriminative Technique for Multiple-Source Adaptation [55.5865665284915]
We present a new discriminative technique for the multiple-source adaptation (MSA) problem.
Our solution only requires conditional probabilities that can easily be accurately estimated from unlabeled data from the source domains.
Our experiments with real-world applications further demonstrate that our new discriminative MSA algorithm outperforms the previous generative solution.
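A hypothetical sketch of the discriminative idea summarized above: fit a domain classifier on unlabeled source data to estimate P(domain | x), then mix the per-domain predictors using those posteriors; the paper's exact distribution-weighted combining rule may differ.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def combine_sources(source_unlabeled, source_predictors, X_target):
    """source_unlabeled: list of arrays of unlabeled inputs, one per source domain.
    source_predictors: list of functions mapping inputs to class-probability arrays.
    X_target: inputs from the target (mixture) domain."""
    X = np.vstack(source_unlabeled)
    domains = np.concatenate([np.full(len(s), k) for k, s in enumerate(source_unlabeled)])
    domain_clf = LogisticRegression(max_iter=1000).fit(X, domains)   # P(domain | x)
    weights = domain_clf.predict_proba(X_target)                     # (n, K)
    preds = np.stack([h(X_target) for h in source_predictors])       # (K, n, C)
    # Pointwise mixture of source predictors, weighted by the domain posteriors.
    return np.einsum("nk,knc->nc", weights, preds)
```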
arXiv Detail & Related papers (2020-08-25T14:06:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.