An Algorithm for Streaming Differentially Private Data
- URL: http://arxiv.org/abs/2401.14577v2
- Date: Wed, 31 Jan 2024 01:53:05 GMT
- Title: An Algorithm for Streaming Differentially Private Data
- Authors: Girish Kumar, Thomas Strohmer, and Roman Vershynin
- Abstract summary: We derive an algorithm for differentially private synthetic streaming data generation, especially curated towards spatial datasets.
The utility of our algorithm is verified on both real-world and simulated datasets.
- Score: 7.726042106665366
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Much of the research in differential privacy has focused on offline
applications with the assumption that all data is available at once. When these
algorithms are applied in practice to streams where data is collected over
time, this either violates the privacy guarantees or results in poor utility.
We derive an algorithm for differentially private synthetic streaming data
generation, especially curated towards spatial datasets. Furthermore, we
provide a general framework for online selective counting among a collection of
queries which forms a basis for many tasks such as query answering and
synthetic data generation. The utility of our algorithm is verified on both
real-world and simulated datasets.
Related papers
- Differentially Private Synthetic High-dimensional Tabular Stream [7.726042106665366]
We propose an algorithmic framework for streaming data that generates multiple synthetic datasets over time.
Our algorithm satisfies differential privacy for the entire input stream.
We show the utility of our method via experiments on real-world datasets.
arXiv Detail & Related papers (2024-08-31T01:31:59Z) - Differentially Private Synthetic Data with Private Density Estimation [2.209921757303168]
We adopt the framework of differential privacy, and explore mechanisms for generating an entire dataset.
We build upon the work of Boedihardjo et al, which laid the foundations for a new optimization-based algorithm for generating private synthetic data.
arXiv Detail & Related papers (2024-05-06T14:06:12Z) - A Dataset for the Validation of Truth Inference Algorithms Suitable for Online Deployment [76.04306818209753]
We introduce a substantial crowdsourcing annotation dataset collected from a real-world crowdsourcing platform.
This dataset comprises approximately two thousand workers, one million tasks, and six million annotations.
We evaluate the effectiveness of several representative truth inference algorithms on this dataset.
arXiv Detail & Related papers (2024-03-10T16:00:41Z) - Benchmarking Private Population Data Release Mechanisms: Synthetic Data vs. TopDown [50.40020716418472]
This study conducts a comparison between the TopDown algorithm and private synthetic data generation to determine how accuracy is affected by query complexity.
Our results show that for in-distribution queries, the TopDown algorithm achieves significantly better privacy-fidelity tradeoffs than any of the synthetic data methods we evaluated.
arXiv Detail & Related papers (2024-01-31T17:38:34Z) - A Unified View of Differentially Private Deep Generative Modeling [60.72161965018005]
Data with privacy concerns comes with stringent regulations that frequently prohibited data access and data sharing.
Overcoming these obstacles is key for technological progress in many real-world application scenarios that involve privacy sensitive data.
Differentially private (DP) data publishing provides a compelling solution, where only a sanitized form of the data is publicly released.
arXiv Detail & Related papers (2023-09-27T14:38:16Z) - Differentially Private Synthetic Data Using KD-Trees [11.96971298978997]
We exploit space partitioning techniques together with noise perturbation and thus achieve intuitive and transparent algorithms.
We propose both data independent and data dependent algorithms for $epsilon$-differentially private synthetic data generation.
We show empirical utility improvements over the prior work, and discuss performance of our algorithm on a downstream classification task on a real dataset.
arXiv Detail & Related papers (2023-06-19T17:08:32Z) - Differentially Private Algorithms for Synthetic Power System Datasets [0.0]
Power systems research relies on the availability of real-world network datasets.
Data owners are hesitant to share data due to security and privacy risks.
We develop privacy-preserving algorithms for the synthetic generation of optimization and machine learning datasets.
arXiv Detail & Related papers (2023-03-20T13:38:58Z) - On Differential Privacy and Adaptive Data Analysis with Bounded Space [76.10334958368618]
We study the space complexity of the two related fields of differential privacy and adaptive data analysis.
We show that there exists a problem P that requires exponentially more space to be solved efficiently with differential privacy.
The line of work on adaptive data analysis focuses on understanding the number of samples needed for answering a sequence of adaptive queries.
arXiv Detail & Related papers (2023-02-11T14:45:31Z) - Private Set Generation with Discriminative Information [63.851085173614]
Differentially private data generation is a promising solution to the data privacy challenge.
Existing private generative models are struggling with the utility of synthetic samples.
We introduce a simple yet effective method that greatly improves the sample utility of state-of-the-art approaches.
arXiv Detail & Related papers (2022-11-07T10:02:55Z) - Optimal Data Selection: An Online Distributed View [61.31708750038692]
We develop algorithms for the online and distributed version of the problem.
We show that our selection methods outperform random selection by $5-20%$.
In learning tasks on ImageNet and MNIST, we show that our selection methods outperform random selection by $5-20%$.
arXiv Detail & Related papers (2022-01-25T18:56:16Z) - Differentially Private Query Release Through Adaptive Projection [19.449593001368193]
We propose, implement, and evaluate a new algorithm for releasing answers to very large numbers of statistical queries like $k$-way marginals.
Our algorithm makes adaptive use of a continuous relaxation of the Projection Mechanism, which answers queries on the private dataset using simple perturbation.
We find that our method outperforms existing algorithms in many cases, especially when the privacy budget is small or the query class is large.
arXiv Detail & Related papers (2021-03-11T12:43:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.