PrivGen: Preserving Privacy of Sequences Through Data Generation
- URL: http://arxiv.org/abs/2002.09834v1
- Date: Sun, 23 Feb 2020 05:43:15 GMT
- Title: PrivGen: Preserving Privacy of Sequences Through Data Generation
- Authors: Sigal Shaked, Lior Rokach
- Abstract summary: Sequential data can serve as a basis for research that will lead to improved processes.
Access and use of such data is usually limited or not permitted at all due to concerns about violating user privacy.
We propose PrivGen, an innovative method for generating data that maintains patterns and characteristics of the source data.
- Score: 14.579475552088688
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sequential data is everywhere, and it can serve as a basis for research that
will lead to improved processes. For example, road infrastructure can be
improved by identifying bottlenecks in GPS data, or early diagnosis can be
improved by analyzing patterns of disease progression in medical data. The main
obstacle is that access and use of such data is usually limited or not
permitted at all due to concerns about violating user privacy, and rightly so.
Anonymizing sequence data is not a simple task, since a user creates an almost
unique signature over time. Existing anonymization methods reduce the quality
of information in order to maintain the level of anonymity required. Damage to
quality may disrupt patterns that appear in the original data and impair the
preservation of various characteristics. Since in many cases the researcher
does not need the data as is and instead is only interested in the patterns
that exist in the data, we propose PrivGen, an innovative method for generating
data that maintains patterns and characteristics of the source data. We
demonstrate that the data generation mechanism significantly limits the risk of
privacy infringement. Evaluating our method with real-world datasets shows that
its generated data preserves many characteristics of the data, including the
sequential model, as trained based on the source data. This suggests that the
data generated by our method could be used in place of actual data for various
types of analysis, maintaining user privacy and the data's integrity at the
same time.
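The abstract's core idea, generating synthetic sequences from patterns learned on source data rather than releasing the data itself, can be illustrated with a minimal sketch. This is not PrivGen's actual mechanism (the paper's method is more elaborate); it is a generic first-order Markov-chain illustration, with all function names invented here.

```python
import random
from collections import defaultdict

def fit_transitions(sequences):
    """Count first-order transition frequencies from source sequences.
    Illustrative only: a stand-in for learning sequential patterns."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    return counts

def generate(counts, start, length, rng=None):
    """Sample a synthetic sequence from the learned transition model,
    so released data follows source patterns without copying records."""
    rng = rng or random.Random(0)
    seq = [start]
    for _ in range(length - 1):
        nxt = counts.get(seq[-1])
        if not nxt:  # no observed outgoing transition
            break
        states, weights = zip(*nxt.items())
        seq.append(rng.choices(states, weights=weights)[0])
    return seq

source = [["A", "B", "C", "B"], ["A", "B", "B", "C"]]
model = fit_transitions(source)
synthetic = generate(model, "A", 4)
```

A researcher interested only in the transition patterns could analyze `synthetic` in place of `source`; every transition in the generated sequence is one that occurred in the source data.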
Related papers
- Pseudo-Probability Unlearning: Towards Efficient and Privacy-Preserving Machine Unlearning [59.29849532966454]
We propose Pseudo-Probability Unlearning (PPU), a novel method that enables models to forget data in a privacy-preserving manner.
Our method achieves over 20% improvements in forgetting error compared to the state-of-the-art.
arXiv Detail & Related papers (2024-11-04T21:27:06Z)
- Tabular Data Synthesis with Differential Privacy: A Survey [24.500349285858597]
Data sharing is a prerequisite for collaborative innovation, enabling organizations to leverage diverse datasets for deeper insights.
Data synthesis tackles this by generating artificial datasets that preserve the statistical characteristics of real data.
Differentially private data synthesis has emerged as a promising approach to privacy-aware data sharing.
arXiv Detail & Related papers (2024-11-04T06:32:48Z)
- Federated Learning Empowered by Generative Content [55.576885852501775]
Federated learning (FL) enables leveraging distributed private data for model training in a privacy-preserving way.
We propose a novel FL framework termed FedGC, designed to mitigate data heterogeneity issues by diversifying private data with generative content.
We conduct a systematic empirical study on FedGC, covering diverse baselines, datasets, scenarios, and modalities.
arXiv Detail & Related papers (2023-12-10T07:38:56Z)
- Conditional Density Estimations from Privacy-Protected Data [0.0]
We propose simulation-based inference methods from privacy-protected datasets.
We illustrate our methods on discrete time-series data under an infectious disease model and with ordinary linear regression models.
arXiv Detail & Related papers (2023-10-19T14:34:17Z)
- PrivacyMind: Large Language Models Can Be Contextual Privacy Protection Learners [81.571305826793]
We introduce Contextual Privacy Protection Language Models (PrivacyMind).
Our work offers a theoretical analysis for model design and benchmarks various techniques.
In particular, instruction tuning with both positive and negative examples stands out as a promising method.
arXiv Detail & Related papers (2023-10-03T22:37:01Z)
- A Unified View of Differentially Private Deep Generative Modeling [60.72161965018005]
Data with privacy concerns comes with stringent regulations that frequently prohibit data access and data sharing.
Overcoming these obstacles is key for technological progress in many real-world application scenarios that involve privacy sensitive data.
Differentially private (DP) data publishing provides a compelling solution, where only a sanitized form of the data is publicly released.
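The differentially private publishing idea mentioned above, releasing only a sanitized form of the data, is commonly realized with calibrated noise. A minimal, generic sketch (not taken from any paper in this list) is the Laplace mechanism applied to a counting query, whose sensitivity is 1:

```python
import math
import random

def dp_count(true_count, epsilon, rng=None):
    """Release a count with epsilon-differential privacy via the
    Laplace mechanism. A counting query has sensitivity 1, so the
    noise scale is 1/epsilon. Generic textbook illustration, not a
    specific system from the papers above."""
    rng = rng or random.Random(42)
    scale = 1.0 / epsilon
    # Inverse-CDF sampling of Laplace(0, scale) noise
    u = rng.random() - 0.5
    noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_count + noise
```

Smaller `epsilon` means a stronger privacy guarantee but larger noise; the sanitized value, not the raw count, is what gets publicly released.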
arXiv Detail & Related papers (2023-09-27T14:38:16Z)
- Approximate, Adapt, Anonymize (3A): a Framework for Privacy Preserving Training Data Release for Machine Learning [3.29354893777827]
We introduce a data release framework, 3A (Approximate, Adapt, Anonymize), to maximize data utility for machine learning.
We present experimental evidence showing minimal discrepancy between performance metrics of models trained on real versus privatized datasets.
arXiv Detail & Related papers (2023-07-04T18:37:11Z)
- Private Set Generation with Discriminative Information [63.851085173614]
Differentially private data generation is a promising solution to the data privacy challenge.
Existing private generative models struggle with the utility of synthetic samples.
We introduce a simple yet effective method that greatly improves the sample utility of state-of-the-art approaches.
arXiv Detail & Related papers (2022-11-07T10:02:55Z)
- Differentially Private Synthetic Medical Data Generation using Convolutional GANs [7.2372051099165065]
We develop a differentially private framework for synthetic data generation using Rényi differential privacy.
Our approach builds on convolutional autoencoders and convolutional generative adversarial networks to preserve some of the critical characteristics of the generated synthetic data.
We demonstrate that our model outperforms existing state-of-the-art models under the same privacy budget.
arXiv Detail & Related papers (2020-12-22T01:03:49Z)
- Hide-and-Seek Privacy Challenge [88.49671206936259]
The NeurIPS 2020 Hide-and-Seek Privacy Challenge is a novel two-tracked competition to accelerate progress in tackling both problems.
In our head-to-head format, participants in the synthetic data generation track (i.e. "hiders") and the patient re-identification track (i.e. "seekers") are directly pitted against each other by way of a new, high-quality intensive care time-series dataset.
arXiv Detail & Related papers (2020-07-23T15:50:59Z)
- P3GM: Private High-Dimensional Data Release via Privacy Preserving Phased Generative Model [23.91327154831855]
This paper proposes privacy-preserving phased generative model (P3GM) for releasing sensitive data.
P3GM employs the two-phase learning process to make it robust against the noise, and to increase learning efficiency.
Compared with state-of-the-art methods, our generated samples contain less noise and are closer to the original data in terms of data diversity.
arXiv Detail & Related papers (2020-06-22T09:47:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.