Privacy-preserving data release leveraging optimal transport and particle gradient descent
- URL: http://arxiv.org/abs/2401.17823v3
- Date: Mon, 29 Jul 2024 14:12:50 GMT
- Title: Privacy-preserving data release leveraging optimal transport and particle gradient descent
- Authors: Konstantin Donhauser, Javier Abad, Neha Hulkund, Fanny Yang
- Abstract summary: We introduce PrivPGD, a new generation method for private data synthesis, leveraging tools from optimal transport and particle gradient descent.
Our algorithm outperforms existing methods on a large range of datasets while being highly scalable and offering the flexibility to incorporate additional domain-specific constraints.
- Score: 10.499611180329804
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a novel approach for differentially private data synthesis of protected tabular datasets, a relevant task in highly sensitive domains such as healthcare and government. Current state-of-the-art methods predominantly use marginal-based approaches, where a dataset is generated from private estimates of the marginals. In this paper, we introduce PrivPGD, a new generation method for marginal-based private data synthesis, leveraging tools from optimal transport and particle gradient descent. Our algorithm outperforms existing methods on a large range of datasets while being highly scalable and offering the flexibility to incorporate additional domain-specific constraints.
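To make the recipe concrete, the sketch below (plain PyTorch, not the authors' implementation) privatizes each 1-D marginal with the Laplace mechanism and then runs gradient descent on particle positions against a differentiable soft-histogram loss, a simplified stand-in for PrivPGD's optimal-transport divergence. The dataset, the naive privacy split across dimensions, and the kernel bandwidth are all illustrative assumptions.
```python
# Hedged sketch of marginal-based particle descent (not the PrivPGD code).
import torch

torch.manual_seed(0)
n, d, bins, eps = 2000, 2, 32, 1.0

data = torch.rand(n, d) ** 2            # stand-in for a sensitive dataset on [0, 1]^d
edges = torch.linspace(0.0, 1.0, bins + 1)
centers = 0.5 * (edges[:-1] + edges[1:])

def dp_marginal(x, eps_j):
    """Noisy 1-D marginal via the Laplace mechanism (L1 sensitivity 2/n under replacement)."""
    hist = torch.histogram(x, bins=bins, range=(0.0, 1.0)).hist / n
    hist = hist + torch.distributions.Laplace(0.0, 2.0 / (n * eps_j)).sample(hist.shape)
    hist = hist.clamp(min=0.0)
    return hist / hist.sum()

targets = [dp_marginal(data[:, j], eps / d) for j in range(d)]  # naive composition over d marginals

def soft_hist(x, h=0.02):
    """Differentiable histogram: Gaussian-kernel weights onto the bin centers."""
    w = torch.exp(-0.5 * ((x[:, None] - centers[None, :]) / h) ** 2)
    return (w / w.sum(dim=1, keepdim=True)).mean(dim=0)

particles = torch.rand(n, d, requires_grad=True)   # the synthetic dataset being optimized
opt = torch.optim.Adam([particles], lr=0.05)
for step in range(500):
    loss = sum(((soft_hist(particles[:, j]) - targets[j]) ** 2).sum() for j in range(d))
    opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():
        particles.clamp_(0.0, 1.0)
print(f"final marginal-matching loss: {loss.item():.5f}")
```
After training, the particle positions are released as the synthetic dataset; domain constraints can be enforced in the same loop, as the clamp to the unit cube illustrates.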
Related papers
- Leveraging Vertical Public-Private Split for Improved Synthetic Data Generation [9.819636361032256]
Differentially Private Synthetic Data Generation is a key enabler of private and secure data sharing.
Recent literature has explored scenarios where a small amount of public data is used to help enhance the quality of synthetic data.
We propose a novel framework that adapts horizontal public-assisted methods into the vertical setting.
arXiv Detail & Related papers (2025-04-15T08:59:03Z)
- Leveraging Robust Optimization for LLM Alignment under Distribution Shifts [54.654823811482665]
Large language models (LLMs) increasingly rely on preference alignment methods to steer outputs toward human values.
Recent approaches have turned to synthetic data generated by LLMs as a scalable alternative.
We propose a novel distribution-aware optimization framework that improves preference alignment in the presence of such shifts.
arXiv Detail & Related papers (2025-04-08T09:14:38Z)
- Linear-Time User-Level DP-SCO via Robust Statistics [55.350093142673316]
User-level differentially private convex optimization (DP-SCO) has garnered significant attention due to the importance of safeguarding user privacy in machine learning applications.
Current methods, such as those based on differentially private gradient descent (DP-SGD), often struggle with high noise accumulation and suboptimal utility.
We introduce a novel linear-time algorithm that leverages robust statistics, specifically the median and trimmed mean, to overcome these challenges.
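As a concrete (and heavily simplified) picture of why robust statistics help here, the toy sketch below privatizes a per-user statistic with a trimmed mean, which caps any single user's influence before Laplace noise is added. It is not the paper's linear-time DP-SCO algorithm; the [0, 1] range, trim fraction, and crude sensitivity bound are illustrative assumptions.
```python
# Toy user-level DP release via a trimmed mean (illustrative, not the paper's method).
import numpy as np

rng = np.random.default_rng(0)
user_stats = rng.normal(0.3, 0.1, size=200).clip(0.0, 1.0)  # one bounded value per user

def dp_trimmed_mean(x, eps, trim=0.1, lo=0.0, hi=1.0):
    m = len(x)
    k = int(trim * m)
    core = np.sort(np.clip(x, lo, hi))[k : m - k]
    # Crude bound: replacing one user's value moves the trimmed mean by at most
    # (hi - lo) / len(core), so Laplace noise at that scale over eps suffices.
    sens = (hi - lo) / len(core)
    return core.mean() + rng.laplace(scale=sens / eps)

print(dp_trimmed_mean(user_stats, eps=1.0))
```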
arXiv Detail & Related papers (2025-02-13T02:05:45Z)
- Hierarchical Features Matter: A Deep Exploration of GAN Priors for Improved Dataset Distillation [51.44054828384487]
We propose a novel parameterization method dubbed Hierarchical Generative Latent Distillation (H-GLaD).
This method systematically explores hierarchical layers within generative adversarial networks (GANs).
In addition, we introduce a novel class-relevant feature distance metric to alleviate the computational burden associated with synthetic dataset evaluation.
arXiv Detail & Related papers (2024-06-09T09:15:54Z)
- Aligning Large Language Models with Self-generated Preference Data [72.99676237703099]
We propose a new framework that boosts the alignment of large language models (LLMs) with human preferences.
Our key idea is to leverage the human prior knowledge contained in a small amount of (seed) data.
We introduce a noise-aware preference learning algorithm to mitigate the risk of low-quality generated preference data.
arXiv Detail & Related papers (2024-06-06T18:01:02Z)
- Synergizing Privacy and Utility in Data Analytics Through Advanced Information Theorization [2.28438857884398]
We introduce three sophisticated algorithms: a Noise-Infusion Technique tailored for high-dimensional image data, a Variational Autoencoder (VAE) for robust feature extraction, and an Expectation Maximization (EM) approach optimized for structured data privacy.
Our methods significantly reduce mutual information between sensitive attributes and transformed data, thereby enhancing privacy.
The research contributes to the field by providing a flexible and effective strategy for deploying privacy-preserving algorithms across various data types.
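The mutual-information claim is easy to visualize. The hedged sketch below covers only the noise-infusion idea (the VAE and EM components are omitted): Gaussian noise added to a released feature shrinks its empirical mutual information with a sensitive attribute. The synthetic data, noise scales, and binning are illustrative assumptions.
```python
# Illustrative only: noise infusion lowers empirical mutual information.
import numpy as np
from sklearn.metrics import mutual_info_score

rng = np.random.default_rng(0)
sensitive = rng.integers(0, 2, size=5000)           # binary sensitive attribute
feature = sensitive + rng.normal(0.0, 0.5, 5000)    # released feature correlated with it

def empirical_mi(a, x, bins=20):
    """Plug-in mutual information between a discrete attribute and a binned feature."""
    return mutual_info_score(a, np.digitize(x, np.histogram_bin_edges(x, bins)))

for sigma in (0.0, 0.5, 1.0, 2.0):
    release = feature + rng.normal(0.0, sigma, feature.shape)
    print(f"sigma={sigma:.1f}  MI≈{empirical_mi(sensitive, release):.3f}")
```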
arXiv Detail & Related papers (2024-04-24T22:58:42Z)
- Differentially Private Sliced Inverse Regression: Minimax Optimality and Algorithm [16.14032140601778]
We propose optimal differentially private algorithms designed to address privacy concerns in the context of sufficient dimension reduction.
We develop differentially private algorithms that achieve the minimax lower bounds up to logarithmic factors.
As a natural extension, we can readily offer analogous lower and upper bounds for differentially private sparse principal component analysis.
arXiv Detail & Related papers (2024-01-16T06:47:43Z)
- Partition-based differentially private synthetic data generation [0.5095097384893414]
We present a partition-based approach that reduces errors and improves the quality of synthetic data, even with a limited privacy budget.
The synthetic data produced using our approach exhibits improved quality and utility, making it a preferable choice for private synthetic data sharing.
arXiv Detail & Related papers (2023-10-10T07:23:37Z)
- DPGOMI: Differentially Private Data Publishing with Gaussian Optimized Model Inversion [8.204115285718437]
We propose Differentially Private Data Publishing with Gaussian Optimized Model Inversion (DPGOMI) to address this issue.
Our approach involves mapping private data to the latent space using a public generator, followed by a lower-dimensional DP-GAN with better convergence properties.
Our results show that DPGOMI outperforms the standard DP-GAN method in terms of Inception Score, Fréchet Inception Distance, and classification performance.
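To picture the first stage, the sketch below inverts a single record into the latent space of a frozen "public" generator by gradient descent on the latent code. The generator here is an untrained stand-in MLP and the subsequent DP-GAN stage is omitted, so this is only a shape-level illustration, not DPGOMI itself.
```python
# Hedged sketch of the model-inversion step only (not the DPGOMI pipeline).
import torch
import torch.nn as nn

torch.manual_seed(0)
latent_dim, data_dim = 8, 32

public_generator = nn.Sequential(
    nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, data_dim)
)
for p in public_generator.parameters():
    p.requires_grad_(False)                      # generator stays frozen

x_private = torch.randn(data_dim)                # stand-in for one private record
z = torch.zeros(latent_dim, requires_grad=True)  # latent code to optimize
opt = torch.optim.Adam([z], lr=0.05)

for step in range(300):
    loss = ((public_generator(z) - x_private) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

print(f"reconstruction MSE after inversion: {loss.item():.4f}")
# The recovered codes z (one per record) would then be the lower-dimensional
# inputs on which a DP-GAN is trained with differential privacy guarantees.
```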
arXiv Detail & Related papers (2023-10-06T18:46:22Z)
- A Unified View of Differentially Private Deep Generative Modeling [60.72161965018005]
Data with privacy concerns is subject to stringent regulations that frequently prohibit data access and data sharing.
Overcoming these obstacles is key for technological progress in many real-world application scenarios that involve privacy sensitive data.
Differentially private (DP) data publishing provides a compelling solution, where only a sanitized form of the data is publicly released.
arXiv Detail & Related papers (2023-09-27T14:38:16Z)
- DP-TBART: A Transformer-based Autoregressive Model for Differentially Private Tabular Data Generation [1.4418363806859886]
We present Differentially-Private TaBular AutoRegressive Transformer (DP-TBART), a transformer-based autoregressive model that maintains differential privacy.
We provide a theoretical framework for understanding the limitations of marginal-based approaches and where deep learning-based approaches stand to contribute most.
arXiv Detail & Related papers (2023-07-19T19:40:21Z)
- Private Set Generation with Discriminative Information [63.851085173614]
Differentially private data generation is a promising solution to the data privacy challenge.
Existing private generative models struggle with the utility of synthetic samples.
We introduce a simple yet effective method that greatly improves the sample utility of state-of-the-art approaches.
arXiv Detail & Related papers (2022-11-07T10:02:55Z)
- Differentially Private Stochastic Gradient Descent with Low-Noise [49.981789906200035]
Modern machine learning algorithms aim to extract fine-grained information from data to provide accurate predictions, which often conflicts with the goal of privacy protection.
This paper addresses the practical and theoretical challenge of developing machine learning algorithms that ensure good predictive performance while preserving privacy.
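For orientation, the baseline this line of work refines is DP-SGD's per-example gradient clipping plus Gaussian noise. The sketch below shows one such update for a linear least-squares model in plain PyTorch; hyperparameters are arbitrary, and a real deployment would use a vetted DP library with a proper privacy accountant.
```python
# Minimal DP-SGD step (illustrative): per-sample clipping plus Gaussian noise.
import torch

torch.manual_seed(0)
w = torch.zeros(5)
X, y = torch.randn(64, 5), torch.randn(64)
clip, sigma, lr = 1.0, 1.0, 0.1

# Per-sample gradients of squared loss for a linear model: g_i = 2 (x_i·w - y_i) x_i
residual = X @ w - y
per_sample_grads = 2.0 * residual[:, None] * X

# Clip each sample's gradient to norm <= clip, then average and add noise.
norms = per_sample_grads.norm(dim=1, keepdim=True)
clipped = per_sample_grads * (clip / norms).clamp(max=1.0)
noisy_mean = clipped.mean(dim=0) + torch.randn(5) * (sigma * clip / len(X))

w = w - lr * noisy_mean
print(w)
```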
arXiv Detail & Related papers (2022-09-09T08:54:13Z)
- DP2-Pub: Differentially Private High-Dimensional Data Publication with Invariant Post Randomization [58.155151571362914]
We propose a differentially private high-dimensional data publication mechanism (DP2-Pub) that runs in two phases.
Splitting attributes into several low-dimensional clusters with high intra-cluster cohesion and low inter-cluster coupling helps allocate a reasonable privacy budget.
We also extend our DP2-Pub mechanism to the scenario with a semi-honest server which satisfies local differential privacy.
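As a toy illustration of the attribute-splitting phase (hedged: no privacy noise is applied here, and the greedy threshold rule is an invented stand-in for DP2-Pub's actual clustering), one can score pairwise dependence between columns with mutual information and group strongly coupled attributes:
```python
# Toy attribute clustering by pairwise mutual information (illustrative only).
import numpy as np
from sklearn.metrics import mutual_info_score

rng = np.random.default_rng(0)
a = rng.integers(0, 4, 1000)
cols = {"a": a,
        "b": (a + rng.integers(0, 2, 1000)) % 4,   # strongly coupled with "a"
        "c": rng.integers(0, 4, 1000)}             # independent column

clusters = []
for name in cols:
    placed = False
    for cl in clusters:
        # Join a cluster if strongly dependent on any member.
        if any(mutual_info_score(cols[name], cols[m]) > 0.1 for m in cl):
            cl.append(name); placed = True; break
    if not placed:
        clusters.append([name])
print(clusters)   # expected: [['a', 'b'], ['c']]
```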
arXiv Detail & Related papers (2022-08-24T17:52:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.