DP-TBART: A Transformer-based Autoregressive Model for Differentially
Private Tabular Data Generation
- URL: http://arxiv.org/abs/2307.10430v1
- Date: Wed, 19 Jul 2023 19:40:21 GMT
- Title: DP-TBART: A Transformer-based Autoregressive Model for Differentially
Private Tabular Data Generation
- Authors: Rodrigo Castellon, Achintya Gopal, Brian Bloniarz, David Rosenberg
- Abstract summary: We present Differentially-Private TaBular AutoRegressive Transformer (DP-TBART), a transformer-based autoregressive model that maintains differential privacy.
We provide a theoretical framework for understanding the limitations of marginal-based approaches and where deep learning-based approaches stand to contribute most.
- Score: 1.4418363806859886
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The generation of synthetic tabular data that preserves differential privacy
is a problem of growing importance. While traditional marginal-based methods
have achieved impressive results, recent work has shown that deep
learning-based approaches tend to lag behind. In this work, we present
Differentially-Private TaBular AutoRegressive Transformer (DP-TBART), a
transformer-based autoregressive model that maintains differential privacy and
achieves performance competitive with marginal-based methods on a wide variety
of datasets, capable of even outperforming state-of-the-art methods in certain
settings. We also provide a theoretical framework for understanding the
limitations of marginal-based approaches and where deep learning-based
approaches stand to contribute most. These results suggest that deep
learning-based techniques should be considered as a viable alternative to
marginal-based methods in the generation of differentially private synthetic
tabular data.
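The abstract gives no implementation details, but the general recipe it describes, an autoregressive transformer over tabular rows trained under differential privacy, can be illustrated with a minimal sketch. The shared discretized vocabulary, the tiny model, and the use of DP-SGD via Opacus below are assumptions for illustration, not the authors' code.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine

N_COLS, VOCAB, D = 8, 32, 64  # columns per row, categories per column, embedding dim


class CausalBlock(nn.Module):
    """Single-head causal self-attention block built only from nn.Linear and
    nn.LayerNorm, layers for which Opacus can compute per-sample gradients."""

    def __init__(self, d):
        super().__init__()
        self.norm1, self.norm2 = nn.LayerNorm(d), nn.LayerNorm(d)
        self.q, self.k, self.v, self.o = (nn.Linear(d, d) for _ in range(4))
        self.ff = nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))

    def forward(self, x):                                   # x: (batch, N_COLS, D)
        h = self.norm1(x)
        q, k, v = self.q(h), self.k(h), self.v(h)
        att = q @ k.transpose(-2, -1) / (x.size(-1) ** 0.5)
        causal = torch.triu(torch.ones_like(att, dtype=torch.bool), diagonal=1)
        att = att.masked_fill(causal, float("-inf")).softmax(dim=-1)
        x = x + self.o(att @ v)
        return x + self.ff(self.norm2(x))


class TabularAR(nn.Module):
    """Treat a row as a sequence of discretized column values and predict each
    column from the ones before it (the first column is left unmodeled here;
    a start token would cover it)."""

    def __init__(self):
        super().__init__()
        self.tok, self.pos = nn.Embedding(VOCAB, D), nn.Embedding(N_COLS, D)
        self.blocks = nn.Sequential(CausalBlock(D), CausalBlock(D))
        self.head = nn.Linear(D, VOCAB)

    def forward(self, x):                                   # x: (batch, N_COLS) int64
        pos = torch.arange(x.size(1), device=x.device)
        return self.head(self.blocks(self.tok(x) + self.pos(pos)))


rows = torch.randint(0, VOCAB, (1024, N_COLS))              # stand-in for an encoded table
loader = DataLoader(TensorDataset(rows), batch_size=64, shuffle=True)
model = TabularAR()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# DP-SGD: per-sample gradient clipping plus Gaussian noise, with Opacus handling
# the clipping, noising, Poisson sampling, and privacy accounting.
engine = PrivacyEngine()
model, opt, loader = engine.make_private(
    module=model, optimizer=opt, data_loader=loader,
    noise_multiplier=1.0, max_grad_norm=1.0,
)

loss_fn = nn.CrossEntropyLoss()
for (batch,) in loader:                                     # one illustrative epoch
    opt.zero_grad()
    logits = model(batch)                                   # (batch, N_COLS, VOCAB)
    # position i predicts the value of column i + 1
    loss = loss_fn(logits[:, :-1].reshape(-1, VOCAB), batch[:, 1:].reshape(-1))
    loss.backward()
    opt.step()

print("epsilon spent:", engine.get_epsilon(delta=1e-5))
```

In such a setup the privacy guarantee comes from the DP-SGD training procedure rather than from the architecture, so sampling synthetic rows from the trained model afterwards adds no privacy cost by the post-processing property of differential privacy.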
Related papers
- Differentially Private Adaptation of Diffusion Models via Noisy Aggregated Embeddings [23.687702204151872]
We introduce novel methods for adapting diffusion models under differential privacy constraints, enabling privacy-preserving style and content transfer without fine-tuning.
We apply these methods to Stable Diffusion for style adaptation using two private datasets: a collection of artworks by a single artist and pictograms from the Paris 2024 Olympics.
Experimental results show that the Textual Inversion (TI)-based adaptation achieves superior fidelity in style transfer, even under strong privacy guarantees.
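This summary gives no implementation details; as a rough illustration only, a "noisy aggregated embedding" step in the spirit of the title could look like the Gaussian-mechanism sketch below, where the clipping bound, embedding dimension, and noise scale are assumptions rather than the paper's actual calibration.

```python
import torch


def noisy_aggregate(embeddings, clip_norm=1.0, noise_multiplier=1.0):
    """Clip each per-record embedding to an L2 norm of `clip_norm`, average,
    and add Gaussian noise proportional to the mean's sensitivity (clip_norm / n).
    The constant required for a specific (epsilon, delta) guarantee is omitted."""
    n = embeddings.size(0)
    norms = embeddings.norm(dim=1, keepdim=True).clamp(min=1e-12)
    clipped = embeddings * (clip_norm / norms).clamp(max=1.0)
    mean = clipped.mean(dim=0)
    sigma = noise_multiplier * clip_norm / n
    return mean + sigma * torch.randn_like(mean)


# e.g. one embedding learned per private image, released as a single noisy average
private_embeddings = torch.randn(200, 768)   # hypothetical per-image embeddings
dp_embedding = noisy_aggregate(private_embeddings)
```

Because the noisy vector is itself a differentially private release, any later use of it (for instance, conditioning a public diffusion model) incurs no additional privacy cost by post-processing.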
arXiv Detail & Related papers (2024-11-22T00:09:49Z) - Novel Saliency Analysis for the Forward Forward Algorithm [0.0]
We introduce the Forward Forward algorithm into neural network training.
This method involves executing two forward passes: the first with actual data to promote positive reinforcement, and the second with synthetically generated negative data to enable discriminative learning.
To overcome the limitations inherent in traditional saliency techniques, we developed a bespoke saliency algorithm specifically tailored for the Forward Forward framework.
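As background for this entry, a generic sketch of the Forward-Forward training idea it builds on is given below (this is not the paper's saliency method): each layer is trained locally to yield high "goodness", the sum of squared activations, on real (positive) data and low goodness on negative data; the layer sizes, threshold, and optimizer are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FFLayer(nn.Module):
    """One layer trained with a local Forward-Forward objective: goodness
    (sum of squared activations) above a threshold for positive inputs and
    below it for negative inputs. No gradients flow between layers."""

    def __init__(self, d_in, d_out, threshold=2.0, lr=0.03):
        super().__init__()
        self.fc = nn.Linear(d_in, d_out)
        self.threshold = threshold
        self.opt = torch.optim.Adam(self.fc.parameters(), lr=lr)

    def forward(self, x):
        # Normalize the input direction so goodness cannot be passed along trivially.
        x = x / (x.norm(dim=1, keepdim=True) + 1e-8)
        return F.relu(self.fc(x))

    def train_step(self, x_pos, x_neg):
        g_pos = self.forward(x_pos).pow(2).sum(dim=1)
        g_neg = self.forward(x_neg).pow(2).sum(dim=1)
        # Logistic loss pushing positive goodness above and negative goodness below the threshold.
        loss = F.softplus(torch.cat([self.threshold - g_pos,
                                     g_neg - self.threshold])).mean()
        self.opt.zero_grad()
        loss.backward()
        self.opt.step()
        # Detach before handing activations to the next layer: purely local learning.
        return self.forward(x_pos).detach(), self.forward(x_neg).detach()


layers = [FFLayer(784, 256), FFLayer(256, 256)]
x_pos = torch.rand(64, 784)   # stand-in for real ("positive") data
x_neg = torch.rand(64, 784)   # stand-in for synthetically generated negative data
for _ in range(10):
    h_pos, h_neg = x_pos, x_neg
    for layer in layers:
        h_pos, h_neg = layer.train_step(h_pos, h_neg)
```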
arXiv Detail & Related papers (2024-09-18T17:21:59Z) - Artificial Inductive Bias for Synthetic Tabular Data Generation in Data-Scarce Scenarios [8.062368743143388]
We propose a novel methodology for generating realistic and reliable synthetic data with Deep Generative Models (DGMs) in limited real-data environments.
Our approach offers several ways to generate an artificial inductive bias in a DGM through transfer learning and meta-learning techniques.
We validate our approach using two state-of-the-art DGMs, namely, a Variational Autoencoder and a Generative Adversarial Network, to show that our artificial inductive bias fuels superior synthetic data quality.
arXiv Detail & Related papers (2024-07-03T12:53:42Z) - Segue: Side-information Guided Generative Unlearnable Examples for
Facial Privacy Protection in Real World [64.4289385463226]
We propose Segue: Side-information guided generative unlearnable examples.
To improve transferability, we introduce side information such as true labels and pseudo labels.
It can resist JPEG compression, adversarial training, and some standard data augmentations.
arXiv Detail & Related papers (2023-10-24T06:22:37Z) - A Unified View of Differentially Private Deep Generative Modeling [60.72161965018005]
Data with privacy concerns comes with stringent regulations that frequently prohibit data access and data sharing.
Overcoming these obstacles is key for technological progress in many real-world application scenarios that involve privacy sensitive data.
Differentially private (DP) data publishing provides a compelling solution, where only a sanitized form of the data is publicly released.
arXiv Detail & Related papers (2023-09-27T14:38:16Z) - A Novel Cross-Perturbation for Single Domain Generalization [54.612933105967606]
Single domain generalization aims to enhance the ability of the model to generalize to unknown domains when trained on a single source domain.
The limited diversity in the training data hampers the learning of domain-invariant features, resulting in compromised generalization performance.
We propose CPerb, a simple yet effective cross-perturbation method to enhance the diversity of the training data.
arXiv Detail & Related papers (2023-08-02T03:16:12Z) - On the utility and protection of optimization with differential privacy
and classic regularization techniques [9.413131350284083]
We study the effectiveness of the differentially private stochastic gradient descent (DP-SGD) algorithm against standard optimization practices with regularization techniques.
We discuss differential privacy's flaws and limits and empirically demonstrate the often superior privacy-preserving properties of dropout and l2-regularization.
arXiv Detail & Related papers (2022-09-07T14:10:21Z) - Model-Based Deep Learning: On the Intersection of Deep Learning and
Optimization [101.32332941117271]
Decision making algorithms are used in a multitude of different applications.
Deep learning approaches that use highly parametric architectures tuned from data without relying on mathematical models are becoming increasingly popular.
Model-based optimization and data-centric deep learning are often considered to be distinct disciplines.
arXiv Detail & Related papers (2022-05-05T13:40:08Z) - Non-IID data and Continual Learning processes in Federated Learning: A
long road ahead [58.720142291102135]
Federated Learning is a novel framework that allows multiple devices or institutions to train a machine learning model collaboratively while keeping their data private.
In this work, we formally classify statistical data heterogeneity and review the most notable learning strategies able to address it.
At the same time, we introduce approaches from other machine learning frameworks, such as Continual Learning, that also deal with data heterogeneity and could be easily adapted to the Federated Learning settings.
arXiv Detail & Related papers (2021-11-26T09:57:11Z) - Don't Generate Me: Training Differentially Private Generative Models
with Sinkhorn Divergence [73.14373832423156]
We propose DP-Sinkhorn, a novel optimal transport-based generative method for learning data distributions from private data with differential privacy.
Unlike existing approaches for training differentially private generative models, we do not rely on adversarial objectives.
arXiv Detail & Related papers (2021-11-01T18:10:21Z) - PEARL: Data Synthesis via Private Embeddings and Adversarial
Reconstruction Learning [1.8692254863855962]
We propose a new framework for data synthesis using deep generative models in a differentially private manner.
Within our framework, sensitive data are sanitized with rigorous privacy guarantees in a one-shot fashion.
Our proposal has theoretical guarantees of performance, and empirical evaluations on multiple datasets show that our approach outperforms other methods at reasonable levels of privacy.
arXiv Detail & Related papers (2021-06-08T18:00:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented (including all listed content) and is not responsible for any consequences of its use.