DP-TBART: A Transformer-based Autoregressive Model for Differentially
Private Tabular Data Generation
- URL: http://arxiv.org/abs/2307.10430v1
- Date: Wed, 19 Jul 2023 19:40:21 GMT
- Title: DP-TBART: A Transformer-based Autoregressive Model for Differentially
Private Tabular Data Generation
- Authors: Rodrigo Castellon, Achintya Gopal, Brian Bloniarz, David Rosenberg
- Abstract summary: We present Differentially-Private TaBular AutoRegressive Transformer (DP-TBART), a transformer-based autoregressive model that maintains differential privacy.
We provide a theoretical framework for understanding the limitations of marginal-based approaches and where deep learning-based approaches stand to contribute most.
- Score: 1.4418363806859886
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The generation of synthetic tabular data that preserves differential privacy
is a problem of growing importance. While traditional marginal-based methods
have achieved impressive results, recent work has shown that deep
learning-based approaches tend to lag behind. In this work, we present
Differentially-Private TaBular AutoRegressive Transformer (DP-TBART), a
transformer-based autoregressive model that maintains differential privacy and
achieves performance competitive with marginal-based methods on a wide variety
of datasets, capable of even outperforming state-of-the-art methods in certain
settings. We also provide a theoretical framework for understanding the
limitations of marginal-based approaches and where deep learning-based
approaches stand to contribute most. These results suggest that deep
learning-based techniques should be considered as a viable alternative to
marginal-based methods in the generation of differentially private synthetic
tabular data.
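The abstract gives no implementation details, but the general recipe it describes, an autoregressive transformer over tabular rows trained under differential privacy, can be illustrated with a minimal sketch. The shared discretized vocabulary, the tiny model, and the use of DP-SGD via Opacus below are assumptions for illustration, not the authors' code.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine

N_COLS, VOCAB, D = 8, 32, 64  # columns per row, categories per column, embedding dim


class CausalBlock(nn.Module):
    """Single-head causal self-attention block built only from nn.Linear and
    nn.LayerNorm, layers for which Opacus can compute per-sample gradients."""

    def __init__(self, d):
        super().__init__()
        self.norm1, self.norm2 = nn.LayerNorm(d), nn.LayerNorm(d)
        self.q, self.k, self.v, self.o = (nn.Linear(d, d) for _ in range(4))
        self.ff = nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))

    def forward(self, x):                                   # x: (batch, N_COLS, D)
        h = self.norm1(x)
        q, k, v = self.q(h), self.k(h), self.v(h)
        att = q @ k.transpose(-2, -1) / (x.size(-1) ** 0.5)
        causal = torch.triu(torch.ones_like(att, dtype=torch.bool), diagonal=1)
        att = att.masked_fill(causal, float("-inf")).softmax(dim=-1)
        x = x + self.o(att @ v)
        return x + self.ff(self.norm2(x))


class TabularAR(nn.Module):
    """Treat a row as a sequence of discretized column values and predict each
    column from the ones before it (the first column is left unmodeled here;
    a start token would cover it)."""

    def __init__(self):
        super().__init__()
        self.tok, self.pos = nn.Embedding(VOCAB, D), nn.Embedding(N_COLS, D)
        self.blocks = nn.Sequential(CausalBlock(D), CausalBlock(D))
        self.head = nn.Linear(D, VOCAB)

    def forward(self, x):                                   # x: (batch, N_COLS) int64
        pos = torch.arange(x.size(1), device=x.device)
        return self.head(self.blocks(self.tok(x) + self.pos(pos)))


rows = torch.randint(0, VOCAB, (1024, N_COLS))              # stand-in for an encoded table
loader = DataLoader(TensorDataset(rows), batch_size=64, shuffle=True)
model = TabularAR()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# DP-SGD: per-sample gradient clipping plus Gaussian noise, with Opacus handling
# the clipping, noising, Poisson sampling, and privacy accounting.
engine = PrivacyEngine()
model, opt, loader = engine.make_private(
    module=model, optimizer=opt, data_loader=loader,
    noise_multiplier=1.0, max_grad_norm=1.0,
)

loss_fn = nn.CrossEntropyLoss()
for (batch,) in loader:                                     # one illustrative epoch
    opt.zero_grad()
    logits = model(batch)                                   # (batch, N_COLS, VOCAB)
    # position i predicts the value of column i + 1
    loss = loss_fn(logits[:, :-1].reshape(-1, VOCAB), batch[:, 1:].reshape(-1))
    loss.backward()
    opt.step()

print("epsilon spent:", engine.get_epsilon(delta=1e-5))
```

In such a setup the privacy guarantee comes from the DP-SGD training procedure rather than from the architecture, so sampling synthetic rows from the trained model afterwards adds no privacy cost by the post-processing property of differential privacy.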
Related papers
- Differentially Private Adaptation of Diffusion Models via Noisy Aggregated Embeddings [23.687702204151872]
We introduce novel methods for adapting diffusion models under differential privacy constraints, enabling privacy-preserving style and content transfer without fine-tuning.
We apply these methods to Stable Diffusion for style adaptation using two private datasets: a collection of artworks by a single artist and pictograms from the Paris 2024 Olympics.
Experimental results show that the Textual Inversion (TI)-based adaptation achieves superior fidelity in style transfer, even under strong privacy guarantees.
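This summary gives no implementation details; as a rough illustration only, a "noisy aggregated embedding" step in the spirit of the title could look like the Gaussian-mechanism sketch below, where the clipping bound, embedding dimension, and noise scale are assumptions rather than the paper's actual calibration.

```python
import torch


def noisy_aggregate(embeddings, clip_norm=1.0, noise_multiplier=1.0):
    """Clip each per-record embedding to an L2 norm of `clip_norm`, average,
    and add Gaussian noise proportional to the mean's sensitivity (clip_norm / n).
    The constant required for a specific (epsilon, delta) guarantee is omitted."""
    n = embeddings.size(0)
    norms = embeddings.norm(dim=1, keepdim=True).clamp(min=1e-12)
    clipped = embeddings * (clip_norm / norms).clamp(max=1.0)
    mean = clipped.mean(dim=0)
    sigma = noise_multiplier * clip_norm / n
    return mean + sigma * torch.randn_like(mean)


# e.g. one embedding learned per private image, released as a single noisy average
private_embeddings = torch.randn(200, 768)   # hypothetical per-image embeddings
dp_embedding = noisy_aggregate(private_embeddings)
```

Because the noisy vector is itself a differentially private release, any later use of it (for instance, conditioning a public diffusion model) incurs no additional privacy cost by post-processing.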
arXiv Detail & Related papers (2024-11-22T00:09:49Z) - Novel Saliency Analysis for the Forward Forward Algorithm [0.0]
We introduce the Forward Forward algorithm into neural network training.
This method involves executing two forward passes: the first with actual data to promote positive reinforcement, and the second with synthetically generated negative data to enable discriminative learning.
To overcome the limitations inherent in traditional saliency techniques, we developed a bespoke saliency algorithm specifically tailored for the Forward Forward framework.
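As background for this entry, a generic sketch of the Forward-Forward training idea it builds on is given below (this is not the paper's saliency method): each layer is trained locally to yield high "goodness", the sum of squared activations, on real (positive) data and low goodness on negative data; the layer sizes, threshold, and optimizer are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FFLayer(nn.Module):
    """One layer trained with a local Forward-Forward objective: goodness
    (sum of squared activations) above a threshold for positive inputs and
    below it for negative inputs. No gradients flow between layers."""

    def __init__(self, d_in, d_out, threshold=2.0, lr=0.03):
        super().__init__()
        self.fc = nn.Linear(d_in, d_out)
        self.threshold = threshold
        self.opt = torch.optim.Adam(self.fc.parameters(), lr=lr)

    def forward(self, x):
        # Normalize the input direction so goodness cannot be passed along trivially.
        x = x / (x.norm(dim=1, keepdim=True) + 1e-8)
        return F.relu(self.fc(x))

    def train_step(self, x_pos, x_neg):
        g_pos = self.forward(x_pos).pow(2).sum(dim=1)
        g_neg = self.forward(x_neg).pow(2).sum(dim=1)
        # Logistic loss pushing positive goodness above and negative goodness below the threshold.
        loss = F.softplus(torch.cat([self.threshold - g_pos,
                                     g_neg - self.threshold])).mean()
        self.opt.zero_grad()
        loss.backward()
        self.opt.step()
        # Detach before handing activations to the next layer: purely local learning.
        return self.forward(x_pos).detach(), self.forward(x_neg).detach()


layers = [FFLayer(784, 256), FFLayer(256, 256)]
x_pos = torch.rand(64, 784)   # stand-in for real ("positive") data
x_neg = torch.rand(64, 784)   # stand-in for synthetically generated negative data
for _ in range(10):
    h_pos, h_neg = x_pos, x_neg
    for layer in layers:
        h_pos, h_neg = layer.train_step(h_pos, h_neg)
```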
arXiv Detail & Related papers (2024-09-18T17:21:59Z) - Artificial Inductive Bias for Synthetic Tabular Data Generation in Data-Scarce Scenarios [8.062368743143388]
We propose a novel methodology for generating realistic and reliable synthetic data with Deep Generative Models (DGMs) in limited real-data environments.
Our approach offers several ways to generate an artificial inductive bias in a DGM through transfer learning and meta-learning techniques.
We validate our approach using two state-of-the-art DGMs, namely, a Variational Autoencoder and a Generative Adversarial Network, to show that our artificial inductive bias fuels superior synthetic data quality.
arXiv Detail & Related papers (2024-07-03T12:53:42Z) - Segue: Side-information Guided Generative Unlearnable Examples for
Facial Privacy Protection in Real World [64.4289385463226]
We propose Segue: Side-information guided generative unlearnable examples.
To improve transferability, we introduce side information such as true labels and pseudo labels.
It can resist JPEG compression, adversarial training, and some standard data augmentations.
arXiv Detail & Related papers (2023-10-24T06:22:37Z) - A Unified View of Differentially Private Deep Generative Modeling [60.72161965018005]
Data with privacy concerns comes with stringent regulations that frequently prohibit data access and data sharing.
Overcoming these obstacles is key for technological progress in many real-world application scenarios that involve privacy sensitive data.
Differentially private (DP) data publishing provides a compelling solution, where only a sanitized form of the data is publicly released.
arXiv Detail & Related papers (2023-09-27T14:38:16Z) - A Novel Cross-Perturbation for Single Domain Generalization [54.612933105967606]
Single domain generalization aims to enhance the ability of the model to generalize to unknown domains when trained on a single source domain.
The limited diversity in the training data hampers the learning of domain-invariant features, resulting in compromised generalization performance.
We propose CPerb, a simple yet effective cross-perturbation method to enhance the diversity of the training data.
arXiv Detail & Related papers (2023-08-02T03:16:12Z) - On the utility and protection of optimization with differential privacy
and classic regularization techniques [9.413131350284083]
We study the effectiveness of the differentially private stochastic gradient descent (DP-SGD) algorithm against standard optimization practices with regularization techniques.
We discuss differential privacy's flaws and limits and empirically demonstrate the often superior privacy-preserving properties of dropout and l2-regularization.
arXiv Detail & Related papers (2022-09-07T14:10:21Z) - Model-Based Deep Learning: On the Intersection of Deep Learning and
Optimization [101.32332941117271]
Decision making algorithms are used in a multitude of different applications.
Deep learning approaches that use highly parametric architectures tuned from data without relying on mathematical models are becoming increasingly popular.
Model-based optimization and data-centric deep learning are often considered to be distinct disciplines.
arXiv Detail & Related papers (2022-05-05T13:40:08Z) - Non-IID data and Continual Learning processes in Federated Learning: A
long road ahead [58.720142291102135]
Federated Learning is a novel framework that allows multiple devices or institutions to train a machine learning model collaboratively while keeping their data private.
In this work, we formally classify statistical data heterogeneity and review the most notable learning strategies able to address it.
At the same time, we introduce approaches from other machine learning frameworks, such as Continual Learning, that also deal with data heterogeneity and could be easily adapted to the Federated Learning settings.
arXiv Detail & Related papers (2021-11-26T09:57:11Z) - Don't Generate Me: Training Differentially Private Generative Models
with Sinkhorn Divergence [73.14373832423156]
We propose DP-Sinkhorn, a novel optimal transport-based generative method for learning data distributions from private data with differential privacy.
Unlike existing approaches for training differentially private generative models, we do not rely on adversarial objectives.
arXiv Detail & Related papers (2021-11-01T18:10:21Z) - PEARL: Data Synthesis via Private Embeddings and Adversarial
Reconstruction Learning [1.8692254863855962]
We propose a new framework for data synthesis using deep generative models in a differentially private manner.
Within our framework, sensitive data are sanitized with rigorous privacy guarantees in a one-shot fashion.
Our proposal has theoretical guarantees of performance, and empirical evaluations on multiple datasets show that our approach outperforms other methods at reasonable levels of privacy.
arXiv Detail & Related papers (2021-06-08T18:00:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented (including all listed content) and is not responsible for any consequences of its use.