DataLens: Scalable Privacy Preserving Training via Gradient Compression and Aggregation
- URL: http://arxiv.org/abs/2103.11109v1
- Date: Sat, 20 Mar 2021 06:14:19 GMT
- Title: DataLens: Scalable Privacy Preserving Training via Gradient Compression and Aggregation
- Authors: Boxin Wang, Fan Wu, Yunhui Long, Luka Rimanic, Ce Zhang, Bo Li
- Abstract summary: We propose a scalable privacy-preserving generative model DATALENS.
We show that DATALENS significantly outperforms other baseline DP generative models.
We adapt the proposed TOPAGG approach, which is one of the key building blocks in DATALENS, to DP SGD training.
- Score: 15.63770709526671
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent success of deep neural networks (DNNs) hinges on the
availability of large-scale datasets; however, training on such datasets often
poses privacy risks for sensitive training information. In this paper, we aim
to explore the power of generative models and gradient sparsity, and propose a
scalable privacy-preserving generative model DATALENS. Compared with the
standard PATE privacy-preserving framework, which allows teachers to vote on
one-dimensional predictions, voting on high-dimensional gradient vectors is
challenging in terms of privacy preservation. As dimension reduction techniques
are required, we need to navigate a delicate tradeoff space between (1) the
improvement of privacy preservation and (2) the slowdown of SGD convergence. To
tackle this, we take advantage of communication-efficient learning and propose
a novel noise compression and aggregation approach, TOPAGG, which combines
top-k compression for dimension reduction with a corresponding noise injection
mechanism. We theoretically prove that the DATALENS framework guarantees
differential privacy for its generated data, and provide an analysis of its
convergence. To demonstrate the practical usage of DATALENS, we conduct
extensive experiments on diverse datasets including MNIST, Fashion-MNIST, and
the high-dimensional CelebA, and we show that DATALENS significantly
outperforms other baseline DP generative models. In addition, we adapt the
proposed TOPAGG approach, which is one of the key building blocks in DATALENS,
to DP SGD training, and show that it achieves higher utility than the
state-of-the-art DP SGD approach in most cases.
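The TOPAGG mechanism described above can be pictured with a short sketch: each teacher compresses its gradient to a top-k sign vector, the votes are summed, Gaussian noise is injected, and the noisy aggregate is re-sparsified. The following NumPy code is a minimal illustration under these assumptions; the function names and parameters are hypothetical, not the authors' implementation, and the paper's actual mechanism and privacy accounting differ in detail.

```python
import numpy as np

def top_k_sign_compress(grad, k):
    # Keep the k largest-magnitude coordinates, reduced to their signs.
    # Sign compression bounds each teacher's per-vote contribution.
    flat = grad.ravel()
    idx = np.argpartition(np.abs(flat), -k)[-k:]  # indices of the top-k entries
    sparse = np.zeros_like(flat)
    sparse[idx] = np.sign(flat[idx])
    return sparse.reshape(grad.shape)

def topagg_aggregate(teacher_grads, k, sigma, rng=None):
    # Sum the teachers' compressed votes, inject Gaussian noise for DP,
    # and re-sparsify so the student update stays low-dimensional.
    rng = np.random.default_rng() if rng is None else rng
    votes = sum(top_k_sign_compress(g, k) for g in teacher_grads)
    noisy = votes + rng.normal(0.0, sigma, size=votes.shape)
    return top_k_sign_compress(noisy, k)

# Example: 10 teachers voting on a 1000-dimensional gradient.
grads = [np.random.randn(1000) for _ in range(10)]
update = topagg_aggregate(grads, k=50, sigma=2.0)
```

A larger sigma strengthens the privacy guarantee but makes the aggregated vote noisier, which is exactly the privacy-convergence tradeoff the abstract highlights; top-k compression shrinks the dimension over which noise must be injected.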
Related papers
- Privacy-preserving datasets by capturing feature distributions with Conditional VAEs [0.11999555634662634]
The method trains Conditional Variational Autoencoders (CVAEs) on feature vectors extracted from large pre-trained vision foundation models.
Our method notably outperforms traditional approaches in both medical and natural image domains.
Results underscore the potential of generative models to significantly impact deep learning applications in data-scarce and privacy-sensitive environments.
arXiv Detail & Related papers (2024-08-01T15:26:24Z)
- Enhancing the Utility of Privacy-Preserving Cancer Classification using Synthetic Data [5.448470199971472]
Deep learning holds immense promise for aiding radiologists in breast cancer detection.
However, achieving optimal model performance is hampered by limitations in the availability and sharing of data.
Traditional deep learning models can inadvertently leak sensitive training information.
This work addresses these challenges by quantifying the utility of privacy-preserving deep learning techniques.
arXiv Detail & Related papers (2024-07-17T15:52:45Z)
- Differentially Private Fine-Tuning of Diffusion Models [22.454127503937883]
The integration of Differential Privacy with diffusion models (DMs) presents a promising yet challenging frontier.
Recent developments in this field have highlighted the potential for generating high-quality synthetic data by pre-training on public data.
We propose a strategy optimized for private diffusion models, which minimizes the number of trainable parameters to enhance the privacy-utility trade-off.
arXiv Detail & Related papers (2024-06-03T14:18:04Z)
- PATE-TripleGAN: Privacy-Preserving Image Synthesis with Gaussian Differential Privacy [4.586288671392977]
We present a privacy-preserving training framework called PATE-TripleGAN.
It incorporates a classifier to pre-classify unlabeled data to reduce dependence on labeled data.
PATE-TripleGAN can generate a higher-quality labeled image dataset while ensuring the privacy of the training data.
arXiv Detail & Related papers (2024-04-19T09:22:20Z)
- TernaryVote: Differentially Private, Communication Efficient, and Byzantine Resilient Distributed Optimization on Heterogeneous Data [50.797729676285876]
We propose TernaryVote, which combines a ternary compressor with a majority-vote mechanism to realize differential privacy, gradient compression, and Byzantine resilience simultaneously (a toy sketch of the compressor appears after this entry).
We theoretically quantify the privacy guarantee through the lens of the emerging f-differential privacy (f-DP) framework, and analyze the Byzantine resilience of the proposed algorithm.
arXiv Detail & Related papers (2024-02-16T16:41:14Z)
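To make the ternary-compressor idea above concrete, here is a hypothetical reading in NumPy: each gradient coordinate is quantized to {-1, 0, +1} and the workers' votes are combined by a coordinate-wise majority. This is only an illustration of the summary; the actual TernaryVote compressor is randomized and calibrated for f-DP, which this sketch omits.

```python
import numpy as np

def ternary_compress(grad, threshold):
    # Hypothetical deterministic ternary quantizer: map each coordinate
    # to {-1, 0, +1} depending on whether it exceeds +/- threshold.
    out = np.zeros_like(grad)
    out[grad > threshold] = 1.0
    out[grad < -threshold] = -1.0
    return out

def majority_vote(ternary_votes):
    # Coordinate-wise majority across workers: the sign of the summed votes.
    # A few outlier (Byzantine) workers cannot flip a strong majority.
    return np.sign(np.sum(ternary_votes, axis=0))

# Example: 5 workers vote on an 8-dimensional gradient.
votes = [ternary_compress(np.random.randn(8), threshold=0.5) for _ in range(5)]
update = majority_vote(votes)
```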
- Sparsity-Preserving Differentially Private Training of Large Embedding Models [67.29926605156788]
DP-SGD is a training algorithm that combines differential privacy with stochastic gradient descent (a minimal sketch of a DP-SGD step appears after this entry).
Applying DP-SGD naively to embedding models can destroy gradient sparsity, leading to reduced training efficiency.
We present two new algorithms, DP-FEST and DP-AdaFEST, that preserve gradient sparsity during private training of large embedding models.
arXiv Detail & Related papers (2023-11-14T17:59:51Z)
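Since the entry above builds on DP-SGD, a minimal sketch of one DP-SGD step may help: per-example gradients are clipped to an l2 bound, summed, and perturbed with Gaussian noise scaled to that bound (Abadi et al., 2016). Names and parameters here are illustrative; production implementations such as Opacus or TensorFlow Privacy also track the cumulative privacy budget, which this sketch omits.

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, clip_norm, noise_multiplier, lr, rng=None):
    # per_example_grads: shape (batch, dim), one gradient per training example.
    rng = np.random.default_rng() if rng is None else rng
    # Clip each example's gradient to l2 norm <= clip_norm (bounds sensitivity).
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    clipped = per_example_grads * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    # Add Gaussian noise calibrated to the clipping bound, then average.
    noisy_sum = clipped.sum(axis=0) + rng.normal(
        0.0, noise_multiplier * clip_norm, size=clipped.shape[1])
    return params - lr * noisy_sum / per_example_grads.shape[0]

# Example: one private update on a toy 16-dimensional model.
theta = np.zeros(16)
grads = np.random.randn(32, 16)
theta = dp_sgd_step(theta, grads, clip_norm=1.0, noise_multiplier=1.1, lr=0.1)
```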
- DPGOMI: Differentially Private Data Publishing with Gaussian Optimized Model Inversion [8.204115285718437]
We propose Differentially Private Data Publishing with Gaussian Optimized Model Inversion (DPGOMI) to address this issue.
Our approach involves mapping private data to the latent space using a public generator, followed by a lower-dimensional DP-GAN with better convergence properties.
Our results show that DPGOMI outperforms the standard DP-GAN method in terms of Inception Score, Fréchet Inception Distance, and classification performance.
arXiv Detail & Related papers (2023-10-06T18:46:22Z)
- A Unified View of Differentially Private Deep Generative Modeling [60.72161965018005]
Data with privacy concerns is subject to stringent regulations that frequently prohibit data access and data sharing.
Overcoming these obstacles is key for technological progress in many real-world application scenarios that involve privacy sensitive data.
Differentially private (DP) data publishing provides a compelling solution, where only a sanitized form of the data is publicly released.
arXiv Detail & Related papers (2023-09-27T14:38:16Z)
- Just Fine-tune Twice: Selective Differential Privacy for Large Language Models [69.66654761324702]
We propose a simple yet effective just-fine-tune-twice privacy mechanism to achieve SDP for large Transformer-based language models.
Experiments show that our models achieve strong performance while staying robust to the canary insertion attack.
arXiv Detail & Related papers (2022-04-15T22:36:55Z)
- GS-WGAN: A Gradient-Sanitized Approach for Learning Differentially Private Generators [74.16405337436213]
We propose Gradient-Sanitized Wasserstein Generative Adversarial Networks (GS-WGAN).
GS-WGAN allows releasing a sanitized form of sensitive data with rigorous privacy guarantees.
We find our approach consistently outperforms state-of-the-art approaches across multiple metrics.
arXiv Detail & Related papers (2020-06-15T10:01:01Z)
- Differentially Private Federated Learning with Laplacian Smoothing [72.85272874099644]
Federated learning aims to protect data privacy by collaboratively learning a model without sharing private data among users.
An adversary may still be able to infer the private training data by attacking the released model.
Differential privacy provides statistical protection against such attacks, at the price of significantly degrading the accuracy or utility of the trained models.
arXiv Detail & Related papers (2020-05-01T04:28:38Z)