Privacy for Free: How does Dataset Condensation Help Privacy?
- URL: http://arxiv.org/abs/2206.00240v1
- Date: Wed, 1 Jun 2022 05:39:57 GMT
- Title: Privacy for Free: How does Dataset Condensation Help Privacy?
- Authors: Tian Dong, Bo Zhao and Lingjuan Lyu
- Abstract summary: We identify that dataset condensation (DC) is also a better solution than traditional data generators for private data generation.
We empirically validate the visual privacy and membership privacy of DC-synthesized data by launching both the loss-based and the state-of-the-art likelihood-based membership inference attacks.
- Score: 21.418263507735684
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To prevent unintentional data leakage, the research community has resorted to data generators that can produce differentially private data for model training. However, for the sake of data privacy, existing solutions suffer from either expensive training costs or poor generalization performance. We therefore ask whether training efficiency and privacy can be achieved simultaneously. In this work, we identify for the first time that dataset condensation (DC), originally designed to improve training efficiency, is also a better solution than traditional data generators for private data generation, thus providing privacy for free. To demonstrate the privacy benefit of DC, we build a connection between DC and differential privacy, and theoretically prove for linear feature extractors (later extended to non-linear feature extractors) that the existence of one sample has limited impact ($O(m/n)$) on the parameter distribution of networks trained on $m$ samples synthesized from $n$ ($n \gg m$) raw samples by DC. We also empirically validate the visual privacy and membership privacy of DC-synthesized data by launching both loss-based and state-of-the-art likelihood-based membership inference attacks. We envision this work as a milestone for data-efficient and privacy-preserving machine learning.
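For concreteness, below is a minimal sketch of the gradient-matching flavor of dataset condensation that this line of work builds on: synthetic samples are optimized so that the training gradients they induce match those of real data on randomly initialized networks. The names and hyperparameters (`model_fn`, `outer_steps`, the cosine matching loss) are illustrative assumptions, not the paper's exact algorithm.

```python
import torch
import torch.nn.functional as F


def condense(real_loader, model_fn, num_synthetic=100, dim=784,
             num_classes=10, outer_steps=1000, lr_syn=0.1):
    """Learn synthetic samples whose training gradients match real data's."""
    x_syn = torch.randn(num_synthetic, dim, requires_grad=True)
    y_syn = torch.randint(0, num_classes, (num_synthetic,))
    opt_syn = torch.optim.SGD([x_syn], lr=lr_syn)

    for _ in range(outer_steps):
        model = model_fn()  # fresh random initialization each outer step
        params = [p for p in model.parameters() if p.requires_grad]

        x_real, y_real = next(iter(real_loader))
        g_real = torch.autograd.grad(
            F.cross_entropy(model(x_real), y_real), params)
        g_syn = torch.autograd.grad(
            F.cross_entropy(model(x_syn), y_syn), params, create_graph=True)

        # Layer-wise cosine distance between real and synthetic gradients.
        loss = sum(1 - F.cosine_similarity(gr.flatten(), gs.flatten(), dim=0)
                   for gr, gs in zip(g_real, g_syn))
        opt_syn.zero_grad()
        loss.backward()
        opt_syn.step()
    return x_syn.detach(), y_syn
```

Intuitively, because each of the $m$ synthetic samples aggregates gradient information from all $n$ raw samples, adding or removing a single raw sample shifts the result only slightly, which is the $O(m/n)$ effect the paper formalizes.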
Related papers
- FT-PrivacyScore: Personalized Privacy Scoring Service for Machine Learning Participation [4.772368796656325]
In practice, controlled data access remains a mainstream method for protecting data privacy in many industrial and research environments.
We developed the demo prototype FT-PrivacyScore to show that it's possible to efficiently and quantitatively estimate the privacy risk of participating in a model fine-tuning task.
arXiv Detail & Related papers (2024-10-30T02:41:26Z)
- Privacy Amplification for the Gaussian Mechanism via Bounded Support [64.86780616066575]
Data-dependent privacy accounting frameworks such as per-instance differential privacy (pDP) and Fisher information loss (FIL) confer fine-grained privacy guarantees for individuals in a fixed training dataset.
We propose simple modifications of the Gaussian mechanism with bounded support, showing that they amplify privacy guarantees under data-dependent accounting.
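As a hedged illustration of the idea (not the paper's exact construction), the simplest bounded-support variant clamps the Gaussian output to a known range; `lo`, `hi`, and `sigma` below are assumed placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)


def bounded_gaussian_mechanism(value, sensitivity, sigma, lo, hi):
    """Release value + N(0, (sensitivity*sigma)^2), clamped to [lo, hi]."""
    noisy = value + rng.normal(0.0, sensitivity * sigma)
    return np.clip(noisy, lo, hi)


# Example: privately release a mean statistic known to lie in [0, 1].
print(bounded_gaussian_mechanism(0.42, sensitivity=1e-3, sigma=5.0,
                                 lo=0.0, hi=1.0))
```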
arXiv Detail & Related papers (2024-03-07T21:22:07Z)
- Probing the Transition to Dataset-Level Privacy in ML Models Using an Output-Specific and Data-Resolved Privacy Profile [23.05994842923702]
We study a privacy metric that quantifies the extent to which a model trained on a dataset using a differential privacy mechanism is "covered" by each of the distributions resulting from training on neighboring datasets.
We show that the privacy profile can be used to probe an observed transition to indistinguishability that takes place in the neighboring distributions as $\epsilon$ decreases.
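A toy analogue of this coverage idea for a one-dimensional Gaussian mechanism can be estimated by Monte Carlo: sample outputs under one dataset and measure how often the privacy loss against a neighboring dataset stays within $\epsilon$. This is a schematic sketch, not the paper's definition; all parameters are illustrative.

```python
import numpy as np
from scipy.stats import norm


def coverage(mu, mu_prime, sigma, eps, n=100_000, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.normal(mu, sigma, n)  # mechanism outputs under dataset D
    # Per-output privacy loss against the neighboring dataset D'.
    loss = norm.logpdf(x, mu, sigma) - norm.logpdf(x, mu_prime, sigma)
    return np.mean(np.abs(loss) <= eps)  # fraction "covered" at level eps


for sigma in [0.5, 1.0, 2.0, 4.0]:
    print(sigma, coverage(0.0, 1.0, sigma, eps=1.0))
```

As the noise scale grows (i.e., the mechanism's $\epsilon$ shrinks), the coverage fraction approaches 1, mirroring the transition to indistinguishability described above.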
arXiv Detail & Related papers (2023-06-27T20:39:07Z)
- Differentially Private Synthetic Data Generation via Lipschitz-Regularised Variational Autoencoders [3.7463972693041274]
It is often overlooked that generative models are prone to memorising many details of individual training records.
In this paper we explore an alternative approach for privately generating data that makes direct use of the inherent stochasticity in generative models.
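One generic way to impose such a Lipschitz constraint is a gradient penalty on the decoder; the sketch below shows that ingredient under illustrative assumptions, not the paper's exact regularizer:

```python
import torch


def lipschitz_penalty(decoder, z, target=1.0):
    """Penalize a cheap proxy of the decoder's Jacobian norm at latents z."""
    z = z.clone().requires_grad_(True)
    out = decoder(z)
    # Gradient of the summed output w.r.t. z: an inexpensive proxy for the
    # Jacobian norm (not the exact spectral norm).
    grad = torch.autograd.grad(out.sum(), z, create_graph=True)[0]
    return ((grad.norm(dim=1) - target).clamp(min=0) ** 2).mean()
```

A penalty like this would be added to the usual VAE objective during training.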
arXiv Detail & Related papers (2023-04-22T07:24:56Z)
- Private Set Generation with Discriminative Information [63.851085173614]
Differentially private data generation is a promising solution to the data privacy challenge.
Existing private generative models struggle with the utility of synthetic samples.
We introduce a simple yet effective method that greatly improves the sample utility of state-of-the-art approaches.
arXiv Detail & Related papers (2022-11-07T10:02:55Z)
- No Free Lunch in "Privacy for Free: How does Dataset Condensation Help Privacy" [75.98836424725437]
New methods designed to preserve data privacy require careful scrutiny.
Failure to preserve privacy is hard to detect, and yet can lead to catastrophic results when a system implementing a "privacy-preserving" method is attacked.
arXiv Detail & Related papers (2022-09-29T17:50:23Z)
- Smooth Anonymity for Sparse Graphs [69.1048938123063]
Differential privacy has emerged as the gold standard of privacy; however, it faces challenges when it comes to sharing sparse datasets.
In this work, we consider a variation of $k$-anonymity, which we call smooth-$k$-anonymity, and design simple large-scale algorithms that efficiently provide smooth-$k$-anonymity.
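For context, plain $k$-anonymity requires every quasi-identifier combination to appear at least $k$ times; a minimal check is sketched below (smooth-$k$-anonymity, as the paper defines it, relaxes this notion). The column names are made up:

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs at least k times."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(c >= k for c in counts.values())


rows = [{"zip": "021**", "age": "30-39"}, {"zip": "021**", "age": "30-39"},
        {"zip": "946**", "age": "40-49"}, {"zip": "946**", "age": "40-49"}]
print(is_k_anonymous(rows, ["zip", "age"], k=2))  # True
```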
arXiv Detail & Related papers (2022-07-13T17:09:25Z)
- Individual Privacy Accounting for Differentially Private Stochastic Gradient Descent [69.14164921515949]
We characterize privacy guarantees for individual examples when releasing models trained by DP-SGD.
We find that most examples enjoy stronger privacy guarantees than the worst-case bound.
This implies that groups that are underserved in terms of model utility simultaneously experience weaker privacy guarantees.
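The mechanism being accounted for is standard DP-SGD: clip each example's gradient, sum, and add Gaussian noise. A minimal sketch with illustrative hyperparameters follows:

```python
import torch


def dp_sgd_step(per_example_grads, clip_norm=1.0, noise_multiplier=1.0):
    """per_example_grads: tensor of shape (batch, num_params)."""
    norms = per_example_grads.norm(dim=1, keepdim=True)
    scale = (clip_norm / norms).clamp(max=1.0)  # clip each example's gradient
    clipped = per_example_grads * scale
    noise = torch.randn(per_example_grads.shape[1]) * noise_multiplier * clip_norm
    return (clipped.sum(dim=0) + noise) / per_example_grads.shape[0]
```

Intuitively, examples whose gradients stay well below the clipping norm contribute less signal relative to the noise, which is why they can enjoy stronger individual guarantees.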
arXiv Detail & Related papers (2022-06-06T13:49:37Z)
- Causally Constrained Data Synthesis for Private Data Release [36.80484740314504]
Using synthetic data which reflects certain statistical properties of the original data preserves the privacy of the original data.
Prior works utilize differentially private data release mechanisms to provide formal privacy guarantees.
We propose incorporating causal information into the training process to favorably modify the aforementioned trade-off.
arXiv Detail & Related papers (2021-05-27T13:46:57Z)
- Differentially Private Federated Learning with Laplacian Smoothing [72.85272874099644]
Federated learning aims to protect data privacy by collaboratively learning a model without sharing private data among users.
An adversary may still be able to infer the private training data by attacking the released model.
Differential privacy provides a statistical protection against such attacks at the price of significantly degrading the accuracy or utility of the trained models.
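Laplacian smoothing here refers to preconditioning the noisy gradient by $(I - \sigma \Delta)^{-1}$, where $\Delta$ is the 1-D discrete Laplacian, which damps the high-frequency part of the injected DP noise. A minimal FFT-based sketch, with an assumed smoothing constant, is below:

```python
import numpy as np


def laplacian_smooth(grad, sigma=1.0):
    """Solve (I - sigma * Laplacian) g_smooth = grad via FFT (periodic BCs)."""
    n = grad.shape[0]
    k = np.arange(n)
    # Eigenvalues of I - sigma * Laplacian for the circulant 1-D stencil.
    denom = 1.0 + 4.0 * sigma * np.sin(np.pi * k / n) ** 2
    return np.real(np.fft.ifft(np.fft.fft(grad) / denom))


g = np.random.default_rng(0).normal(size=8)  # stand-in for a noisy gradient
print(laplacian_smooth(g, sigma=2.0))
```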
arXiv Detail & Related papers (2020-05-01T04:28:38Z)
- The Cost of Privacy in Asynchronous Differentially-Private Machine Learning [17.707240607542236]
We develop differentially-private asynchronous algorithms for collaboratively training machine-learning models on multiple private datasets.
A central learner interacts with the private data owners one-on-one whenever they are available for communication.
We prove that we can forecast the performance of the proposed privacy-preserving asynchronous algorithms.
arXiv Detail & Related papers (2020-03-18T23:06:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.