Generating private data with user customization
- URL: http://arxiv.org/abs/2012.01467v1
- Date: Wed, 2 Dec 2020 19:13:58 GMT
- Title: Generating private data with user customization
- Authors: Xiao Chen, Thomas Navidi, Ram Rajagopal
- Abstract summary: Mobile devices can produce and store large amounts of data that can enhance machine learning models.
However, this data may contain private information specific to the data owner that prevents the release of the data.
We want to reduce the correlation between user-specific private information and the data while retaining the useful information.
- Score: 9.415164800448853
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Personal devices such as mobile phones can produce and store large amounts of
data that can enhance machine learning models; however, this data may contain
private information specific to the data owner that prevents the release of the
data. We want to reduce the correlation between user-specific private
information and the data while retaining the useful information. Rather than
training a large model to achieve privatization from end to end, we first
decouple the creation of a latent representation, and then privatize the data
that allows user-specific privatization to occur in a setting with limited
computation and minimal disturbance on the utility of the data. We leverage a
Variational Autoencoder (VAE) to create a compact latent representation of the
data that remains fixed for all devices and all possible private labels. We
then train a small generative filter to perturb the latent representation based
on user specified preferences regarding the private and utility information.
The small filter is trained via a GAN-type robust optimization that can take
place on a distributed device such as a phone or tablet. Under special
conditions of our linear filter, we disclose the connections between our
generative approach and Rényi differential privacy. We conduct experiments on
multiple datasets including MNIST, UCI-Adult, and CelebA, and give a thorough
evaluation including visualizing the geometry of the latent embeddings and
estimating the empirical mutual information to show the effectiveness of our
approach.
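The pipeline described in the abstract (a fixed VAE latent representation shared by all devices, plus a small on-device filter that perturbs the latent before release) can be illustrated with a minimal sketch. This is not the paper's implementation: the linear encoder/decoder stand in for a pretrained VAE, and the additive-Gaussian linear filter is only a toy instance of the "special conditions of our linear filter" case the abstract relates to Rényi differential privacy; all names and dimensions below are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for a pretrained VAE: fixed linear encode/decode maps.
# In the paper these are learned once and then frozen for all devices.
d_x, d_z = 8, 3
W_enc = rng.normal(size=(d_z, d_x))
W_dec = rng.normal(size=(d_x, d_z))

def encode(x):
    """Fixed latent representation, shared across devices and private labels."""
    return W_enc @ x

def decode(z):
    """Reconstruct data from a (possibly perturbed) latent code."""
    return W_dec @ z

class LinearFilter:
    """Small on-device filter z' = A z + b + noise.

    The Gaussian-noise linear case is the setting the abstract connects to
    Renyi differential privacy; in the paper the filter parameters are
    trained via GAN-type robust optimization against an adversary that
    predicts the user's private label from the filtered latent.
    """
    def __init__(self, d_z, noise_scale=0.5):
        self.A = np.eye(d_z)          # initialized to identity (no distortion)
        self.b = np.zeros(d_z)
        self.noise_scale = noise_scale

    def __call__(self, z):
        return self.A @ z + self.b + self.noise_scale * rng.normal(size=z.shape)

# One release step on a device: encode, privatize the latent, decode.
x = rng.normal(size=d_x)
z = encode(x)
filt = LinearFilter(d_z)
z_private = filt(z)      # only this small filter runs on the phone/tablet
x_released = decode(z_private)
```

Because only the small filter is user-specific, each device can retrain it for its own privacy/utility preferences without touching the shared VAE, which is what makes the scheme cheap enough for phones and tablets.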
Related papers
- Segmented Private Data Aggregation in the Multi-message Shuffle Model [6.436165623346879]
We pioneer the study of segmented private data aggregation within the multi-message shuffle model of differential privacy.
Our framework introduces flexible privacy protection for users and enhanced utility for the aggregation server.
Our framework achieves a reduction of about 50% in estimation error compared to existing approaches.
arXiv Detail & Related papers (2024-07-29T01:46:44Z)
- Privacy Amplification for the Gaussian Mechanism via Bounded Support [64.86780616066575]
Data-dependent privacy accounting frameworks such as per-instance differential privacy (pDP) and Fisher information loss (FIL) confer fine-grained privacy guarantees for individuals in a fixed training dataset.
We propose simple modifications of the Gaussian mechanism with bounded support, showing that they amplify privacy guarantees under data-dependent accounting.
arXiv Detail & Related papers (2024-03-07T21:22:07Z)
- Federated Learning Empowered by Generative Content [55.576885852501775]
Federated learning (FL) enables leveraging distributed private data for model training in a privacy-preserving way.
We propose a novel FL framework termed FedGC, designed to mitigate data heterogeneity issues by diversifying private data with generative content.
We conduct a systematic empirical study on FedGC, covering diverse baselines, datasets, scenarios, and modalities.
arXiv Detail & Related papers (2023-12-10T07:38:56Z)
- Privacy Preserving Large Language Models: ChatGPT Case Study Based Vision and Framework [6.828884629694705]
This article proposes the conceptual model called PrivChatGPT, a privacy-generative model for LLMs.
PrivChatGPT consists of two main components i.e., preserving user privacy during the data curation/pre-processing together with preserving private context and the private training process for large-scale data.
arXiv Detail & Related papers (2023-10-19T06:55:13Z)
- Probing the Transition to Dataset-Level Privacy in ML Models Using an Output-Specific and Data-Resolved Privacy Profile [23.05994842923702]
We study a privacy metric that quantifies the extent to which a model trained on a dataset using a Differential Privacy mechanism is "covered" by each of the distributions resulting from training on neighboring datasets.
We show that the privacy profile can be used to probe an observed transition to indistinguishability that takes place in the neighboring distributions as $\epsilon$ decreases.
arXiv Detail & Related papers (2023-06-27T20:39:07Z)
- How Do Input Attributes Impact the Privacy Loss in Differential Privacy? [55.492422758737575]
We study the connection between the per-subject norm in DP neural networks and individual privacy loss.
We introduce a novel metric termed the Privacy Loss-Input Susceptibility (PLIS) which allows one to apportion the subject's privacy loss to their input attributes.
arXiv Detail & Related papers (2022-11-18T11:39:03Z)
- Private Set Generation with Discriminative Information [63.851085173614]
Differentially private data generation is a promising solution to the data privacy challenge.
Existing private generative models struggle with the utility of their synthetic samples.
We introduce a simple yet effective method that greatly improves the sample utility of state-of-the-art approaches.
arXiv Detail & Related papers (2022-11-07T10:02:55Z)
- Differentially Private Multi-Party Data Release for Linear Regression [40.66319371232736]
Differentially Private (DP) data release is a promising technique to disseminate data without compromising the privacy of data subjects.
In this paper we focus on the multi-party setting, where different stakeholders own disjoint sets of attributes belonging to the same group of data subjects.
We propose our novel method and prove it converges to the optimal (non-private) solutions with increasing dataset size.
arXiv Detail & Related papers (2022-06-16T08:32:17Z)
- Mixed Differential Privacy in Computer Vision [133.68363478737058]
AdaMix is an adaptive differentially private algorithm for training deep neural network classifiers using both private and public image data.
A few-shot or even zero-shot learning baseline that ignores private data can outperform fine-tuning on a large private dataset.
arXiv Detail & Related papers (2022-03-22T06:15:43Z)
- Personalized PATE: Differential Privacy for Machine Learning with Individual Privacy Guarantees [1.2691047660244335]
We propose three novel methods to support training an ML model with different personalized privacy guarantees within the training data.
Our experiments show that our personalized privacy methods yield higher accuracy models than the non-personalized baseline.
arXiv Detail & Related papers (2022-02-21T20:16:27Z)
- Don't Generate Me: Training Differentially Private Generative Models with Sinkhorn Divergence [73.14373832423156]
We propose DP-Sinkhorn, a novel optimal transport-based generative method for learning data distributions from private data with differential privacy.
Unlike existing approaches for training differentially private generative models, we do not rely on adversarial objectives.
arXiv Detail & Related papers (2021-11-01T18:10:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.