Uncertainty-Autoencoder-Based Privacy and Utility Preserving Data Type
Conscious Transformation
- URL: http://arxiv.org/abs/2205.01950v1
- Date: Wed, 4 May 2022 08:40:15 GMT
- Title: Uncertainty-Autoencoder-Based Privacy and Utility Preserving Data Type
Conscious Transformation
- Authors: Bishwas Mandal, George Amariucai, Shuangqing Wei
- Abstract summary: We propose an adversarial learning framework that deals with the privacy-utility tradeoff problem under two conditions.
Under data-type aware conditions, the privacy mechanism provides a one-hot encoding of categorical features, representing exactly one class.
Under data-type ignorant conditions, the categorical variables are represented by a collection of scores, one for each class.
- Score: 3.7315964084413173
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose an adversarial learning framework that deals with the
privacy-utility tradeoff problem under two types of conditions: data-type
ignorant, and data-type aware. Under data-type aware conditions, the privacy
mechanism provides a one-hot encoding of categorical features, representing
exactly one class, while under data-type ignorant conditions the categorical
variables are represented by a collection of scores, one for each class. We use
a neural network architecture consisting of a generator and a discriminator,
where the generator consists of an encoder-decoder pair, and the discriminator
consists of an adversary and a utility provider. Unlike previous research
considering this kind of architecture, which leverages autoencoders (AEs)
without introducing any randomness, or variational autoencoders (VAEs) based on
learning latent representations which are then forced into a Gaussian
assumption, our proposed technique introduces randomness and removes the
Gaussian assumption restriction on the latent variables, only focusing on the
end-to-end stochastic mapping of the input to privatized data. We test our
framework on different datasets: MNIST, FashionMNIST, UCI Adult, and US Census
Demographic Data, providing a wide range of possible private and utility
attributes. We use multiple adversaries simultaneously to test our privacy
mechanism -- some trained from the ground truth data and some trained from the
perturbed data generated by our privacy mechanism. Through comparative
analysis, our results demonstrate better privacy and utility guarantees than
the existing works under similar, data-type ignorant conditions, even when the
latter are considered under their original restrictive single-adversary model.
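Since the abstract describes the architecture only at a high level, here is a minimal PyTorch sketch of how such a generator (encoder-decoder pair) and discriminator (adversary plus utility provider) could be wired together. This is not the authors' implementation: the layer sizes, the uniform noise injection, the straight-through one-hot trick, and the loss weight lam are illustrative assumptions. Only the overall structure follows the abstract: a stochastic encoder with no Gaussian prior on the latent code, a one-hot output under data-type aware conditions, a score-per-class output under data-type ignorant conditions, and two discriminator heads.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Stochastic encoder: noise is concatenated to the input, and no Gaussian
    prior is imposed on the latent code."""
    def __init__(self, x_dim, noise_dim, z_dim):
        super().__init__()
        self.noise_dim = noise_dim
        self.net = nn.Sequential(
            nn.Linear(x_dim + noise_dim, 128), nn.ReLU(),
            nn.Linear(128, z_dim),
        )

    def forward(self, x):
        eps = torch.rand(x.size(0), self.noise_dim, device=x.device)  # injected randomness
        return self.net(torch.cat([x, eps], dim=1))


class Decoder(nn.Module):
    """Maps the latent code to a privatized record. For simplicity the whole
    output is treated as a single categorical feature."""
    def __init__(self, z_dim, x_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim, 128), nn.ReLU(),
            nn.Linear(128, x_dim),
        )

    def forward(self, z, data_type_aware: bool):
        logits = self.net(z)
        soft = torch.softmax(logits, dim=1)  # data-type ignorant: one score per class
        if not data_type_aware:
            return soft
        # data-type aware: exactly one class (hard one-hot, straight-through gradient)
        hard = torch.zeros_like(soft).scatter_(1, soft.argmax(dim=1, keepdim=True), 1.0)
        return hard + soft - soft.detach()


class Classifier(nn.Module):
    """Shared shape for both discriminator heads: the adversary predicts the
    private attribute, the utility provider predicts the utility attribute."""
    def __init__(self, x_dim, n_classes):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(x_dim, 128), nn.ReLU(), nn.Linear(128, n_classes))

    def forward(self, x_priv):
        return self.net(x_priv)


def generator_loss(x, s_private, u_utility, enc, dec, adversary, utility, lam=1.0, aware=True):
    """Hypothetical generator objective: reward utility prediction, penalize the
    adversary's success on the private attribute."""
    x_priv = dec(enc(x), data_type_aware=aware)
    ce = nn.CrossEntropyLoss()
    return ce(utility(x_priv), u_utility) - lam * ce(adversary(x_priv), s_private)
```

In a full training loop the adversary and utility networks would be updated on their own classification losses (some adversaries trained on ground-truth data, others on the privatized outputs, as the abstract describes), alternating with updates to the encoder-decoder pair.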
Related papers
- Private prediction for large-scale synthetic text generation [28.488459921169905]
We present an approach for generating differentially private synthetic text using large language models (LLMs).
In the private prediction framework, we only require the output synthetic data to satisfy differential privacy guarantees.
arXiv Detail & Related papers (2024-07-16T18:28:40Z) - Footprints of Data in a Classifier Model: The Privacy Issues and Their Mitigation through Data Obfuscation [0.9208007322096533]
The embedding of footprints of the training data in a prediction model is one such facet.
The difference in performance between test and training data enables passive identification of the data on which the model was trained.
This research focuses on addressing the vulnerability arising from the data footprints.
arXiv Detail & Related papers (2024-07-02T13:56:37Z) - Independent Distribution Regularization for Private Graph Embedding [55.24441467292359]
Graph embeddings are susceptible to attribute inference attacks, which allow attackers to infer private node attributes from the learned graph embeddings.
To address these concerns, privacy-preserving graph embedding methods have emerged.
We propose a novel approach called Private Variational Graph AutoEncoders (PVGAE) with the aid of independent distribution penalty as a regularization term.
arXiv Detail & Related papers (2023-08-16T13:32:43Z) - Differentially Private Linear Regression with Linked Data [3.9325957466009203]
Differential privacy, a mathematical notion from computer science, is a rising tool offering robust privacy guarantees.
Recent work focuses on developing differentially private versions of individual statistical and machine learning tasks.
We present two differentially private algorithms for linear regression with linked data.
arXiv Detail & Related papers (2023-08-01T21:00:19Z) - Approximate, Adapt, Anonymize (3A): a Framework for Privacy Preserving
Training Data Release for Machine Learning [3.29354893777827]
We introduce a data release framework, 3A (Approximate, Adapt, Anonymize), to maximize data utility for machine learning.
We present experimental evidence showing minimal discrepancy between performance metrics of models trained on real versus privatized datasets.
arXiv Detail & Related papers (2023-07-04T18:37:11Z) - PEOPL: Characterizing Privately Encoded Open Datasets with Public Labels [59.66777287810985]
We introduce information-theoretic scores for privacy and utility, which quantify the average performance of an unfaithful user.
We then theoretically characterize primitives in building families of encoding schemes that motivate the use of random deep neural networks.
arXiv Detail & Related papers (2023-03-31T18:03:53Z) - Private Set Generation with Discriminative Information [63.851085173614]
Differentially private data generation is a promising solution to the data privacy challenge.
Existing private generative models struggle with the utility of their synthetic samples.
We introduce a simple yet effective method that greatly improves the sample utility of state-of-the-art approaches.
arXiv Detail & Related papers (2022-11-07T10:02:55Z) - Multi-class Classifier based Failure Prediction with Artificial and
Anonymous Training for Data Privacy [0.0]
A neural network based multi-class classifier is developed for failure prediction.
The proposed mechanism completely decouples the data set used for training process from the actual data which is kept private.
Results show high accuracy in failure prediction under different parameter configurations.
arXiv Detail & Related papers (2022-09-06T07:53:33Z) - Representative & Fair Synthetic Data [68.8204255655161]
We present a framework to incorporate fairness constraints into the self-supervised learning process.
We generate a representative as well as fair version of the UCI Adult census data set.
We consider representative & fair synthetic data a promising future building block to teach algorithms not on historic worlds, but rather on the worlds that we strive to live in.
arXiv Detail & Related papers (2021-04-07T09:19:46Z) - On Deep Learning with Label Differential Privacy [54.45348348861426]
We study the multi-class classification setting where the labels are considered sensitive and ought to be protected.
We propose a new algorithm for training deep neural networks with label differential privacy, and run evaluations on several datasets.
arXiv Detail & Related papers (2021-02-11T15:09:06Z) - Robustness Threats of Differential Privacy [70.818129585404]
We experimentally demonstrate that networks trained with differential privacy may, in some settings, be even more vulnerable than their non-private counterparts.
We study how the main ingredients of differentially private neural network training, such as gradient clipping and noise addition, affect the robustness of the model (a minimal sketch of these two ingredients follows this list).
arXiv Detail & Related papers (2020-12-14T18:59:24Z)