Decouple-and-Sample: Protecting sensitive information in task agnostic data release
- URL: http://arxiv.org/abs/2203.13204v1
- Date: Thu, 17 Mar 2022 19:15:33 GMT
- Title: Decouple-and-Sample: Protecting sensitive information in task agnostic data release
- Authors: Abhishek Singh, Ethan Garza, Ayush Chopra, Praneeth Vepakomma, Vivek Sharma, Ramesh Raskar
- Abstract summary: sanitizer is a framework for secure and task-agnostic data release.
We show that a better privacy-utility trade-off is achieved if sensitive information can be synthesized privately.
- Score: 17.398889291769986
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose sanitizer, a framework for secure and task-agnostic data release.
While releasing datasets continues to have a significant impact on many applications of computer vision, that impact is fully realized only when data sharing is not inhibited by privacy concerns. We alleviate these concerns by sanitizing datasets in a two-stage process. First, we introduce a global decoupling stage that decomposes raw data into sensitive and non-sensitive latent representations. Second, we design a local sampling stage that synthetically generates sensitive information with differential privacy and merges it with the non-sensitive latent features to create a useful representation while preserving privacy. This newly formed latent information is a task-agnostic representation of the original dataset with anonymized sensitive information.
Most algorithms sanitize data in a task-dependent manner, while the few existing task-agnostic techniques do so by censoring sensitive information. In this work, we show that a better privacy-utility trade-off is
achieved if sensitive information can be synthesized privately. We validate the
effectiveness of the sanitizer by outperforming state-of-the-art baselines on
the existing benchmark tasks and demonstrating tasks that are not possible
using existing techniques.
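The two-stage pipeline described in the abstract can be sketched in miniature. Everything below is an illustrative assumption: the paper's decoupling stage is a learned encoder over images, whereas here a fixed index set stands in for the sensitive partition, and a per-coordinate Laplace mechanism (the standard epsilon-DP primitive) stands in for the private sampling stage.

```python
import math
import random

random.seed(0)

def laplace_noise(scale):
    # Inverse-CDF sampling from the Laplace(0, scale) distribution.
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def sanitize(latent, sensitive_dims, epsilon=1.0, sensitivity=1.0):
    """Hypothetical two-stage sanitization of a single latent vector:
    (1) decouple coordinates into sensitive / non-sensitive sets, then
    (2) replace each sensitive coordinate with a Laplace-noised value
        (scale = sensitivity / epsilon), merging both parts back into
        one released representation."""
    scale = sensitivity / epsilon
    return [
        v + laplace_noise(scale) if i in sensitive_dims else v
        for i, v in enumerate(latent)
    ]

latent = [0.5, -1.2, 3.0, 0.1]
released = sanitize(latent, sensitive_dims={0, 2}, epsilon=0.5)
# non-sensitive coordinates (indices 1 and 3) pass through unchanged
```

Even in this toy form, the design point the paper argues for is visible: non-sensitive coordinates are released losslessly, so only the sensitive partition pays the noise cost of differential privacy.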
Related papers
- Masked Differential Privacy [64.32494202656801]
We propose an effective approach called masked differential privacy (DP), which allows for controlling sensitive regions where differential privacy is applied.
Our method operates selectively on data, allowing non-sensitive spatio-temporal regions to be defined without DP application, or differential privacy to be combined with other privacy techniques within data samples.
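A minimal sketch of this selective idea, assuming a pixel-level boolean mask (the function name and granularity are illustrative, not the paper's API): Laplace noise is added only where the mask marks a region as sensitive, and everything else is released untouched.

```python
import math
import random

random.seed(1)

def masked_laplace(image, mask, epsilon, sensitivity=1.0):
    """Selective ("masked") DP sketch: Laplace noise is applied only to
    pixels flagged as sensitive; non-sensitive pixels pass through."""
    scale = sensitivity / epsilon
    out = []
    for row, mrow in zip(image, mask):
        new_row = []
        for v, sensitive in zip(row, mrow):
            if sensitive:
                u = random.random() - 0.5
                v = v - scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))
            new_row.append(v)
        out.append(new_row)
    return out

image = [[10.0, 20.0], [30.0, 40.0]]
mask = [[True, False], [False, True]]   # sensitive region: the diagonal
noisy = masked_laplace(image, mask, epsilon=0.5)
```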
arXiv Detail & Related papers (2024-10-22T15:22:53Z)
- Mitigating the Privacy Issues in Retrieval-Augmented Generation (RAG) via Pure Synthetic Data [51.41288763521186]
Retrieval-augmented generation (RAG) enhances the outputs of language models by integrating relevant information retrieved from external knowledge sources.
RAG systems may face severe privacy risks when retrieving private data.
We propose using synthetic data as a privacy-preserving alternative for the retrieval data.
arXiv Detail & Related papers (2024-06-20T22:53:09Z)
- MaSS: Multi-attribute Selective Suppression for Utility-preserving Data Transformation from an Information-theoretic Perspective [10.009178591853058]
We propose a formal information-theoretic definition for this utility-preserving privacy protection problem.
We design a data-driven learnable data transformation framework that is capable of suppressing sensitive attributes from target datasets.
Results demonstrate the effectiveness and generalizability of our method under various configurations.
arXiv Detail & Related papers (2024-05-23T18:35:46Z)
- Disguise without Disruption: Utility-Preserving Face De-Identification [40.484745636190034]
We introduce Disguise, a novel algorithm that seamlessly de-identifies facial images while ensuring the usability of the modified data.
Our method involves extracting and substituting depicted identities with synthetic ones, generated using variational mechanisms to maximize obfuscation and non-invertibility.
We extensively evaluate our method using multiple datasets, demonstrating a higher de-identification rate and superior consistency compared to prior approaches in various downstream tasks.
arXiv Detail & Related papers (2023-03-23T13:50:46Z)
- Attribute-preserving Face Dataset Anonymization via Latent Code Optimization [64.4569739006591]
We present a task-agnostic anonymization procedure that directly optimizes the images' latent representation in the latent space of a pre-trained GAN.
We demonstrate through a series of experiments that our method is capable of anonymizing the identity of the images while, crucially, better preserving the facial attributes.
arXiv Detail & Related papers (2023-03-20T17:34:05Z)
- ConfounderGAN: Protecting Image Data Privacy with Causal Confounder [85.6757153033139]
We propose ConfounderGAN, a generative adversarial network (GAN) that can make personal image data unlearnable to protect the data privacy of its owners.
Experiments are conducted in six image classification datasets, consisting of three natural object datasets and three medical datasets.
arXiv Detail & Related papers (2022-12-04T08:49:14Z)
- Private Set Generation with Discriminative Information [63.851085173614]
Differentially private data generation is a promising solution to the data privacy challenge.
Existing private generative models struggle with the utility of their synthetic samples.
We introduce a simple yet effective method that greatly improves the sample utility of state-of-the-art approaches.
arXiv Detail & Related papers (2022-11-07T10:02:55Z)
- Reinforcement Learning on Encrypted Data [58.39270571778521]
We present a preliminary, experimental study of how a DQN agent trained on encrypted states performs in environments with discrete and continuous state spaces.
Our results highlight that the agent is still capable of learning in small state spaces even in the presence of non-deterministic encryption, but performance collapses in more complex environments.
arXiv Detail & Related papers (2021-09-16T21:59:37Z)
- Adversarial representation learning for synthetic replacement of private attributes [0.7619404259039281]
We propose a novel approach for data privatization that involves two steps: first, removing the sensitive information, and second, replacing it with an independent random sample.
Our method builds on adversarial representation learning which ensures strong privacy by training the model to fool an increasingly strong adversary.
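The remove-then-replace idea can be illustrated with plain dictionaries standing in for learned representations; the paper does this in a latent space trained against an adversary, so every name below is a simplified assumption.

```python
import random

random.seed(2)

def privatize(record, sensitive_key, attribute_prior):
    """Sketch of two-step privatization: (1) drop the sensitive attribute,
    (2) substitute an independent draw from a prior over that attribute.
    Dictionaries stand in for the paper's adversarially trained latents."""
    cleaned = {k: v for k, v in record.items() if k != sensitive_key}
    cleaned[sensitive_key] = random.choice(attribute_prior)
    return cleaned

record = {"pixels": [0.1, 0.9], "identity": "alice"}
out = privatize(record, "identity", attribute_prior=["id-a", "id-b", "id-c"])
# "identity" is now statistically independent of the original record
```

Because the replacement is sampled independently of the input, an adversary observing the released record learns nothing about the original sensitive attribute, which is the guarantee the adversarial training aims to enforce in the learned setting.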
arXiv Detail & Related papers (2020-06-14T22:07:19Z)
- Privacy Adversarial Network: Representation Learning for Mobile Data Privacy [33.75500773909694]
A growing number of cloud-based intelligent services for mobile users require user data to be sent to the provider.
Prior works either obfuscate the data, e.g. add noise and remove identity information, or send representations extracted from the data, e.g. anonymized features.
This work departs from prior works in methodology: we leverage adversarial learning to achieve a better balance between privacy and utility.
arXiv Detail & Related papers (2020-06-08T09:42:04Z)
- PrivGen: Preserving Privacy of Sequences Through Data Generation [14.579475552088688]
Sequential data can serve as a basis for research that will lead to improved processes.
Access and use of such data is usually limited or not permitted at all due to concerns about violating user privacy.
We propose PrivGen, an innovative method for generating data that maintains patterns and characteristics of the source data.
arXiv Detail & Related papers (2020-02-23T05:43:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences arising from its use.