MaSS: Multi-attribute Selective Suppression for Utility-preserving Data Transformation from an Information-theoretic Perspective
- URL: http://arxiv.org/abs/2405.14981v2
- Date: Fri, 19 Jul 2024 16:10:00 GMT
- Title: MaSS: Multi-attribute Selective Suppression for Utility-preserving Data Transformation from an Information-theoretic Perspective
- Authors: Yizhuo Chen, Chun-Fu Chen, Hsiang Hsu, Shaohan Hu, Marco Pistoia, Tarek Abdelzaher,
- Abstract summary: We propose a formal information-theoretic definition for this utility-preserving privacy protection problem.
We design a data-driven learnable data transformation framework that is capable of suppressing sensitive attributes from target datasets.
Results demonstrate the effectiveness and generalizability of our method under various configurations.
- Score: 10.009178591853058
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The growing richness of large-scale datasets has been crucial in driving the rapid advancement and wide adoption of machine learning technologies. The massive collection and usage of data, however, pose an increasing risk for people's private and sensitive information due to either inadvertent mishandling or malicious exploitation. Besides legislative solutions, many technical approaches have been proposed towards data privacy protection. However, they bear various limitations such as leading to degraded data availability and utility, or relying on heuristics and lacking solid theoretical bases. To overcome these limitations, we propose a formal information-theoretic definition for this utility-preserving privacy protection problem, and design a data-driven learnable data transformation framework that is capable of selectively suppressing sensitive attributes from target datasets while preserving the other useful attributes, regardless of whether or not they are known in advance or explicitly annotated for preservation. We provide rigorous theoretical analyses on the operational bounds for our framework, and carry out comprehensive experimental evaluations using datasets of a variety of modalities, including facial images, voice audio clips, and human activity motion sensor signals. Results demonstrate the effectiveness and generalizability of our method under various configurations on a multitude of tasks. Our code is available at https://github.com/jpmorganchase/MaSS.
Related papers
- Synergizing Privacy and Utility in Data Analytics Through Advanced Information Theorization [2.28438857884398]
We introduce three sophisticated algorithms: a Noise-Infusion Technique tailored for high-dimensional image data, a Variational Autoencoder (VAE) for robust feature extraction and an Expectation Maximization (EM) approach optimized for structured data privacy.
Our methods significantly reduce mutual information between sensitive attributes and transformed data, thereby enhancing privacy.
The research contributes to the field by providing a flexible and effective strategy for deploying privacy-preserving algorithms across various data types.
arXiv Detail & Related papers (2024-04-24T22:58:42Z) - Ungeneralizable Examples [70.76487163068109]
Current approaches to creating unlearnable data involve incorporating small, specially designed noises.
We extend the concept of unlearnable data to conditional data learnability and introduce textbfUntextbfGeneralizable textbfExamples (UGEs)
UGEs exhibit learnability for authorized users while maintaining unlearnability for potential hackers.
arXiv Detail & Related papers (2024-04-22T09:29:14Z) - A Summary of Privacy-Preserving Data Publishing in the Local Setting [0.6749750044497732]
Statistical Disclosure Control aims to minimize the risk of exposing confidential information by de-identifying it.
We outline the current privacy-preserving techniques employed in microdata de-identification, delve into privacy measures tailored for various disclosure scenarios, and assess metrics for information loss and predictive performance.
arXiv Detail & Related papers (2023-12-19T04:23:23Z) - PrivacyMind: Large Language Models Can Be Contextual Privacy Protection Learners [81.571305826793]
We introduce Contextual Privacy Protection Language Models (PrivacyMind)
Our work offers a theoretical analysis for model design and benchmarks various techniques.
In particular, instruction tuning with both positive and negative examples stands out as a promising method.
arXiv Detail & Related papers (2023-10-03T22:37:01Z) - A Unified View of Differentially Private Deep Generative Modeling [60.72161965018005]
Data with privacy concerns comes with stringent regulations that frequently prohibited data access and data sharing.
Overcoming these obstacles is key for technological progress in many real-world application scenarios that involve privacy sensitive data.
Differentially private (DP) data publishing provides a compelling solution, where only a sanitized form of the data is publicly released.
arXiv Detail & Related papers (2023-09-27T14:38:16Z) - Towards Generalizable Data Protection With Transferable Unlearnable
Examples [50.628011208660645]
We present a novel, generalizable data protection method by generating transferable unlearnable examples.
To the best of our knowledge, this is the first solution that examines data privacy from the perspective of data distribution.
arXiv Detail & Related papers (2023-05-18T04:17:01Z) - MaSS: Multi-attribute Selective Suppression [8.337285030303285]
We propose Multi-attribute Selective Suppression, or MaSS, a framework for performing precisely targeted data surgery.
MaSS learns a data modifier through adversarial games between two sets of networks, where one is aimed at suppressing selected attributes.
We carried out an extensive evaluation of our proposed method using multiple datasets from different domains.
arXiv Detail & Related papers (2022-10-18T14:44:08Z) - Towards a Data Privacy-Predictive Performance Trade-off [2.580765958706854]
We evaluate the existence of a trade-off between data privacy and predictive performance in classification tasks.
Unlike previous literature, we confirm that the higher the level of privacy, the higher the impact on predictive performance.
arXiv Detail & Related papers (2022-01-13T21:48:51Z) - Semantics-Preserved Distortion for Personal Privacy Protection in Information Management [65.08939490413037]
This paper suggests a linguistically-grounded approach to distort texts while maintaining semantic integrity.
We present two distinct frameworks for semantic-preserving distortion: a generative approach and a substitutive approach.
We also explore privacy protection in a specific medical information management scenario, showing our method effectively limits sensitive data memorization.
arXiv Detail & Related papers (2022-01-04T04:01:05Z) - Information Obfuscation of Graph Neural Networks [96.8421624921384]
We study the problem of protecting sensitive attributes by information obfuscation when learning with graph structured data.
We propose a framework to locally filter out pre-determined sensitive attributes via adversarial training with the total variation and the Wasserstein distance.
arXiv Detail & Related papers (2020-09-28T17:55:04Z) - Privacy Enhancing Machine Learning via Removal of Unwanted Dependencies [21.97951347784442]
This paper studies new variants of supervised and adversarial learning methods, which remove the sensitive information in the data before they are sent out for a particular application.
The explored methods optimize privacy preserving feature mappings and predictive models simultaneously in an end-to-end fashion.
Experimental results on mobile sensing and face datasets demonstrate that our models can successfully maintain the utility performances of predictive models while causing sensitive predictions to perform poorly.
arXiv Detail & Related papers (2020-07-30T19:55:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.