MaSS: Multi-attribute Selective Suppression
- URL: http://arxiv.org/abs/2210.09904v2
- Date: Mon, 24 Oct 2022 13:41:06 GMT
- Title: MaSS: Multi-attribute Selective Suppression
- Authors: Chun-Fu Chen, Shaohan Hu, Zhonghao Shi, Prateek Gulati, Bill Moriarty,
Marco Pistoia, Vincenzo Piuri, Pierangela Samarati
- Abstract summary: We propose Multi-attribute Selective Suppression, or MaSS, a framework for performing precisely targeted data surgery.
MaSS learns a data modifier through adversarial games between two sets of networks, where one is aimed at suppressing selected attributes.
We carried out an extensive evaluation of our proposed method using multiple datasets from different domains.
- Score: 8.337285030303285
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The recent rapid advances in machine learning technologies largely depend on
the vast richness of data available today, in terms of both its quantity and
the rich content it contains. For example, biometric data such as images
and voices could reveal people's attributes like age, gender, sentiment, and
origin, whereas location/motion data could be used to infer people's activity
levels, transportation modes, and life habits. Along with the new services and
applications enabled by such technological advances, various governmental
policies are put in place to regulate such data usage and protect people's
privacy and rights. As a result, data owners often opt for simple data
obfuscation (e.g., blur people's faces in images) or withholding data
altogether, which leads to severe data quality degradation and greatly limits
the data's potential utility.
Aiming for a sophisticated mechanism which gives data owners fine-grained
control while retaining the maximal degree of data utility, we propose
Multi-attribute Selective Suppression, or MaSS, a general framework for
performing precisely targeted data surgery to simultaneously suppress any
selected set of attributes while preserving the rest for downstream machine
learning tasks. MaSS learns a data modifier through adversarial games between
two sets of networks, where one is aimed at suppressing selected attributes,
and the other ensures the retention of the rest of the attributes via general
contrastive loss as well as explicit classification metrics. We carried out an
extensive evaluation of our proposed method using multiple datasets from
different domains including facial images, voice audio, and video clips, and
obtained promising results in MaSS' generalizability and capability of
suppressing targeted attributes without negatively affecting the data's
usability in other downstream ML tasks.
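The adversarial game described in the abstract can be sketched as a single combined objective for the data modifier: it is rewarded when classifiers for the suppressed attributes fail, and penalized when retention (classification plus contrastive) losses grow. The snippet below is a minimal illustration only; the function name and the `alpha`/`beta` trade-off weights are assumptions, not details from the paper.

```python
def modifier_objective(suppress_losses, retain_losses, contrastive_loss,
                       alpha=1.0, beta=1.0):
    """Toy objective for the data modifier in a MaSS-style adversarial game.

    The modifier is trained to MAXIMIZE the losses of classifiers that
    predict the suppressed attributes (hence the negation), while
    MINIMIZING the retention losses (explicit classification metrics
    plus a contrastive term) for the attributes to be preserved.
    alpha and beta are hypothetical trade-off weights.
    """
    suppression_term = -sum(suppress_losses)               # push attackers to fail
    retention_term = sum(retain_losses) + contrastive_loss # keep downstream utility
    return alpha * suppression_term + beta * retention_term
```

For example, `modifier_objective([2.0], [1.0], 0.5)` evaluates to `-0.5`: a high attacker loss combined with low retention loss drives the modifier's objective down, which is the desired equilibrium direction.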
Related papers
- MaSS: Multi-attribute Selective Suppression for Utility-preserving Data Transformation from an Information-theoretic Perspective [10.009178591853058]
We propose a formal information-theoretic definition for this utility-preserving privacy protection problem.
We design a data-driven learnable data transformation framework that is capable of suppressing sensitive attributes from target datasets.
Results demonstrate the effectiveness and generalizability of our method under various configurations.
arXiv Detail & Related papers (2024-05-23T18:35:46Z)
- LESS: Selecting Influential Data for Targeted Instruction Tuning [64.78894228923619]
We propose LESS, an efficient algorithm to estimate data influences and perform Low-rank gradiEnt Similarity Search for instruction data selection.
We show that training on a LESS-selected 5% of the data can often outperform training on the full dataset across diverse downstream tasks.
Our method goes beyond surface form cues to identify data that provides the necessary reasoning skills for the intended downstream application.
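The core selection step can be sketched as ranking training examples by the similarity of their gradients to a target-task gradient and keeping the top fraction. This is an illustrative sketch only: the two-dimensional "gradients" and function names are assumptions, and the actual method operates on low-rank random projections of model gradients.

```python
import math

def cosine(u, v):
    """Cosine similarity between two (projected) gradient vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-12)

def select_top_fraction(train_grads, target_grad, fraction=0.05):
    """Return indices of training examples whose gradients align best with
    the target-task gradient; fraction=0.05 mirrors the 5% setting above."""
    ranked = sorted(range(len(train_grads)),
                    key=lambda i: cosine(train_grads[i], target_grad),
                    reverse=True)
    k = max(1, int(len(train_grads) * fraction))
    return ranked[:k]
```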
arXiv Detail & Related papers (2024-02-06T19:18:04Z)
- Delete My Account: Impact of Data Deletion on Machine Learning Classifiers [0.0]
The right to erasure has potential implications for a number of different fields, such as big data and machine learning.
Our paper presents an in-depth analysis about the impact of the use of the right to erasure on the performance of machine learning models.
arXiv Detail & Related papers (2023-11-17T08:23:17Z)
- Privacy-Preserving Graph Machine Learning from Data to Computation: A Survey [67.7834898542701]
We focus on reviewing privacy-preserving techniques of graph machine learning.
We first review methods for generating privacy-preserving graph data.
Then we describe methods for transmitting privacy-preserved information.
arXiv Detail & Related papers (2023-07-10T04:30:23Z)
- Towards Generalizable Data Protection With Transferable Unlearnable Examples [50.628011208660645]
We present a novel, generalizable data protection method by generating transferable unlearnable examples.
To the best of our knowledge, this is the first solution that examines data privacy from the perspective of data distribution.
arXiv Detail & Related papers (2023-05-18T04:17:01Z)
- Uncertainty-Autoencoder-Based Privacy and Utility Preserving Data Type Conscious Transformation [3.7315964084413173]
We propose an adversarial learning framework that deals with the privacy-utility tradeoff problem under two conditions.
Under data-type ignorant conditions, the privacy mechanism provides a one-hot encoding of categorical features, representing exactly one class.
Under data-type aware conditions, the categorical variables are represented by a collection of scores, one for each class.
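The contrast between the two output conditions can be made concrete: a one-hot vector asserts exactly one class, while a per-class score vector conveys graded information. The softmax used below is only an illustrative choice for producing "a collection of scores, one for each class"; the paper's actual mechanism may differ.

```python
import math

def one_hot(index, num_classes):
    """Data-type-ignorant output: exactly one class is asserted."""
    return [1.0 if i == index else 0.0 for i in range(num_classes)]

def score_encoding(logits):
    """Data-type-aware output: one normalized score per class
    (softmax shown here as an illustrative normalization)."""
    m = max(logits)                          # shift for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]
```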
arXiv Detail & Related papers (2022-05-04T08:40:15Z)
- Towards a Data Privacy-Predictive Performance Trade-off [2.580765958706854]
We evaluate the existence of a trade-off between data privacy and predictive performance in classification tasks.
Unlike previous literature, we confirm that the higher the level of privacy, the higher the impact on predictive performance.
arXiv Detail & Related papers (2022-01-13T21:48:51Z)
- Selecting the suitable resampling strategy for imbalanced data classification regarding dataset properties [62.997667081978825]
In many application domains such as medicine, information retrieval, cybersecurity, social media, etc., datasets used for inducing classification models often have an unequal distribution of the instances of each class.
This situation, known as imbalanced data classification, causes low predictive performance for the minority class examples.
Oversampling and undersampling techniques are well-known strategies to deal with this problem by balancing the number of examples of each class.
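The oversampling strategy mentioned above can be sketched in a few lines: duplicate randomly chosen minority-class examples until every class matches the largest one. This is a minimal sketch of plain random oversampling, not any specific technique evaluated in the paper.

```python
import random
from collections import Counter

def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class examples at random until every class
    reaches the size of the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [s for s, l in zip(samples, labels) if l == cls]
        for _ in range(target - n):          # add copies until balanced
            out_samples.append(rng.choice(pool))
            out_labels.append(cls)
    return out_samples, out_labels
```

Undersampling is the mirror image: randomly discard majority-class examples down to the minority-class count, trading information loss for a smaller, balanced training set.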
arXiv Detail & Related papers (2021-12-15T18:56:39Z)
- Adversarial Knowledge Transfer from Unlabeled Data [62.97253639100014]
We present a novel Adversarial Knowledge Transfer framework for transferring knowledge from internet-scale unlabeled data to improve the performance of a classifier.
An important novel aspect of our method is that the unlabeled source data can be of different classes from those of the labeled target data, and there is no need to define a separate pretext task.
arXiv Detail & Related papers (2020-08-13T08:04:27Z)
- Differential Privacy of Hierarchical Census Data: An Optimization Approach [53.29035917495491]
Census Bureaus are interested in releasing aggregate socio-economic data about a large population without revealing sensitive information about any individual.
Recent events have identified some of the privacy challenges faced by these organizations.
This paper presents a novel differential-privacy mechanism for releasing hierarchical counts of individuals.
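The paper's contribution is an optimization approach over hierarchical counts; the snippet below shows only the basic Laplace mechanism that such count-release schemes build on, releasing a single sensitivity-1 count under epsilon-differential privacy. The function name is illustrative.

```python
import math
import random

def laplace_count(true_count, epsilon, rng):
    """Release a count with Laplace noise of scale 1/epsilon, the
    standard mechanism for epsilon-differentially-private counting
    queries with sensitivity 1 (one individual changes a count by 1)."""
    b = 1.0 / epsilon                        # noise scale for sensitivity 1
    u = rng.random() - 0.5                   # uniform on (-0.5, 0.5)
    # Inverse-CDF sampling of the Laplace(0, b) distribution.
    noise = -b * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_count + noise
```

In a hierarchical release, each node's count would be noised this way and then post-processed (e.g., by an optimization step) so that children's counts stay consistent with their parent's.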
arXiv Detail & Related papers (2020-06-28T18:19:55Z)
- Differentially Private M-band Wavelet-Based Mechanisms in Machine Learning Environments [4.629162607975834]
We develop three privacy-preserving mechanisms with the discrete M-band wavelet transform that embed noise into data.
We show that our mechanisms successfully retain both differential privacy and learnability through statistical analysis in various machine learning environments.
arXiv Detail & Related papers (2019-12-30T18:07:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.