Black-box Model Inversion Attribute Inference Attacks on Classification
Models
- URL: http://arxiv.org/abs/2012.03404v1
- Date: Mon, 7 Dec 2020 01:14:19 GMT
- Title: Black-box Model Inversion Attribute Inference Attacks on Classification
Models
- Authors: Shagufta Mehnaz, Ninghui Li, Elisa Bertino
- Abstract summary: We focus on one kind of model inversion attacks, where the adversary knows non-sensitive attributes about instances in the training data.
We devise two novel model inversion attribute inference attacks -- confidence modeling-based attack and confidence score-based attack.
We evaluate our attacks on two types of machine learning models, decision tree and deep neural network, trained with two real datasets.
- Score: 32.757792981935815
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Increasing use of ML technologies in privacy-sensitive domains such as
medical diagnoses, lifestyle predictions, and business decisions highlights the
need to better understand if these ML technologies are introducing leakages of
sensitive and proprietary training data. In this paper, we focus on one kind of
model inversion attacks, where the adversary knows non-sensitive attributes
about instances in the training data and aims to infer the value of a sensitive
attribute unknown to the adversary, using oracle access to the target
classification model. We devise two novel model inversion attribute inference
attacks -- confidence modeling-based attack and confidence score-based attack,
and also extend our attack to the case where some of the other (non-sensitive)
attributes are unknown to the adversary. Furthermore, while previous work uses
accuracy as the metric to evaluate the effectiveness of attribute inference
attacks, we find that accuracy is not informative when the sensitive attribute
distribution is unbalanced. We identify two metrics that are better for
evaluating attribute inference attacks, namely G-mean and Matthews correlation
coefficient (MCC). We evaluate our attacks on two types of machine learning
models, decision tree and deep neural network, trained with two real datasets.
Experimental results show that our newly proposed attacks significantly
outperform the state-of-the-art attacks. Moreover, we empirically show that
specific groups in the training dataset (grouped by attributes, e.g., gender,
race) could be more vulnerable to model inversion attacks. We also demonstrate
that our attacks' performances are not impacted significantly when some of the
other (non-sensitive) attributes are also unknown to the adversary.
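The abstract's claim that accuracy is uninformative under an unbalanced sensitive attribute can be illustrated with a small sketch. It uses the standard definitions of G-mean (the geometric mean of the true positive rate and true negative rate) and MCC for a binary attribute. The 95/5 class split and the "always guess the majority value" attack are hypothetical values chosen for illustration, not results from the paper, and the function names are our own.

```python
import math

def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, TN, FN for a binary sensitive attribute."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1
        elif t == positive:
            fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def accuracy(tp, fp, tn, fn):
    return (tp + tn) / (tp + fp + tn + fn)

def g_mean(tp, fp, tn, fn):
    """Geometric mean of true positive rate and true negative rate."""
    tpr = tp / (tp + fn) if tp + fn else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(tpr * tnr)

def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; defined as 0 when the denominator is 0."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Hypothetical unbalanced attribute: 95 records with value 0, 5 with value 1.
y_true = [0] * 95 + [1] * 5
# A trivial "attack" that always guesses the majority value.
y_pred = [0] * 100

counts = confusion_counts(y_true, y_pred)
print("accuracy:", accuracy(*counts))
print("G-mean:  ", g_mean(*counts))
print("MCC:     ", mcc(*counts))
```

In this toy setting the trivial guesser reaches high accuracy while both G-mean and MCC are zero, because it never recovers the minority value of the sensitive attribute.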
Related papers
- When Machine Learning Models Leak: An Exploration of Synthetic Training Data [0.0]
We investigate an attack on a machine learning model that predicts whether a person or household will relocate in the next two years.
The attack assumes that the attacker can query the model to obtain predictions and that the marginal distribution of the data on which the model was trained is publicly available.
We explore how replacing the original data with synthetic data when training the model impacts how successfully the attacker can infer sensitive attributes.
arXiv Detail & Related papers (2023-10-12T23:47:22Z)
- Avoid Adversarial Adaption in Federated Learning by Multi-Metric Investigations [55.2480439325792]
Federated Learning (FL) facilitates decentralized machine learning model training, preserving data privacy, lowering communication costs, and boosting model performance through diversified data sources.
FL faces vulnerabilities such as poisoning attacks, undermining model integrity with both untargeted performance degradation and targeted backdoor attacks.
We define a new notion of strong adaptive adversaries, capable of adapting to multiple objectives simultaneously.
MESAS is the first defense robust against strong adaptive adversaries, effective in real-world data scenarios, with an average overhead of just 24.37 seconds.
arXiv Detail & Related papers (2023-06-06T11:44:42Z)
- Semantic Image Attack for Visual Model Diagnosis [80.36063332820568]
In practice, metric analysis on a specific train and test dataset does not guarantee reliable or fair ML models.
This paper proposes Semantic Image Attack (SIA), a method based on the adversarial attack that provides semantic adversarial images.
arXiv Detail & Related papers (2023-03-23T03:13:04Z)
- Purifier: Defending Data Inference Attacks via Transforming Confidence Scores [27.330482508047428]
We propose a method, namely PURIFIER, to defend against membership inference attacks.
Experiments show that PURIFIER helps defend membership inference attacks with high effectiveness and efficiency.
PURIFIER is also effective in defending adversarial model inversion attacks and attribute inference attacks.
arXiv Detail & Related papers (2022-12-01T16:09:50Z)
- Improving Adversarial Robustness to Sensitivity and Invariance Attacks with Deep Metric Learning [80.21709045433096]
A standard method in adversarial robustness assumes a framework to defend against samples crafted by minimally perturbing a sample.
We use metric learning to frame adversarial regularization as an optimal transport problem.
Our preliminary results indicate that regularizing over invariant perturbations in our framework improves both invariant and sensitivity defense.
arXiv Detail & Related papers (2022-11-04T13:54:02Z)
- Dikaios: Privacy Auditing of Algorithmic Fairness via Attribute Inference Attacks [0.5801044612920815]
We propose Dikaios, a privacy auditing tool for fairness algorithms for model builders.
We show that our attribute inference attacks with adaptive prediction threshold significantly outperform prior attacks.
arXiv Detail & Related papers (2022-02-04T17:19:59Z)
- Are Your Sensitive Attributes Private? Novel Model Inversion Attribute Inference Attacks on Classification Models [22.569705869469814]
We focus on model inversion attacks where the adversary knows non-sensitive attributes about records in the training data.
We devise a novel confidence score-based model inversion attribute inference attack that significantly outperforms the state-of-the-art.
We also extend our attacks to the scenario where some of the other (non-sensitive) attributes of a target record are unknown to the adversary.
arXiv Detail & Related papers (2022-01-23T21:27:20Z)
- Enhanced Membership Inference Attacks against Machine Learning Models [9.26208227402571]
Membership inference attacks are used to quantify the private information that a model leaks about the individual data points in its training set.
We derive new attack algorithms that can achieve a high AUC score while also highlighting the different factors that affect their performance.
Our algorithms capture a very precise approximation of privacy loss in models, and can be used as a tool to perform an accurate and informed estimation of privacy risk in machine learning models.
arXiv Detail & Related papers (2021-11-18T13:31:22Z)
- ML-Doctor: Holistic Risk Assessment of Inference Attacks Against Machine Learning Models [64.03398193325572]
Inference attacks against Machine Learning (ML) models allow adversaries to learn about training data, model parameters, etc.
We concentrate on four attacks - namely, membership inference, model inversion, attribute inference, and model stealing.
Our analysis relies on a modular re-usable software, ML-Doctor, which enables ML model owners to assess the risks of deploying their models.
arXiv Detail & Related papers (2021-02-04T11:35:13Z)
- How Robust are Randomized Smoothing based Defenses to Data Poisoning? [66.80663779176979]
We present a previously unrecognized threat to robust machine learning models that highlights the importance of training-data quality.
We propose a novel bilevel optimization-based data poisoning attack that degrades the robustness guarantees of certifiably robust classifiers.
Our attack is effective even when the victim trains the models from scratch using state-of-the-art robust training methods.
arXiv Detail & Related papers (2020-12-02T15:30:21Z)
- Adversarial Self-Supervised Contrastive Learning [62.17538130778111]
Existing adversarial learning approaches mostly use class labels to generate adversarial samples that lead to incorrect predictions.
We propose a novel adversarial attack for unlabeled data, which makes the model confuse the instance-level identities of the perturbed data samples.
We present a self-supervised contrastive learning framework to adversarially train a robust neural network without labeled data.
arXiv Detail & Related papers (2020-06-13T08:24:33Z)