Label-Only Model Inversion Attacks via Boundary Repulsion
- URL: http://arxiv.org/abs/2203.01925v1
- Date: Thu, 3 Mar 2022 18:57:57 GMT
- Title: Label-Only Model Inversion Attacks via Boundary Repulsion
- Authors: Mostafa Kahla, Si Chen, Hoang Anh Just, Ruoxi Jia
- Abstract summary: We introduce an algorithm to invert private training data using only the target model's predicted labels.
Using the example of face recognition, we show that the images reconstructed by BREP-MI successfully reproduce the semantics of the private training data.
- Score: 12.374249336222906
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Recent studies show that the state-of-the-art deep neural networks are
vulnerable to model inversion attacks, in which access to a model is abused to
reconstruct private training data of any given target class. Existing attacks
rely on having access to either the complete target model (whitebox) or the
model's soft-labels (blackbox). However, no prior work has been done in the
harder but more practical scenario, in which the attacker only has access to
the model's predicted label, without a confidence measure. In this paper, we
introduce an algorithm, Boundary-Repelling Model Inversion (BREP-MI), to invert
private training data using only the target model's predicted labels. The key
idea of our algorithm is to evaluate the model's predicted labels over a sphere
and then estimate the direction to reach the target class's centroid. Using the
example of face recognition, we show that the images reconstructed by BREP-MI
successfully reproduce the semantics of the private training data for various
datasets and target model architectures. We compare BREP-MI with the
state-of-the-art whitebox and blackbox model inversion attacks and the results
show that despite assuming less knowledge about the target model, BREP-MI
outperforms the blackbox attack and achieves comparable results to the whitebox
attack.
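To make the boundary-repulsion idea concrete, the following is a minimal sketch of one label-only update step, assuming a GAN generator and a hard-label classifier as in the face-recognition setting; the function names (generate, predict_label) and the radius, probe count, and step size are illustrative assumptions, not the authors' reference implementation.
```python
# Hypothetical sketch of a single boundary-repulsion update, based only on the
# abstract: probe hard labels on a sphere around the current latent code and
# step away from directions whose generated images leave the target class.
import numpy as np

def boundary_repulsion_step(z, generate, predict_label, target_label,
                            radius=2.0, n_probes=32, step_size=0.5):
    """One label-only update of the latent code z.

    generate(z) -> image and predict_label(image) -> int are assumed callables;
    only hard labels are queried, matching the label-only threat model.
    """
    dim = z.shape[0]
    # Sample unit directions approximately uniformly on the sphere around z.
    dirs = np.random.randn(n_probes, dim)
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)

    # Query the target model's predicted label at each probe point.
    labels = np.array([predict_label(generate(z + radius * d)) for d in dirs])

    # Probes that leave the target class mark the boundary side; moving
    # opposite their mean pushes z away from the boundary, toward the class
    # interior (roughly, the target class's centroid).
    outside = dirs[labels != target_label]
    if len(outside) == 0:
        # Every probe stayed in the target class; a larger radius could be tried.
        return z, True
    update = -outside.mean(axis=0)
    update /= np.linalg.norm(update) + 1e-12
    return z + step_size * update, False
```
Under these assumptions, each probe costs exactly one hard-label query, so the number of probes per step directly controls the attack's query budget.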
Related papers
- Prediction Exposes Your Face: Black-box Model Inversion via Prediction Alignment [24.049615035939237]
A model inversion (MI) attack reconstructs the private training data of a target model given its output.
We propose a novel Prediction-to-Image (P2I) method for black-box MI attacks.
Our method improves attack accuracy by 8.5% and reduces query numbers by 99% on dataset CelebA.
arXiv Detail & Related papers (2024-07-11T01:58:35Z)
- DREAM: Domain-free Reverse Engineering Attributes of Black-box Model [51.37041886352823]
We propose a new problem of Domain-agnostic Reverse Engineering the Attributes of a black-box target model.
We learn a domain-agnostic model to infer the attributes of a target black-box model with unknown training data.
arXiv Detail & Related papers (2023-07-20T16:25:58Z)
- Unstoppable Attack: Label-Only Model Inversion via Conditional Diffusion Model [14.834360664780709]
Model inversion attacks (MIAs) aim to recover private data from the inaccessible training sets of deep learning models.
This paper develops a novel MIA method, leveraging a conditional diffusion model (CDM) to recover representative samples under the target label.
Experimental results show that this method can generate similar and accurate samples to the target label, outperforming generators of previous approaches.
arXiv Detail & Related papers (2023-07-17T12:14:24Z)
- Reinforcement Learning-Based Black-Box Model Inversion Attacks [23.30144908939099]
Model inversion attacks reconstruct private data used to train a machine learning model.
White-box model inversion attacks leveraging Generative Adversarial Networks (GANs) to distill knowledge from public datasets have been receiving great attention.
We propose a reinforcement learning-based black-box model inversion attack.
arXiv Detail & Related papers (2023-04-10T14:41:16Z)
- Model Inversion Attacks against Graph Neural Networks [65.35955643325038]
We study model inversion attacks against Graph Neural Networks (GNNs).
In this paper, we present GraphMI to infer the private training graph data.
Our experimental results show that existing defenses are not sufficiently effective and call for more advanced defenses against privacy attacks.
arXiv Detail & Related papers (2022-09-16T09:13:43Z)
- Label-only Model Inversion Attack: The Attack that Requires the Least Information [14.061083728194378]
In a model inversion attack, an adversary attempts to reconstruct the data records used to train a target model, using only the model's output.
We have found a model inversion method that can reconstruct the input data records based only on the output labels.
arXiv Detail & Related papers (2022-03-13T03:03:49Z)
- Delving into Data: Effectively Substitute Training for Black-box Attack [84.85798059317963]
We propose a novel perspective on substitute training that focuses on designing the distribution of data used in the knowledge-stealing process.
The combination of the proposed modules further boosts the consistency between the substitute model and the target model, which greatly improves the effectiveness of the adversarial attack.
arXiv Detail & Related papers (2021-04-26T07:26:29Z)
- Hidden Backdoor Attack against Semantic Segmentation Models [60.0327238844584]
The backdoor attack intends to embed hidden backdoors in deep neural networks (DNNs) by poisoning training data.
We propose a novel attack paradigm, the fine-grained attack, where we treat the target label from the object-level instead of the image-level.
Experiments show that the proposed methods can successfully attack semantic segmentation models by poisoning only a small proportion of training data.
arXiv Detail & Related papers (2021-03-06T05:50:29Z)
- Knowledge-Enriched Distributional Model Inversion Attacks [49.43828150561947]
Model inversion (MI) attacks are aimed at reconstructing training data from model parameters.
We present a novel inversion-specific GAN that can better distill knowledge useful for performing attacks on private models from public data.
Our experiments show that the combination of these techniques can significantly boost the success rate of the state-of-the-art MI attacks by 150%.
arXiv Detail & Related papers (2020-10-08T16:20:48Z)
- Boosting Black-Box Attack with Partially Transferred Conditional Adversarial Distribution [83.02632136860976]
We study black-box adversarial attacks against deep neural networks (DNNs).
We develop a novel mechanism of adversarial transferability, which is robust to the surrogate biases.
Experiments on benchmark datasets and attacking against real-world API demonstrate the superior attack performance of the proposed method.
arXiv Detail & Related papers (2020-06-15T16:45:27Z)