Label-Only Model Inversion Attacks via Boundary Repulsion
- URL: http://arxiv.org/abs/2203.01925v1
- Date: Thu, 3 Mar 2022 18:57:57 GMT
- Title: Label-Only Model Inversion Attacks via Boundary Repulsion
- Authors: Mostafa Kahla, Si Chen, Hoang Anh Just, Ruoxi Jia
- Abstract summary: We introduce an algorithm to invert private training data using only the target model's predicted labels.
Using the example of face recognition, we show that the images reconstructed by BREP-MI successfully reproduce the semantics of the private training data.
- Score: 12.374249336222906
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Recent studies show that the state-of-the-art deep neural networks are
vulnerable to model inversion attacks, in which access to a model is abused to
reconstruct private training data of any given target class. Existing attacks
rely on having access to either the complete target model (whitebox) or the
model's soft-labels (blackbox). However, no prior work has been done in the
harder but more practical scenario, in which the attacker only has access to
the model's predicted label, without a confidence measure. In this paper, we
introduce an algorithm, Boundary-Repelling Model Inversion (BREP-MI), to invert
private training data using only the target model's predicted labels. The key
idea of our algorithm is to evaluate the model's predicted labels over a sphere
and then estimate the direction to reach the target class's centroid. Using the
example of face recognition, we show that the images reconstructed by BREP-MI
successfully reproduce the semantics of the private training data for various
datasets and target model architectures. We compare BREP-MI with the
state-of-the-art whitebox and blackbox model inversion attacks and the results
show that despite assuming less knowledge about the target model, BREP-MI
outperforms the blackbox attack and achieves comparable results to the whitebox
attack.
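To make the boundary-repulsion idea concrete, the following is a minimal sketch of one label-only update step, assuming a GAN generator and a hard-label classifier as in the face-recognition setting; the function names (generate, predict_label) and the radius, probe count, and step size are illustrative assumptions, not the authors' reference implementation.
```python
# Hypothetical sketch of a single boundary-repulsion update, based only on the
# abstract: probe hard labels on a sphere around the current latent code and
# step away from directions whose generated images leave the target class.
import numpy as np

def boundary_repulsion_step(z, generate, predict_label, target_label,
                            radius=2.0, n_probes=32, step_size=0.5):
    """One label-only update of the latent code z.

    generate(z) -> image and predict_label(image) -> int are assumed callables;
    only hard labels are queried, matching the label-only threat model.
    """
    dim = z.shape[0]
    # Sample unit directions approximately uniformly on the sphere around z.
    dirs = np.random.randn(n_probes, dim)
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)

    # Query the target model's predicted label at each probe point.
    labels = np.array([predict_label(generate(z + radius * d)) for d in dirs])

    # Probes that leave the target class mark the boundary side; moving
    # opposite their mean pushes z away from the boundary, toward the class
    # interior (roughly, the target class's centroid).
    outside = dirs[labels != target_label]
    if len(outside) == 0:
        # Every probe stayed in the target class; a larger radius could be tried.
        return z, True
    update = -outside.mean(axis=0)
    update /= np.linalg.norm(update) + 1e-12
    return z + step_size * update, False
```
Under these assumptions, each probe costs exactly one hard-label query, so the number of probes per step directly controls the attack's query budget.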
Related papers
- Prediction Exposes Your Face: Black-box Model Inversion via Prediction Alignment [24.049615035939237]
A model inversion (MI) attack reconstructs the private training data of a target model given its output.
We propose a novel Prediction-to-Image (P2I) method for black-box MI attacks.
Our method improves attack accuracy by 8.5% and reduces query numbers by 99% on dataset CelebA.
arXiv Detail & Related papers (2024-07-11T01:58:35Z)
- DREAM: Domain-free Reverse Engineering Attributes of Black-box Model [51.37041886352823]
We propose a new problem of Domain-agnostic Reverse Engineering the Attributes of a black-box target model.
We learn a domain-agnostic model to infer the attributes of a target black-box model with unknown training data.
arXiv Detail & Related papers (2023-07-20T16:25:58Z)
- Unstoppable Attack: Label-Only Model Inversion via Conditional Diffusion Model [14.834360664780709]
Model inversion attacks (MIAs) aim to recover private data from the inaccessible training sets of deep learning models.
This paper develops a novel MIA method, leveraging a conditional diffusion model (CDM) to recover representative samples under the target label.
Experimental results show that this method can generate similar and accurate samples to the target label, outperforming generators of previous approaches.
arXiv Detail & Related papers (2023-07-17T12:14:24Z)
- Reinforcement Learning-Based Black-Box Model Inversion Attacks [23.30144908939099]
Model inversion attacks reconstruct private data used to train a machine learning model.
White-box model inversion attacks leveraging Generative Adversarial Networks (GANs) to distill knowledge from public datasets have been receiving great attention.
We propose a reinforcement learning-based black-box model inversion attack.
arXiv Detail & Related papers (2023-04-10T14:41:16Z)
- Model Inversion Attacks against Graph Neural Networks [65.35955643325038]
We study model inversion attacks against Graph Neural Networks (GNNs).
In this paper, we present GraphMI to infer the private training graph data.
Our experimental results show that existing defenses are not sufficiently effective and call for more advanced defenses against privacy attacks.
arXiv Detail & Related papers (2022-09-16T09:13:43Z)
- Label-only Model Inversion Attack: The Attack that Requires the Least Information [14.061083728194378]
In a model inversion attack, an adversary attempts to reconstruct the data records used to train a target model, using only the model's output.
We have found a model inversion method that can reconstruct the input data records based only on the output labels.
arXiv Detail & Related papers (2022-03-13T03:03:49Z)
- Delving into Data: Effectively Substitute Training for Black-box Attack [84.85798059317963]
We propose a novel perspective on substitute training that focuses on designing the distribution of data used in the knowledge-stealing process.
The combination of the proposed modules further boosts the consistency between the substitute model and the target model, which greatly improves the effectiveness of the adversarial attack.
arXiv Detail & Related papers (2021-04-26T07:26:29Z)
- Hidden Backdoor Attack against Semantic Segmentation Models [60.0327238844584]
The backdoor attack intends to embed hidden backdoors in deep neural networks (DNNs) by poisoning training data.
We propose a novel attack paradigm, the fine-grained attack, where we treat the target label from the object-level instead of the image-level.
Experiments show that the proposed methods can successfully attack semantic segmentation models by poisoning only a small proportion of training data.
arXiv Detail & Related papers (2021-03-06T05:50:29Z)
- Knowledge-Enriched Distributional Model Inversion Attacks [49.43828150561947]
Model inversion (MI) attacks are aimed at reconstructing training data from model parameters.
We present a novel inversion-specific GAN that can better distill knowledge useful for performing attacks on private models from public data.
Our experiments show that the combination of these techniques can significantly boost the success rate of the state-of-the-art MI attacks by 150%.
arXiv Detail & Related papers (2020-10-08T16:20:48Z)
- Boosting Black-Box Attack with Partially Transferred Conditional Adversarial Distribution [83.02632136860976]
We study black-box adversarial attacks against deep neural networks (DNNs).
We develop a novel mechanism of adversarial transferability, which is robust to the surrogate biases.
Experiments on benchmark datasets and attacking against real-world API demonstrate the superior attack performance of the proposed method.
arXiv Detail & Related papers (2020-06-15T16:45:27Z)