MAZE: Data-Free Model Stealing Attack Using Zeroth-Order Gradient
Estimation
- URL: http://arxiv.org/abs/2005.03161v2
- Date: Fri, 28 Oct 2022 21:08:42 GMT
- Title: MAZE: Data-Free Model Stealing Attack Using Zeroth-Order Gradient
Estimation
- Authors: Sanjay Kariyappa, Atul Prakash, Moinuddin Qureshi
- Abstract summary: Model Stealing (MS) attacks allow an adversary with black-box access to a Machine Learning model to replicate its functionality, compromising the confidentiality of the model.
This paper proposes MAZE -- a data-free model stealing attack using zeroth-order gradient estimation.
In contrast to prior works, MAZE does not require any data and instead creates synthetic data using a generative model.
- Score: 14.544507965617582
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Model Stealing (MS) attacks allow an adversary with black-box access to a
Machine Learning model to replicate its functionality, compromising the
confidentiality of the model. Such attacks train a clone model by using the
predictions of the target model for different inputs. The effectiveness of such
attacks relies heavily on the availability of data necessary to query the
target model. Existing attacks either assume partial access to the dataset of
the target model or availability of an alternate dataset with semantic
similarities. This paper proposes MAZE -- a data-free model stealing attack
using zeroth-order gradient estimation. In contrast to prior works, MAZE does
not require any data and instead creates synthetic data using a generative
model. Inspired by recent works in data-free Knowledge Distillation (KD), we
train the generative model using a disagreement objective to produce inputs
that maximize disagreement between the clone and the target model. However,
unlike the white-box setting of KD, where the gradient information is
available, training a generator for model stealing requires performing
black-box optimization, as it involves accessing the target model under attack.
MAZE relies on zeroth-order gradient estimation to perform this optimization
and enables a highly accurate MS attack. Our evaluation with four datasets
shows that MAZE provides a normalized clone accuracy in the range of 0.91x to
0.99x, and outperforms even the recent attacks that rely on partial data (JBDA,
clone accuracy 0.13x to 0.69x) and surrogate data (KnockoffNets, clone accuracy
0.52x to 0.97x). We also study an extension of MAZE in the partial-data setting
and develop MAZE-PD, which generates synthetic data closer to the target
distribution. MAZE-PD further improves the clone accuracy (0.97x to 1.0x) and
reduces the query budget required for the attack by 2x-24x.
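To make the mechanism concrete, below is a minimal sketch of the core idea described in the abstract: a generator produces synthetic queries, the clone is trained to match the target's predictions, and the gradient of the disagreement loss with respect to the synthetic inputs is estimated with zeroth-order (finite-difference) queries to the black-box target and then backpropagated through the white-box generator. The toy models, shapes, hyperparameters, and helper names (query_target, disagreement_loss, zo_gradient) are illustrative assumptions, not the authors' code.
```python
# Minimal sketch of MAZE-style zeroth-order (ZO) gradient estimation (PyTorch).
# All models and hyperparameters are toy placeholders, not the paper's setup.
import torch
import torch.nn as nn
import torch.nn.functional as F

def query_target(x):
    """Black-box target: returns softmax probabilities only (no gradients)."""
    with torch.no_grad():
        return F.softmax(target(x), dim=1)

def disagreement_loss(x):
    """KL divergence between target and clone predictions on synthetic inputs x.
    The generator is trained to maximize this; the clone to minimize it."""
    p_target = query_target(x)
    log_p_clone = F.log_softmax(clone(x), dim=1)
    return F.kl_div(log_p_clone, p_target, reduction="batchmean")

def zo_gradient(x, num_dirs=8, eps=1e-3):
    """Estimate d(loss)/dx with random-direction finite differences, using only
    forward queries to the black-box target."""
    grad = torch.zeros_like(x)
    with torch.no_grad():
        base = disagreement_loss(x)
        d = x.numel()
        for _ in range(num_dirs):
            u = torch.randn_like(x)
            u = u / (u.norm() + 1e-12)
            loss_plus = disagreement_loss(x + eps * u)
            grad += (d / num_dirs) * (loss_plus - base) / eps * u
    return grad

# Toy models: the target is treated as a black box; clone and generator are white-box.
torch.manual_seed(0)
target = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
clone = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
generator = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 32))
g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
c_opt = torch.optim.Adam(clone.parameters(), lr=1e-3)

for _ in range(5):
    # Generator step: maximize disagreement. The ZO estimate of d(loss)/dx is
    # backpropagated through the (white-box) generator via x.backward(grad).
    z = torch.randn(16, 16)
    x = generator(z)
    grad_x = zo_gradient(x.detach())
    g_opt.zero_grad()
    x.backward(-grad_x)  # negate so the optimizer's descent step ascends the loss
    g_opt.step()

    # Clone step: ordinary white-box minimization of the same disagreement loss.
    c_opt.zero_grad()
    disagreement_loss(generator(z).detach()).backward()
    c_opt.step()
```
The sketch only illustrates the estimator and the alternating generator/clone updates; in practice the number of random directions and the perturbation scale govern the query budget of the attack.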
Related papers
- Model Inversion Attacks Through Target-Specific Conditional Diffusion Models [54.69008212790426]
Model inversion attacks (MIAs) aim to reconstruct private images from a target classifier's training set, thereby raising privacy concerns in AI applications.
Previous GAN-based MIAs tend to suffer from inferior generative fidelity due to GAN's inherent flaws and biased optimization within latent space.
We propose Diffusion-based Model Inversion (Diff-MI) attacks to alleviate these issues.
arXiv Detail & Related papers (2024-07-16T06:38:49Z)
- Prediction Exposes Your Face: Black-box Model Inversion via Prediction Alignment [24.049615035939237]
A model inversion (MI) attack reconstructs the private training data of a target model given its output.
We propose a novel Prediction-to-Image (P2I) method for black-box MI attacks.
Our method improves attack accuracy by 8.5% and reduces query numbers by 99% on the CelebA dataset.
arXiv Detail & Related papers (2024-07-11T01:58:35Z)
- Incremental Pseudo-Labeling for Black-Box Unsupervised Domain Adaptation [14.596659424489223]
We propose a novel approach that incrementally selects high-confidence pseudo-labels to improve the generalization ability of the target model.
Experimental results demonstrate that the proposed method achieves state-of-the-art black-box unsupervised domain adaptation performance on three benchmark datasets.
arXiv Detail & Related papers (2024-05-26T05:41:42Z)
- Data-Free Hard-Label Robustness Stealing Attack [67.41281050467889]
We introduce a novel Data-Free Hard-Label Robustness Stealing (DFHL-RS) attack in this paper.
It enables stealing both the accuracy and the robustness of the target model by simply querying its hard labels.
Our method achieves a clean accuracy of 77.86% and a robust accuracy of 39.51% against AutoAttack.
arXiv Detail & Related papers (2023-12-10T16:14:02Z)
- Back to the Source: Diffusion-Driven Test-Time Adaptation [77.4229736436935]
Test-time adaptation harnesses test inputs to improve the accuracy of a model trained on source data when it is tested on shifted target data.
We instead update the target data, by projecting all test inputs toward the source domain with a generative diffusion model.
arXiv Detail & Related papers (2022-07-07T17:14:10Z)
- How to Robustify Black-Box ML Models? A Zeroth-Order Optimization Perspective [74.47093382436823]
We address the problem of black-box defense: How to robustify a black-box model using just input queries and output feedback?
We propose a general notion of defensive operation that can be applied to black-box models, and design it through the lens of denoised smoothing (DS).
We empirically show that ZO-AE-DS can achieve improved accuracy, certified robustness, and query complexity over existing baselines.
arXiv Detail & Related papers (2022-03-27T03:23:32Z)
- Label-only Model Inversion Attack: The Attack that Requires the Least Information [14.061083728194378]
In a model inversion attack, an adversary attempts to reconstruct the data records used to train a target model, using only the model's output.
We have found a model inversion method that can reconstruct the input data records based only on the output labels.
arXiv Detail & Related papers (2022-03-13T03:03:49Z)
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence and predicts target-domain accuracy as the fraction of unlabeled examples whose confidence exceeds that threshold.
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
- MEGEX: Data-Free Model Extraction Attack against Gradient-Based Explainable AI [1.693045612956149]
Deep neural networks deployed in Machine Learning as a Service (MLaaS) face the threat of model extraction attacks.
A model extraction attack violates intellectual property and privacy: an adversary steals a trained model hosted in the cloud using only its predictions.
In this paper, we propose MEGEX, a data-free model extraction attack against a gradient-based explainable AI.
arXiv Detail & Related papers (2021-07-19T14:25:06Z)
- Data-Free Model Extraction [16.007030173299984]
Current model extraction attacks assume that the adversary has access to a surrogate dataset with characteristics similar to the proprietary data used to train the victim model.
We propose data-free model extraction methods that do not require a surrogate dataset.
We find that the proposed data-free model extraction approach achieves high-accuracy with reasonable query complexity.
arXiv Detail & Related papers (2020-11-30T13:37:47Z)
- Knowledge-Enriched Distributional Model Inversion Attacks [49.43828150561947]
Model inversion (MI) attacks are aimed at reconstructing training data from model parameters.
We present a novel inversion-specific GAN that can better distill knowledge useful for performing attacks on private models from public data.
Our experiments show that the combination of these techniques can significantly boost the success rate of the state-of-the-art MI attacks by 150%.
arXiv Detail & Related papers (2020-10-08T16:20:48Z)