MEGEX: Data-Free Model Extraction Attack against Gradient-Based Explainable AI
- URL: http://arxiv.org/abs/2107.08909v1
- Date: Mon, 19 Jul 2021 14:25:06 GMT
- Title: MEGEX: Data-Free Model Extraction Attack against Gradient-Based Explainable AI
- Authors: Takayuki Miura, Satoshi Hasegawa, Toshiki Shibahara
- Abstract summary: Deep neural networks deployed in Machine Learning as a Service (MLaaS) face the threat of model extraction attacks.
A model extraction attack is an attack that violates intellectual property and privacy, in which an adversary steals a trained model hosted in the cloud using only its predictions.
In this paper, we propose MEGEX, a data-free model extraction attack against a gradient-based explainable AI.
- Score: 1.693045612956149
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The advance of explainable artificial intelligence, which provides reasons
for its predictions, is expected to accelerate the adoption of deep neural networks in
real-world services such as Machine Learning as a Service (MLaaS), which returns
predictions on queried data using a trained model. Deep neural networks deployed in
MLaaS face the threat of model extraction attacks. A model extraction attack violates
intellectual property and privacy: an adversary steals a trained model hosted in the
cloud using only its predictions. In particular, a data-free model extraction attack,
in which an adversary uses a generative model instead of preparing input data, has been
proposed recently and is even more critical. Its feasibility, however, still needs to be
studied because it requires more queries than attacks that use surrogate datasets. In
this paper, we propose MEGEX, a data-free model extraction attack against a
gradient-based explainable AI. In this method, an adversary uses the returned
explanations to train the generative model, reducing the number of queries needed to
steal the model. Our experiments show that the proposed method reconstructs
high-accuracy models -- 0.97$\times$ and 0.98$\times$ the victim model accuracy on the
SVHN and CIFAR-10 datasets given 2M and 20M queries, respectively. This implies that
there is a trade-off between the interpretability of models and the difficulty of
stealing them.
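A minimal sketch (not the authors' code) of the mechanism the abstract describes: a generator proposes queries, the victim API returns a softmax prediction together with a gradient-based explanation, the clone is trained to imitate the prediction, and the explanation is reused as the input gradient that drives the generator update, so no extra queries are spent estimating that gradient. The tiny MLP architectures, the choice of the top-class probability's input gradient as the explanation, and the particular generator objective (push queries toward the victim's decision boundary and toward clone-victim disagreement) are illustrative assumptions, not details taken from the paper.

```python
# Sketch of explanation-guided data-free extraction (assumptions noted above).
# The victim is simulated locally; in the real setting query_victim() would be a
# single MLaaS API call returning (prediction, gradient-based explanation).
import torch
import torch.nn as nn
import torch.nn.functional as F

DIM, N_CLASSES, Z = 3 * 32 * 32, 10, 100

def make_model():
    return nn.Sequential(nn.Linear(DIM, 256), nn.ReLU(), nn.Linear(256, N_CLASSES))

victim = make_model()   # black box to the attacker
clone = make_model()    # attacker's substitute
gen = nn.Sequential(nn.Linear(Z, 512), nn.ReLU(), nn.Linear(512, DIM), nn.Tanh())
opt_c = torch.optim.Adam(clone.parameters(), lr=1e-3)
opt_g = torch.optim.Adam(gen.parameters(), lr=1e-3)

def query_victim(x):
    """One simulated API call: softmax prediction plus a vanilla-gradient
    explanation (gradient of the top-class probability w.r.t. the input)."""
    x = x.detach().requires_grad_(True)
    probs = F.softmax(victim(x), dim=1)
    expl = torch.autograd.grad(probs.max(dim=1).values.sum(), x)[0]
    return probs.detach(), expl.detach()

for step in range(1000):
    z = torch.randn(64, Z)
    x_q = gen(z).detach()            # this round's batch of queries
    probs, expl = query_victim(x_q)  # one query per sample, reused below

    # Clone update: match the victim's predictions on the generated queries.
    opt_c.zero_grad()
    loss_c = F.kl_div(F.log_softmax(clone(x_q), dim=1), probs, reduction="batchmean")
    loss_c.backward()
    opt_c.step()

    # Generator update: the explanation already supplies d(top-class prob)/dx,
    # so the "move toward the victim's decision boundary" term costs no extra queries.
    x = gen(z)                                     # same values as x_q, but differentiable
    boundary = (x * expl).sum(dim=1).mean()        # gradient w.r.t. x equals expl
    disagree = -F.cross_entropy(clone(x), probs.argmax(dim=1))  # prefer queries the clone gets wrong
    opt_g.zero_grad()
    (boundary + disagree).backward()
    opt_g.step()
```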
Related papers
- MisGUIDE : Defense Against Data-Free Deep Learning Model Extraction [0.8437187555622164]
"MisGUIDE" is a two-step defense framework for Deep Learning models that disrupts the adversarial sample generation process.
The aim of the proposed defense method is to reduce the accuracy of the cloned model while maintaining accuracy on authentic queries.
arXiv Detail & Related papers (2024-03-27T13:59:21Z) - Beyond Labeling Oracles: What does it mean to steal ML models? [52.63413852460003]
Model extraction attacks are designed to steal trained models with only query access.
We investigate factors influencing the success of model extraction attacks.
Our findings urge the community to redefine the adversarial goals of ME attacks.
arXiv Detail & Related papers (2023-10-03T11:10:21Z) - Isolation and Induction: Training Robust Deep Neural Networks against
Model Stealing Attacks [51.51023951695014]
Existing model stealing defenses add deceptive perturbations to the victim's posterior probabilities to mislead the attackers.
This paper proposes Isolation and Induction (InI), a novel and effective training framework for model stealing defenses.
In contrast to adding perturbations over model predictions, which harms benign accuracy, we train models to produce uninformative outputs against stealing queries.
arXiv Detail & Related papers (2023-08-02T05:54:01Z) - MOVE: Effective and Harmless Ownership Verification via Embedded
External Features [109.19238806106426]
We propose an effective and harmless model ownership verification (MOVE) to defend against different types of model stealing simultaneously.
We conduct the ownership verification by verifying whether a suspicious model contains the knowledge of defender-specified external features.
In particular, we develop our MOVE method under both white-box and black-box settings to provide comprehensive model protection.
arXiv Detail & Related papers (2022-08-04T02:22:29Z) - Careful What You Wish For: on the Extraction of Adversarially Trained
Models [2.707154152696381]
Recent attacks on Machine Learning (ML) models pose several security and privacy threats.
We propose a framework to assess extraction attacks on adversarially trained models.
We show that adversarially trained models are more vulnerable to extraction attacks than models obtained under natural training circumstances.
arXiv Detail & Related papers (2022-07-21T16:04:37Z) - Learning to Learn Transferable Attack [77.67399621530052]
Transfer adversarial attack is a non-trivial black-box adversarial attack that crafts adversarial perturbations on a surrogate model and then applies those perturbations to the victim model (a generic sketch of this transfer step appears after this list).
We propose a Learning to Learn Transferable Attack (LLTA) method, which makes the adversarial perturbations more generalized via learning from both data and model augmentation.
Empirical results on a widely used dataset demonstrate the effectiveness of our attack method, with a 12.85% higher transfer-attack success rate than state-of-the-art methods.
arXiv Detail & Related papers (2021-12-10T07:24:21Z) - Black-box Adversarial Attacks on Network-wide Multi-step Traffic State
Prediction Models [4.353029347463806]
We propose an adversarial attack framework that treats the prediction model as a black box.
The adversary can query the prediction model as an oracle with any input and obtain the corresponding output.
To test the attack's effectiveness, two state-of-the-art graph neural network-based models (GCGRNN and DCRNN) are examined.
arXiv Detail & Related papers (2021-10-17T03:45:35Z) - Probing Model Signal-Awareness via Prediction-Preserving Input
Minimization [67.62847721118142]
We evaluate models' ability to capture the correct vulnerability signals to produce their predictions.
We measure the signal awareness of models using a new metric we propose, Signal-aware Recall (SAR).
The results show a sharp drop in the model's Recall from the high 90s to sub-60s with the new metric.
arXiv Detail & Related papers (2020-11-25T20:05:23Z) - DaST: Data-free Substitute Training for Adversarial Attacks [55.76371274622313]
We propose a data-free substitute training method (DaST) to obtain substitute models for adversarial black-box attacks.
To achieve this, DaST utilizes specially designed generative adversarial networks (GANs) to train the substitute models.
Experiments demonstrate the substitute models can achieve competitive performance compared with the baseline models.
arXiv Detail & Related papers (2020-03-28T04:28:13Z)