Careful What You Wish For: on the Extraction of Adversarially Trained
Models
- URL: http://arxiv.org/abs/2207.10561v1
- Date: Thu, 21 Jul 2022 16:04:37 GMT
- Title: Careful What You Wish For: on the Extraction of Adversarially Trained
Models
- Authors: Kacem Khaled, Gabriela Nicolescu and Felipe Gohring de Magalhães
- Abstract summary: Recent attacks on Machine Learning (ML) models pose several security and privacy threats.
We propose a framework to assess extraction attacks on adversarially trained models.
We show that adversarially trained models are more vulnerable to extraction attacks than models obtained under natural training circumstances.
- Score: 2.707154152696381
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent attacks on Machine Learning (ML) models, such as evasion attacks with
adversarial examples and model stealing through extraction attacks, pose
several security and privacy threats. Prior work proposes to use adversarial
training to secure models from adversarial examples that can evade the
classification of a model and degrade its performance. However, this
protection technique affects the model's decision boundary and its prediction
probabilities, hence it might raise model privacy risks. In fact, a malicious
user using only a query access to the prediction output of a model can extract
it and obtain a high-accuracy, high-fidelity surrogate model. To improve
extraction, these attacks leverage the prediction probabilities of the
victim model. However, previous work on extraction attacks does not take into
account changes made to the training process for security purposes. In
this paper, we propose a framework to assess extraction attacks on
adversarially trained models with vision datasets. To the best of our
knowledge, our work is the first to perform such an evaluation. Through an
extensive empirical study, we demonstrate that adversarially trained models are
more vulnerable to extraction attacks than models obtained under natural
training circumstances: extracted surrogates can reach up to $1.2\times$ higher
accuracy and agreement while using fewer than $0.75\times$ the queries. We
additionally find that adversarial robustness is transferable through
extraction attacks, i.e., Deep Neural Networks (DNNs) extracted from robust
models show higher accuracy on adversarial examples than DNNs extracted from
naturally trained (i.e., standard) models.
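For context on the extraction setting assessed above, the following is a minimal sketch, assuming a generic PyTorch workflow, of how a query-only adversary could distill a surrogate from a victim's prediction probabilities. The names `victim`, `surrogate`, and `query_loader` are hypothetical placeholders, not artifacts from the paper.
```python
# Minimal sketch (assumption): query-only model extraction by distilling the
# victim's output probabilities (soft labels) into a surrogate network.
import torch
import torch.nn.functional as F

def extract_surrogate(victim, surrogate, query_loader, epochs=10, lr=1e-3, device="cpu"):
    """Fit `surrogate` to mimic `victim` using only its prediction probabilities."""
    victim.to(device).eval()
    surrogate.to(device).train()
    opt = torch.optim.Adam(surrogate.parameters(), lr=lr)
    for _ in range(epochs):
        for x, _ in query_loader:                   # ground-truth labels are never used
            x = x.to(device)
            with torch.no_grad():
                soft = F.softmax(victim(x), dim=1)  # black-box query: probabilities only
            log_probs = F.log_softmax(surrogate(x), dim=1)
            loss = F.kl_div(log_probs, soft, reduction="batchmean")
            opt.zero_grad()
            loss.backward()
            opt.step()
    return surrogate
```
Agreement between the surrogate's and the victim's predictions (fidelity), accuracy on the original test set, and accuracy on adversarial examples would then quantify the extraction and robustness-transfer effects reported above.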
Related papers
- Privacy Backdoors: Enhancing Membership Inference through Poisoning Pre-trained Models [112.48136829374741]
In this paper, we unveil a new vulnerability: the privacy backdoor attack.
When a victim fine-tunes a backdoored model, their training data will be leaked at a significantly higher rate than if they had fine-tuned a typical model.
Our findings highlight a critical privacy concern within the machine learning community and call for a reevaluation of safety protocols in the use of open-source pre-trained models.
arXiv Detail & Related papers (2024-04-01T16:50:54Z) - Unlearning Backdoor Threats: Enhancing Backdoor Defense in Multimodal Contrastive Learning via Local Token Unlearning [49.242828934501986]
Multimodal contrastive learning has emerged as a powerful paradigm for building high-quality features.
However, backdoor attacks can subtly embed malicious behaviors within the model during training.
We introduce an innovative token-based localized forgetting training regime.
arXiv Detail & Related papers (2024-03-24T18:33:15Z) - MEAOD: Model Extraction Attack against Object Detectors [45.817537875368956]
Model extraction attacks allow attackers to replicate a substitute model with comparable functionality to the victim model.
We propose an effective attack method called MEAOD for object detection models.
We achieve extraction performance of over 70% under a 10k query budget.
arXiv Detail & Related papers (2023-12-22T13:28:50Z) - Beyond Labeling Oracles: What does it mean to steal ML models? [52.63413852460003]
Model extraction attacks are designed to steal trained models with only query access.
We investigate factors influencing the success of model extraction attacks.
Our findings urge the community to redefine the adversarial goals of ME attacks.
arXiv Detail & Related papers (2023-10-03T11:10:21Z) - Isolation and Induction: Training Robust Deep Neural Networks against
Model Stealing Attacks [51.51023951695014]
Existing model stealing defenses add deceptive perturbations to the victim's posterior probabilities to mislead the attackers.
This paper proposes Isolation and Induction (InI), a novel and effective training framework for model stealing defenses.
In contrast to adding perturbations over model predictions that harm the benign accuracy, we train models to produce uninformative outputs against stealing queries.
arXiv Detail & Related papers (2023-08-02T05:54:01Z) - MOVE: Effective and Harmless Ownership Verification via Embedded
External Features [109.19238806106426]
We propose an effective and harmless model ownership verification (MOVE) to defend against different types of model stealing simultaneously.
We conduct the ownership verification by verifying whether a suspicious model contains the knowledge of defender-specified external features.
In particular, we develop our MOVE method under both white-box and black-box settings to provide comprehensive model protection.
arXiv Detail & Related papers (2022-08-04T02:22:29Z) - Black-box Adversarial Attacks on Network-wide Multi-step Traffic State
Prediction Models [4.353029347463806]
We propose an adversarial attack framework by treating the prediction model as a black-box.
The adversary can query the prediction model as an oracle with any input and obtain the corresponding output.
To test attack effectiveness, two state-of-the-art graph neural network-based models (GCGRNN and DCRNN) are examined.
arXiv Detail & Related papers (2021-10-17T03:45:35Z) - MEGEX: Data-Free Model Extraction Attack against Gradient-Based
Explainable AI [1.693045612956149]
Deep neural networks deployed in Machine Learning as a Service (MLaaS) face the threat of model extraction attacks.
A model extraction attack is an attack that violates intellectual property and privacy, in which an adversary steals trained models in the cloud using only their predictions.
In this paper, we propose MEGEX, a data-free model extraction attack against a gradient-based explainable AI.
arXiv Detail & Related papers (2021-07-19T14:25:06Z) - Thief, Beware of What Get You There: Towards Understanding Model
Extraction Attack [13.28881502612207]
In some scenarios, AI models are trained proprietarily, where neither pre-trained models nor sufficient in-distribution data is publicly available.
We find the effectiveness of existing techniques significantly affected by the absence of pre-trained models.
We formulate model extraction attacks into an adaptive framework that captures these factors with deep reinforcement learning.
arXiv Detail & Related papers (2021-04-13T03:46:59Z) - Model Extraction and Defenses on Generative Adversarial Networks [0.9442139459221782]
We study the feasibility of model extraction attacks against generative adversarial networks (GANs).
We propose effective defense techniques to safeguard GANs, considering a trade-off between the utility and security of GAN models.
arXiv Detail & Related papers (2021-01-06T14:36:21Z) - Learning to Attack: Towards Textual Adversarial Attacking in Real-world
Situations [81.82518920087175]
Adversarial attacking aims to fool deep neural networks with adversarial examples.
We propose a reinforcement learning based attack model, which can learn from attack history and launch attacks more efficiently.
arXiv Detail & Related papers (2020-09-19T09:12:24Z)