Careful What You Wish For: on the Extraction of Adversarially Trained
Models
- URL: http://arxiv.org/abs/2207.10561v1
- Date: Thu, 21 Jul 2022 16:04:37 GMT
- Title: Careful What You Wish For: on the Extraction of Adversarially Trained
Models
- Authors: Kacem Khaled, Gabriela Nicolescu and Felipe Gohring de Magalhães
- Abstract summary: Recent attacks on Machine Learning (ML) models pose several security and privacy threats.
We propose a framework to assess extraction attacks on adversarially trained models.
We show that adversarially trained models are more vulnerable to extraction attacks than models obtained under natural training circumstances.
- Score: 2.707154152696381
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent attacks on Machine Learning (ML) models, such as evasion attacks with
adversarial examples and model stealing through extraction attacks, pose
several security and privacy threats. Prior work proposes to use adversarial
training to secure models from adversarial examples that can evade the
classification of a model and degrade its performance. However, this
protection technique affects the model's decision boundary and its prediction
probabilities, hence it might raise model privacy risks. In fact, a malicious
user using only a query access to the prediction output of a model can extract
it and obtain a high-accuracy, high-fidelity surrogate model. To improve
extraction, these attacks leverage the prediction probabilities of the
victim model. However, previous work on extraction attacks does not take into
account changes made to the training process for security purposes. In
this paper, we propose a framework to assess extraction attacks on
adversarially trained models with vision datasets. To the best of our
knowledge, our work is the first to perform such an evaluation. Through an
extensive empirical study, we demonstrate that adversarially trained models are
more vulnerable to extraction attacks than models obtained under natural
training circumstances: extracted surrogates can reach up to $1.2\times$ higher
accuracy and agreement while using fewer than $0.75\times$ the queries. We
additionally find that adversarial robustness is transferable through
extraction attacks, i.e., Deep Neural Networks (DNNs) extracted from robust
models show higher accuracy on adversarial examples than DNNs extracted from
naturally trained (i.e., standard) models.
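For context on the extraction setting assessed above, the following is a minimal sketch, assuming a generic PyTorch workflow, of how a query-only adversary could distill a surrogate from a victim's prediction probabilities. The names `victim`, `surrogate`, and `query_loader` are hypothetical placeholders, not artifacts from the paper.
```python
# Minimal sketch (assumption): query-only model extraction by distilling the
# victim's output probabilities (soft labels) into a surrogate network.
import torch
import torch.nn.functional as F

def extract_surrogate(victim, surrogate, query_loader, epochs=10, lr=1e-3, device="cpu"):
    """Fit `surrogate` to mimic `victim` using only its prediction probabilities."""
    victim.to(device).eval()
    surrogate.to(device).train()
    opt = torch.optim.Adam(surrogate.parameters(), lr=lr)
    for _ in range(epochs):
        for x, _ in query_loader:                   # ground-truth labels are never used
            x = x.to(device)
            with torch.no_grad():
                soft = F.softmax(victim(x), dim=1)  # black-box query: probabilities only
            log_probs = F.log_softmax(surrogate(x), dim=1)
            loss = F.kl_div(log_probs, soft, reduction="batchmean")
            opt.zero_grad()
            loss.backward()
            opt.step()
    return surrogate
```
Agreement between the surrogate's and the victim's predictions (fidelity), accuracy on the original test set, and accuracy on adversarial examples would then quantify the extraction and robustness-transfer effects reported above.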
Related papers
- Privacy Backdoors: Enhancing Membership Inference through Poisoning Pre-trained Models [112.48136829374741]
In this paper, we unveil a new vulnerability: the privacy backdoor attack.
When a victim fine-tunes a backdoored model, their training data will be leaked at a significantly higher rate than if they had fine-tuned a typical model.
Our findings highlight a critical privacy concern within the machine learning community and call for a reevaluation of safety protocols in the use of open-source pre-trained models.
arXiv Detail & Related papers (2024-04-01T16:50:54Z) - Unlearning Backdoor Threats: Enhancing Backdoor Defense in Multimodal Contrastive Learning via Local Token Unlearning [49.242828934501986]
Multimodal contrastive learning has emerged as a powerful paradigm for building high-quality features.
However, backdoor attacks can subtly embed malicious behaviors within the model during training.
We introduce an innovative token-based localized forgetting training regime.
arXiv Detail & Related papers (2024-03-24T18:33:15Z) - MEAOD: Model Extraction Attack against Object Detectors [45.817537875368956]
Model extraction attacks allow attackers to replicate a substitute model with comparable functionality to the victim model.
We propose an effective attack method called MEAOD for object detection models.
We achieve extraction performance of over 70% under a 10k query budget.
arXiv Detail & Related papers (2023-12-22T13:28:50Z) - Beyond Labeling Oracles: What does it mean to steal ML models? [52.63413852460003]
Model extraction attacks are designed to steal trained models with only query access.
We investigate factors influencing the success of model extraction attacks.
Our findings urge the community to redefine the adversarial goals of ME attacks.
arXiv Detail & Related papers (2023-10-03T11:10:21Z) - Isolation and Induction: Training Robust Deep Neural Networks against
Model Stealing Attacks [51.51023951695014]
Existing model stealing defenses add deceptive perturbations to the victim's posterior probabilities to mislead the attackers.
This paper proposes Isolation and Induction (InI), a novel and effective training framework for model stealing defenses.
In contrast to adding perturbations over model predictions that harm the benign accuracy, we train models to produce uninformative outputs against stealing queries.
arXiv Detail & Related papers (2023-08-02T05:54:01Z) - MOVE: Effective and Harmless Ownership Verification via Embedded
External Features [109.19238806106426]
We propose an effective and harmless model ownership verification (MOVE) to defend against different types of model stealing simultaneously.
We conduct the ownership verification by verifying whether a suspicious model contains the knowledge of defender-specified external features.
In particular, we develop our MOVE method under both white-box and black-box settings to provide comprehensive model protection.
arXiv Detail & Related papers (2022-08-04T02:22:29Z) - Black-box Adversarial Attacks on Network-wide Multi-step Traffic State
Prediction Models [4.353029347463806]
We propose an adversarial attack framework by treating the prediction model as a black-box.
The adversary can query the prediction model as an oracle with any input and obtain the corresponding output.
To test attack effectiveness, two state-of-the-art graph neural network-based models (GCGRNN and DCRNN) are examined.
arXiv Detail & Related papers (2021-10-17T03:45:35Z) - MEGEX: Data-Free Model Extraction Attack against Gradient-Based
Explainable AI [1.693045612956149]
Deep neural networks deployed in Machine Learning as a Service (MLaaS) face the threat of model extraction attacks.
A model extraction attack is an attack that violates intellectual property and privacy, in which an adversary steals trained models in the cloud using only their predictions.
In this paper, we propose MEGEX, a data-free model extraction attack against a gradient-based explainable AI.
arXiv Detail & Related papers (2021-07-19T14:25:06Z) - Thief, Beware of What Get You There: Towards Understanding Model
Extraction Attack [13.28881502612207]
In some scenarios, AI models are trained proprietarily, where neither pre-trained models nor sufficient in-distribution data is publicly available.
We find the effectiveness of existing techniques significantly affected by the absence of pre-trained models.
We formulate model extraction attacks into an adaptive framework that captures these factors with deep reinforcement learning.
arXiv Detail & Related papers (2021-04-13T03:46:59Z) - Model Extraction and Defenses on Generative Adversarial Networks [0.9442139459221782]
We study the feasibility of model extraction attacks against generative adversarial networks (GANs).
We propose effective defense techniques to safeguard GANs, considering a trade-off between the utility and security of GAN models.
arXiv Detail & Related papers (2021-01-06T14:36:21Z) - Learning to Attack: Towards Textual Adversarial Attacking in Real-world
Situations [81.82518920087175]
Adversarial attacking aims to fool deep neural networks with adversarial examples.
We propose a reinforcement learning based attack model, which can learn from attack history and launch attacks more efficiently.
arXiv Detail & Related papers (2020-09-19T09:12:24Z)