Isolation and Induction: Training Robust Deep Neural Networks against
Model Stealing Attacks
- URL: http://arxiv.org/abs/2308.00958v2
- Date: Thu, 3 Aug 2023 06:27:08 GMT
- Title: Isolation and Induction: Training Robust Deep Neural Networks against
Model Stealing Attacks
- Authors: Jun Guo, Aishan Liu, Xingyu Zheng, Siyuan Liang, Yisong Xiao, Yichao
Wu, Xianglong Liu
- Abstract summary: Existing model stealing defenses add deceptive perturbations to the victim's posterior probabilities to mislead the attackers.
This paper proposes Isolation and Induction (InI), a novel and effective training framework for model stealing defenses.
In contrast to adding perturbations over model predictions that harm the benign accuracy, we train models to produce uninformative outputs against stealing queries.
- Score: 51.51023951695014
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the broad application of Machine Learning models as a Service
(MLaaS), they are vulnerable to model stealing attacks. These attacks can
replicate the model functionality by using the black-box query process without
any prior knowledge of the target victim model. Existing stealing defenses add
deceptive perturbations to the victim's posterior probabilities to mislead the
attackers. However, these defenses suffer from high inference-time computational
overhead and an unfavorable trade-off between benign accuracy and stealing
robustness, which limits their feasibility for deployed models in practice. To
address these problems, this paper proposes Isolation and Induction
(InI), a novel and effective training framework for model stealing defenses.
Instead of deploying auxiliary defense modules that add extra inference time,
InI directly trains a defensive model by isolating the adversary's training
gradient from the expected gradient, which effectively reduces the inference
computational cost. In contrast to adding perturbations over model predictions,
which harms benign accuracy, we train models to produce uninformative outputs
for stealing queries, so that the adversary extracts little useful knowledge
from the victim model while benign performance is minimally affected.
Extensive experiments on several visual
classification datasets (e.g., MNIST and CIFAR10) demonstrate the superior
robustness (up to a 48% reduction in stealing accuracy) and speed (up to 25.4x
faster) of InI over other state-of-the-art methods. Our code is available at
https://github.com/DIG-Beihang/InI-Model-Stealing-Defense.
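Illustrative sketch (not the paper's released code): the induction idea of keeping the victim accurate on benign inputs while returning near-uniform, uninformative posteriors on stealing-style queries can be approximated with a standard PyTorch training loss. The defensive_loss helper, the alpha weight, and the use of out-of-distribution samples as surrogate stealing queries are assumptions made for illustration; the paper's gradient-isolation term is omitted here.

# Minimal sketch of an induction-style defensive loss (assumptions noted above).
import torch
import torch.nn as nn
import torch.nn.functional as F

def defensive_loss(model, benign_x, benign_y, stealing_x, alpha=1.0):
    # Standard cross-entropy preserves benign accuracy.
    ce = F.cross_entropy(model(benign_x), benign_y)
    # Push posteriors on surrogate stealing queries toward the uniform
    # distribution, i.e. toward an uninformative output.
    log_p = F.log_softmax(model(stealing_x), dim=1)
    uniform = torch.full_like(log_p, 1.0 / log_p.size(1))
    induction = F.kl_div(log_p, uniform, reduction="batchmean")  # KL(uniform || p_model)
    return ce + alpha * induction

if __name__ == "__main__":
    # Toy CIFAR10-sized classifier and random stand-in data.
    model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
    benign_x, benign_y = torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))
    stealing_x = torch.randn(8, 3, 32, 32)  # surrogate out-of-distribution queries
    loss = defensive_loss(model, benign_x, benign_y, stealing_x)
    loss.backward()
    print(float(loss))

In a full training loop, the alpha weight would trade off benign accuracy against how flat (and thus how useless to a clone) the outputs on stealing-style queries become.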
Related papers
- Unlearning Backdoor Threats: Enhancing Backdoor Defense in Multimodal Contrastive Learning via Local Token Unlearning [49.242828934501986]
Multimodal contrastive learning has emerged as a powerful paradigm for building high-quality features.
However, backdoor attacks can subtly embed malicious behaviors within the model during training.
We introduce an innovative token-based localized forgetting training regime.
arXiv Detail & Related papers (2024-03-24T18:33:15Z)
- Efficient Defense Against Model Stealing Attacks on Convolutional Neural Networks [0.548924822963045]
Model stealing attacks can lead to intellectual property theft and other security and privacy risks.
Current state-of-the-art defenses against model stealing attacks suggest adding perturbations to the prediction probabilities.
We propose a simple yet effective and efficient defense alternative.
arXiv Detail & Related papers (2023-09-04T22:25:49Z)
- MOVE: Effective and Harmless Ownership Verification via Embedded External Features [109.19238806106426]
We propose an effective and harmless model ownership verification (MOVE) to defend against different types of model stealing simultaneously.
We conduct the ownership verification by verifying whether a suspicious model contains the knowledge of defender-specified external features.
In particular, we develop our MOVE method under both white-box and black-box settings to provide comprehensive model protection.
arXiv Detail & Related papers (2022-08-04T02:22:29Z)
- Careful What You Wish For: on the Extraction of Adversarially Trained Models [2.707154152696381]
Recent attacks on Machine Learning (ML) models pose several security and privacy threats.
We propose a framework to assess extraction attacks on adversarially trained models.
We show that adversarially trained models are more vulnerable to extraction attacks than models obtained under natural training circumstances.
arXiv Detail & Related papers (2022-07-21T16:04:37Z)
- RelaxLoss: Defending Membership Inference Attacks without Losing Utility [68.48117818874155]
We propose a novel training framework based on a relaxed loss with a more achievable learning target.
RelaxLoss is applicable to any classification model with added benefits of easy implementation and negligible overhead.
Our approach consistently outperforms state-of-the-art defense mechanisms in terms of resilience against MIAs.
arXiv Detail & Related papers (2022-07-12T19:34:47Z)
- Defending against Model Stealing via Verifying Embedded External Features [90.29429679125508]
Adversaries can 'steal' deployed models even when they have no training samples and cannot access the model parameters or structures.
We explore the defense from another angle by verifying whether a suspicious model contains the knowledge of defender-specified external features.
Our method is effective in detecting different types of model stealing simultaneously, even if the stolen model is obtained via a multi-stage stealing process.
arXiv Detail & Related papers (2021-12-07T03:51:54Z)
- MEGEX: Data-Free Model Extraction Attack against Gradient-Based Explainable AI [1.693045612956149]
Deep neural networks deployed in Machine Learning as a Service (MLaaS) face the threat of model extraction attacks.
A model extraction attack violates intellectual property and privacy: an adversary steals a trained model hosted in the cloud using only its predictions.
In this paper, we propose MEGEX, a data-free model extraction attack against a gradient-based explainable AI.
arXiv Detail & Related papers (2021-07-19T14:25:06Z)
- Improving Robustness to Model Inversion Attacks via Mutual Information Regularization [12.079281416410227]
This paper studies defense mechanisms against model inversion (MI) attacks.
MI is a type of privacy attack that aims to infer information about the training data distribution given access to a target machine learning model.
We propose the Mutual Information Regularization based Defense (MID) against MI attacks.
arXiv Detail & Related papers (2020-09-11T06:02:44Z)
- Defense for Black-box Attacks on Anti-spoofing Models by Self-Supervised Learning [71.17774313301753]
We explore the robustness of self-supervised learned high-level representations by using them in the defense against adversarial attacks.
Experimental results on the ASVspoof 2019 dataset demonstrate that high-level representations extracted by Mockingjay can prevent the transferability of adversarial examples.
arXiv Detail & Related papers (2020-06-05T03:03:06Z)