How to Steer Your Adversary: Targeted and Efficient Model Stealing
Defenses with Gradient Redirection
- URL: http://arxiv.org/abs/2206.14157v1
- Date: Tue, 28 Jun 2022 17:04:49 GMT
- Title: How to Steer Your Adversary: Targeted and Efficient Model Stealing
Defenses with Gradient Redirection
- Authors: Mantas Mazeika, Bo Li, David Forsyth
- Abstract summary: We present a new approach to model stealing defenses called gradient redirection.
At the core of our approach is a provably optimal, efficient algorithm for steering an adversary's training updates in a targeted manner.
Combined with improvements to surrogate networks and a novel coordinated defense strategy, our gradient redirection defense, called GRAD$2$, achieves small utility trade-offs and low computational overhead.
- Score: 16.88718696087103
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Model stealing attacks present a dilemma for public machine learning APIs. To
protect financial investments, companies may be forced to withhold important
information about their models that could facilitate theft, including
uncertainty estimates and prediction explanations. This compromise is harmful
not only to users but also to external transparency. Model stealing defenses
seek to resolve this dilemma by making models harder to steal while preserving
utility for benign users. However, existing defenses have poor performance in
practice, either requiring enormous computational overheads or severe utility
trade-offs. To meet these challenges, we present a new approach to model
stealing defenses called gradient redirection. At the core of our approach is a
provably optimal, efficient algorithm for steering an adversary's training
updates in a targeted manner. Combined with improvements to surrogate networks
and a novel coordinated defense strategy, our gradient redirection defense,
called GRAD${}^2$, achieves small utility trade-offs and low computational
overhead, outperforming the best prior defenses. Moreover, we demonstrate how
gradient redirection enables reprogramming the adversary with arbitrary
behavior, which we hope will foster work on new avenues of defense.
Related papers
- Learn to Disguise: Avoid Refusal Responses in LLM's Defense via a Multi-agent Attacker-Disguiser Game [28.33029508522531]
Malicious attackers induce large models to jailbreak and generate information containing illegal, privacy-invasive information.
Large models counter malicious attackers' attacks using techniques such as safety alignment.
We propose a multi-agent attacker-disguiser game approach to achieve a weak defense mechanism that allows the large model to both safely reply to the attacker and hide the defense intent.
arXiv Detail & Related papers (2024-04-03T07:43:11Z) - BadCLIP: Dual-Embedding Guided Backdoor Attack on Multimodal Contrastive
Learning [85.2564206440109]
This paper reveals the threats in this practical scenario that backdoor attacks can remain effective even after defenses.
We introduce the emphtoolns attack, which is resistant to backdoor detection and model fine-tuning defenses.
arXiv Detail & Related papers (2023-11-20T02:21:49Z) - Efficient Defense Against Model Stealing Attacks on Convolutional Neural
Networks [0.548924822963045]
Model stealing attacks can lead to intellectual property theft and other security and privacy risks.
Current state-of-the-art defenses against model stealing attacks suggest adding perturbations to the prediction probabilities.
We propose a simple yet effective and efficient defense alternative.
arXiv Detail & Related papers (2023-09-04T22:25:49Z) - Isolation and Induction: Training Robust Deep Neural Networks against
Model Stealing Attacks [51.51023951695014]
Existing model stealing defenses add deceptive perturbations to the victim's posterior probabilities to mislead the attackers.
This paper proposes Isolation and Induction (InI), a novel and effective training framework for model stealing defenses.
In contrast to adding perturbations over model predictions that harm the benign accuracy, we train models to produce uninformative outputs against stealing queries.
arXiv Detail & Related papers (2023-08-02T05:54:01Z) - Avoid Adversarial Adaption in Federated Learning by Multi-Metric
Investigations [55.2480439325792]
Federated Learning (FL) facilitates decentralized machine learning model training, preserving data privacy, lowering communication costs, and boosting model performance through diversified data sources.
FL faces vulnerabilities such as poisoning attacks, undermining model integrity with both untargeted performance degradation and targeted backdoor attacks.
We define a new notion of strong adaptive adversaries, capable of adapting to multiple objectives simultaneously.
MESAS is the first defense robust against strong adaptive adversaries, effective in real-world data scenarios, with an average overhead of just 24.37 seconds.
arXiv Detail & Related papers (2023-06-06T11:44:42Z) - I Know What You Trained Last Summer: A Survey on Stealing Machine
Learning Models and Defences [0.1031296820074812]
We study model stealing attacks, assessing their performance and exploring corresponding defence techniques in different settings.
We propose a taxonomy for attack and defence approaches, and provide guidelines on how to select the right attack or defence based on the goal and available resources.
arXiv Detail & Related papers (2022-06-16T21:16:41Z) - Defense Against Gradient Leakage Attacks via Learning to Obscure Data [48.67836599050032]
Federated learning is considered as an effective privacy-preserving learning mechanism.
In this paper, we propose a new defense method to protect the privacy of clients' data by learning to obscure data.
arXiv Detail & Related papers (2022-06-01T21:03:28Z) - Model-Agnostic Meta-Attack: Towards Reliable Evaluation of Adversarial
Robustness [53.094682754683255]
We propose a Model-Agnostic Meta-Attack (MAMA) approach to discover stronger attack algorithms automatically.
Our method learns the in adversarial attacks parameterized by a recurrent neural network.
We develop a model-agnostic training algorithm to improve the ability of the learned when attacking unseen defenses.
arXiv Detail & Related papers (2021-10-13T13:54:24Z) - Opportunities and Challenges in Deep Learning Adversarial Robustness: A
Survey [1.8782750537161614]
This paper studies strategies to implement adversary robustly trained algorithms towards guaranteeing safety in machine learning algorithms.
We provide a taxonomy to classify adversarial attacks and defenses, formulate the Robust Optimization problem in a min-max setting, and divide it into 3 subcategories, namely: Adversarial (re)Training, Regularization Approach, and Certified Defenses.
arXiv Detail & Related papers (2020-07-01T21:00:32Z) - A Self-supervised Approach for Adversarial Robustness [105.88250594033053]
Adversarial examples can cause catastrophic mistakes in Deep Neural Network (DNNs) based vision systems.
This paper proposes a self-supervised adversarial training mechanism in the input space.
It provides significant robustness against the textbfunseen adversarial attacks.
arXiv Detail & Related papers (2020-06-08T20:42:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.