PubDef: Defending Against Transfer Attacks From Public Models
- URL: http://arxiv.org/abs/2310.17645v2
- Date: Sun, 17 Mar 2024 04:40:48 GMT
- Title: PubDef: Defending Against Transfer Attacks From Public Models
- Authors: Chawin Sitawarin, Jaewon Chang, David Huang, Wesson Altoyan, David Wagner
- Abstract summary: We propose a new practical threat model where the adversary relies on transfer attacks through publicly available surrogate models.
We evaluate the transfer attacks in this setting and propose a specialized defense method based on a game-theoretic perspective.
Under this threat model, our defense, PubDef, outperforms the state-of-the-art white-box adversarial training by a large margin with almost no loss in the normal accuracy.
- Score: 6.0012551318569285
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Adversarial attacks have been a looming and unaddressed threat in the industry. However, through a decade-long history of the robustness evaluation literature, we have learned that mounting a strong or optimal attack is challenging. It requires both machine learning and domain expertise. In other words, the white-box threat model, religiously assumed by a large majority of the past literature, is unrealistic. In this paper, we propose a new practical threat model where the adversary relies on transfer attacks through publicly available surrogate models. We argue that this setting will become the most prevalent for security-sensitive applications in the future. We evaluate the transfer attacks in this setting and propose a specialized defense method based on a game-theoretic perspective. The defenses are evaluated under 24 public models and 11 attack algorithms across three datasets (CIFAR-10, CIFAR-100, and ImageNet). Under this threat model, our defense, PubDef, outperforms the state-of-the-art white-box adversarial training by a large margin with almost no loss in the normal accuracy. For instance, on ImageNet, our defense achieves 62% accuracy under the strongest transfer attack vs only 36% of the best adversarially trained model. Its accuracy when not under attack is only 2% lower than that of an undefended model (78% vs 80%). We release our code at https://github.com/wagner-group/pubdef.
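To make the threat model concrete, the sketch below (not PubDef's released code) shows a transfer attack from a public surrogate: adversarial examples are crafted with PGD against a publicly available pretrained model and then evaluated against a separate target model. The torchvision surrogate/target pair, the L_inf budget, and the placeholder batch are assumptions made for illustration only.

```python
# Hedged sketch of a transfer attack from a public surrogate model.
# The specific models, budget, and data below are illustrative assumptions,
# not the paper's evaluation setup.
import torch
import torch.nn.functional as F
import torchvision.models as models

def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """L_inf PGD against the surrogate; x in [0, 1], y are true labels."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = x + (x_adv - x).clamp(-eps, eps)  # project onto the eps-ball
        x_adv = x_adv.clamp(0, 1)                 # keep pixel values valid
    return x_adv.detach()

# Surrogate: any publicly available pretrained model; target: the victim model.
# (Real use would also apply each model's preprocessing transforms.)
surrogate = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).eval()
target = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()

x = torch.rand(4, 3, 224, 224)       # placeholder images in [0, 1]
y = torch.randint(0, 1000, (4,))     # placeholder labels
x_adv = pgd_attack(surrogate, x, y)
with torch.no_grad():
    acc = (target(x_adv).argmax(dim=1) == y).float().mean().item()
print(f"target accuracy under the transferred attack: {acc:.2%}")
```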
Related papers
- Protecting against simultaneous data poisoning attacks [14.893813906644153]
Current backdoor defense methods are evaluated against a single attack at a time.
We show that simultaneously executed data poisoning attacks can effectively install multiple backdoors in a single model.
We develop a new defense, BaDLoss, that is effective in the multi-attack setting.
arXiv Detail & Related papers (2024-08-23T16:57:27Z) - Versatile Defense Against Adversarial Attacks on Image Recognition [2.9980620769521513]
Defending against adversarial attacks in a real-life setting can be compared to the way antivirus software works.
It appears that a defense method based on image-to-image translation may be capable of this.
The trained model has successfully improved the classification accuracy from nearly zero to an average of 86%.
arXiv Detail & Related papers (2024-03-13T01:48:01Z) - MultiRobustBench: Benchmarking Robustness Against Multiple Attacks [86.70417016955459]
We present the first unified framework for considering multiple attacks against machine learning (ML) models.
Our framework is able to model different levels of learner's knowledge about the test-time adversary.
We evaluate the performance of 16 defended models for robustness against a set of 9 different attack types.
arXiv Detail & Related papers (2023-02-21T20:26:39Z) - Adversarial Transfer Attacks With Unknown Data and Class Overlap [19.901933940805684]
Current transfer attack research grants the attacker an unrealistic advantage.
We present the first study of transfer-based adversarial attacks that focuses on the data available to the attacker and the victim under imperfect settings.
This threat model is relevant to applications in medicine, malware, and others.
arXiv Detail & Related papers (2021-09-23T03:41:34Z) - Attacking Adversarial Attacks as A Defense [40.8739589617252]
Adversarial attacks can fool deep neural networks with imperceptible perturbations.
On adversarially-trained models, perturbing adversarial examples with a small random noise may invalidate their misled predictions.
We propose to counter attacks by crafting more effective defensive perturbations.
arXiv Detail & Related papers (2021-06-09T09:31:10Z) - Fighting Gradients with Gradients: Dynamic Defenses against Adversarial Attacks [72.59081183040682]
We propose dynamic defenses that adapt the model and input during testing via defensive entropy minimization (dent); a hedged sketch of this style of test-time adaptation appears after this list.
dent improves the robustness of adversarially-trained defenses and nominally-trained models against white-box, black-box, and adaptive attacks on CIFAR-10/100 and ImageNet.
arXiv Detail & Related papers (2021-05-18T17:55:07Z) - Lagrangian Objective Function Leads to Improved Unforeseen Attack Generalization in Adversarial Training [0.0]
Adversarial training (AT) has been shown to be effective at producing models that are robust to the attack used during training.
We propose a simple modification to AT that improves generalization to attacks unseen during training.
We show that our attack is faster than other attack schemes that are designed for unseen attack generalization.
arXiv Detail & Related papers (2021-03-29T07:23:46Z) - "What's in the box?!": Deflecting Adversarial Attacks by Randomly Deploying Adversarially-Disjoint Models [71.91835408379602]
Adversarial examples have long been considered a real threat to machine learning models.
We propose an alternative deployment-based defense paradigm that goes beyond the traditional white-box and black-box threat models.
arXiv Detail & Related papers (2021-02-09T20:07:13Z) - Learning to Attack: Towards Textual Adversarial Attacking in Real-world Situations [81.82518920087175]
Adversarial attacks aim to fool deep neural networks with adversarial examples.
We propose a reinforcement learning based attack model, which can learn from attack history and launch attacks more efficiently.
arXiv Detail & Related papers (2020-09-19T09:12:24Z) - RayS: A Ray Searching Method for Hard-label Adversarial Attack [99.72117609513589]
We present the Ray Searching attack (RayS), which greatly improves both the effectiveness and the efficiency of hard-label attacks.
RayS attack can also be used as a sanity check for possible "falsely robust" models.
arXiv Detail & Related papers (2020-06-23T07:01:50Z) - Adversarial Imitation Attack [63.76805962712481]
A practical adversarial attack should require as little knowledge of the attacked model as possible.
Current substitute attacks need pre-trained models to generate adversarial examples.
In this study, we propose a novel adversarial imitation attack.
arXiv Detail & Related papers (2020-03-28T10:02:49Z)
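As a companion to the dynamic-defense entry above, here is a hedged sketch of test-time entropy minimization in the spirit of dent: the model's predictive entropy on an incoming test batch is reduced by a few gradient steps before the final prediction is taken. Restricting adaptation to the batch-norm affine parameters and using plain SGD are assumptions of this sketch, not necessarily the paper's exact recipe.

```python
# Hedged sketch of dent-style test-time adaptation by entropy minimization.
# Assumes a PyTorch classifier that contains BatchNorm2d layers.
import torch

def prediction_entropy(logits):
    # Mean Shannon entropy of the softmax predictions over the batch.
    p = logits.softmax(dim=1)
    return -(p * p.clamp_min(1e-12).log()).sum(dim=1).mean()

def dent_style_adapt(model, x, lr=1e-3, steps=1):
    """Adapt only the batch-norm affine parameters on a test batch by
    minimizing prediction entropy, then return the post-adaptation logits."""
    params = [p for m in model.modules()
              if isinstance(m, torch.nn.BatchNorm2d)
              for p in (m.weight, m.bias) if p is not None]
    opt = torch.optim.SGD(params, lr=lr)
    model.train()  # use batch statistics while adapting
    for _ in range(steps):
        opt.zero_grad()
        prediction_entropy(model(x)).backward()
        opt.step()
    model.eval()
    with torch.no_grad():
        return model(x)
```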
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.