Generalizable Black-Box Adversarial Attack with Meta Learning
- URL: http://arxiv.org/abs/2301.00364v1
- Date: Sun, 1 Jan 2023 07:24:12 GMT
- Title: Generalizable Black-Box Adversarial Attack with Meta Learning
- Authors: Fei Yin and Yong Zhang and Baoyuan Wu and Yan Feng and Jingyi Zhang and Yanbo Fan and Yujiu Yang
- Abstract summary: In black-box adversarial attack, the target model's parameters are unknown, and the attacker aims to find a successful perturbation based on query feedback under a query budget.
We propose to utilize the feedback information across historical attacks, dubbed example-level adversarial transferability.
The proposed framework with the two types of adversarial transferability can be naturally combined with any off-the-shelf query-based attack methods to boost their performance.
- Score: 54.196613395045595
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the scenario of black-box adversarial attack, the target model's
parameters are unknown, and the attacker aims to find a successful adversarial
perturbation based on query feedback under a query budget. Due to the limited
feedback information, existing query-based black-box attack methods often
require many queries for attacking each benign example. To reduce query cost,
we propose to utilize the feedback information across historical attacks,
dubbed example-level adversarial transferability. Specifically, by treating the
attack on each benign example as one task, we develop a meta-learning framework
by training a meta-generator to produce perturbations conditioned on benign
examples. When attacking a new benign example, the meta-generator can be
quickly fine-tuned based on the feedback information of the new task as well as
a few historical attacks to produce effective perturbations. Moreover, since
the meta-train procedure consumes many queries to learn a generalizable
generator, we utilize model-level adversarial transferability to train the
meta-generator on a white-box surrogate model, then transfer it to help the
attack against the target model. The proposed framework with the two types of
adversarial transferability can be naturally combined with any off-the-shelf
query-based attack methods to boost their performance, which is verified by
extensive experiments.
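To make the two-stage idea concrete, below is a minimal PyTorch sketch. This is not the authors' released code: the generator architecture, the margin loss, and the Reptile-style meta-update are all illustrative assumptions. It treats each benign batch as one task and meta-trains a conditional perturbation generator against a white-box surrogate model, yielding an initialization that can later be fine-tuned in a few steps when attacking the black-box target.

```python
import copy
import torch
import torch.nn as nn

class PerturbationGenerator(nn.Module):
    """Hypothetical conditional generator: benign image -> bounded perturbation."""
    def __init__(self, channels=3, eps=8 / 255):
        super().__init__()
        self.eps = eps
        self.net = nn.Sequential(
            nn.Conv2d(channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, channels, 3, padding=1), nn.Tanh(),
        )

    def forward(self, x):
        # Tanh output lies in [-1, 1]; scale it to the L-infinity budget eps.
        return self.eps * self.net(x)

def margin_loss(logits, labels):
    """Untargeted objective: push the true class below the best other class."""
    true = logits.gather(1, labels[:, None]).squeeze(1)
    other = logits.scatter(1, labels[:, None], float("-inf")).max(dim=1).values
    return (true - other).clamp(min=0).mean()

def meta_train_step(gen, surrogate, x, y, inner_steps=3, inner_lr=1e-3, meta_lr=0.1):
    """One Reptile-style meta-update (an assumed stand-in for the paper's
    meta-train procedure): adapt a clone of the generator on this task
    (one benign batch) against the white-box surrogate, then move the
    meta-weights toward the adapted weights."""
    fast = copy.deepcopy(gen)
    opt = torch.optim.SGD(fast.parameters(), lr=inner_lr)
    for _ in range(inner_steps):
        x_adv = (x + fast(x)).clamp(0.0, 1.0)
        loss = margin_loss(surrogate(x_adv), y)  # surrogate returns logits
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():  # Reptile outer update on the meta-weights
        for p_meta, p_fast in zip(gen.parameters(), fast.parameters()):
            p_meta += meta_lr * (p_fast - p_meta)
```

At attack time, the fine-tuning phase would run a similar inner loop for a few steps, but with the loss estimated from query feedback on the target model rather than from surrogate gradients; the meta-learned initialization is what makes those few query-driven steps sufficient.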
Related papers
- Learning diverse attacks on large language models for robust red-teaming and safety tuning [126.32539952157083]
Red-teaming, or identifying prompts that elicit harmful responses, is a critical step in ensuring the safe deployment of large language models.
We show that even with explicit regularization to favor novelty and diversity, existing approaches suffer from mode collapse or fail to generate effective attacks.
We propose to use GFlowNet fine-tuning, followed by a secondary smoothing phase, to train the attacker model to generate diverse and effective attack prompts.
arXiv Detail & Related papers (2024-05-28T19:16:17Z)
- Mutual-modality Adversarial Attack with Semantic Perturbation [81.66172089175346]
We propose a novel approach that generates adversarial attacks in a mutual-modality optimization scheme.
Our approach outperforms state-of-the-art attack methods and can be readily deployed as a plug-and-play solution.
arXiv Detail & Related papers (2023-12-20T05:06:01Z)
- DTA: Distribution Transform-based Attack for Query-Limited Scenario [11.874670564015789]
In generating adversarial examples, conventional black-box attack methods rely on abundant feedback from the target models.
This paper proposes a hard-label attack for the query-limited scenario, in which the attacker is permitted only a limited number of queries.
Experiments validate the effectiveness of the proposed idea and the superiority of DTA over the state-of-the-art.
arXiv Detail & Related papers (2023-12-12T13:21:03Z)
- Understanding the Robustness of Randomized Feature Defense Against Query-Based Adversarial Attacks [23.010308600769545]
Deep neural networks are vulnerable to adversarial examples: inputs close to the original image that nonetheless cause the model to misclassify.
We propose a simple and lightweight defense against black-box attacks by adding random noise to hidden features at intermediate layers of the model at inference time (a minimal sketch of this idea appears after this list).
Our method effectively enhances the model's resilience against both score-based and decision-based black-box attacks.
arXiv Detail & Related papers (2023-10-01T03:53:23Z)
- Query Efficient Cross-Dataset Transferable Black-Box Attack on Action Recognition [99.29804193431823]
Black-box adversarial attacks present a realistic threat to action recognition systems.
We propose a new attack on action recognition that addresses these shortcomings by generating perturbations.
Our method achieves 8% and 12% higher deception rates than state-of-the-art query-based and transfer-based attacks, respectively.
arXiv Detail & Related papers (2022-11-23T17:47:49Z)
- Local Black-box Adversarial Attacks: A Query Efficient Approach [64.98246858117476]
Adversarial attacks have threatened the application of deep neural networks in security-sensitive scenarios.
We propose a novel framework to perturb the discriminative areas of clean examples only within limited queries in black-box attacks.
We conduct extensive experiments showing that our framework significantly improves query efficiency in black-box attacks while maintaining a high attack success rate.
arXiv Detail & Related papers (2021-01-04T15:32:16Z)
- Boosting Black-Box Attack with Partially Transferred Conditional Adversarial Distribution [83.02632136860976]
We study black-box adversarial attacks against deep neural networks (DNNs).
We develop a novel mechanism of adversarial transferability, which is robust to the surrogate biases.
Experiments on benchmark datasets and attacking against real-world API demonstrate the superior attack performance of the proposed method.
arXiv Detail & Related papers (2020-06-15T16:45:27Z)
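The randomized feature defense referenced above reduces to a few lines in PyTorch. The sketch below is an assumed illustration of the general idea, with hypothetical layer names and noise scale rather than that paper's exact recipe: Gaussian noise is injected into intermediate activations at inference time via forward hooks.

```python
import torch
import torch.nn as nn

def add_feature_noise(model: nn.Module, layer_names, sigma: float = 0.05):
    """Register forward hooks that add Gaussian noise to the outputs of the
    named intermediate layers on every forward pass (inference-time defense)."""
    modules = dict(model.named_modules())
    handles = []
    for name in layer_names:
        def hook(_module, _inputs, output, s=sigma):
            # Returning a value from a forward hook replaces the layer's output.
            return output + s * torch.randn_like(output)
        handles.append(modules[name].register_forward_hook(hook))
    return handles  # call handle.remove() on each to disable the defense
```

For example, applying this to two residual stages of a torchvision ResNet (a hypothetical choice) would be add_feature_noise(model, ["layer2", "layer3"]); score-based and decision-based attackers then receive noisy query feedback, which degrades the gradient and boundary estimates their search procedures rely on.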
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.