Adversarial Eigen Attack on Black-Box Models
- URL: http://arxiv.org/abs/2009.00097v1
- Date: Thu, 27 Aug 2020 07:37:43 GMT
- Title: Adversarial Eigen Attack on Black-Box Models
- Authors: Linjun Zhou, Peng Cui, Yinan Jiang, Shiqiang Yang
- Abstract summary: Black-box adversarial attacks have attracted considerable research interest because of their practical relevance to AI safety.
A common way to improve attack efficiency is to draw support from a pre-trained, transferable white-box model.
In this paper, we propose a novel setting for transferable black-box attacks: attackers may use external information from a pre-trained model whose network parameters are available.
- Score: 23.624958605512365
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Black-box adversarial attacks have attracted considerable research interest
because of their practical relevance to AI safety. Compared with the white-box
setting, the black-box setting is more difficult: less information about the
attacked model is available, and there is an additional constraint on the query
budget. A common way to improve attack efficiency is to draw support from a
pre-trained, transferable white-box model. In this paper, we propose a novel
setting for transferable black-box attacks: attackers may use external
information from a pre-trained model whose network parameters are available,
but, unlike in previous studies, no additional training data is permitted to
further change or tune the pre-trained model. To this end, we propose a new
algorithm, EigenBA, to tackle this problem. By leveraging the Jacobian matrix
of the pre-trained white-box model, our method extracts more gradient
information about the black-box model and improves attack efficiency while
keeping the perturbation of the original attacked image small. We show that
the optimal perturbations are closely related to the right singular vectors of
the Jacobian matrix. Experiments on ImageNet and CIFAR-10 show that even a
pre-trained white-box model that cannot be tuned can significantly boost the
efficiency of the black-box attack, and that our proposed method further
improves attack efficiency.
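As a rough illustration of the idea described in the abstract, the following PyTorch sketch computes the Jacobian of a white-box feature extractor with respect to the input and uses its top right singular vectors as candidate perturbation directions. The tiny random CNN, image size, step size eps, and query loop are illustrative stand-ins, not the paper's exact EigenBA procedure.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in white-box feature extractor; the paper uses a pre-trained
# transferable network, but a tiny random CNN keeps the sketch self-contained.
white_box = nn.Sequential(
    nn.Conv2d(3, 8, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(8, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Flatten(),
).eval()

x = torch.rand(3 * 32 * 32)  # the attacked image, flattened

def features(x_flat):
    return white_box(x_flat.view(1, 3, 32, 32)).flatten()

# Jacobian of the white-box features with respect to the input pixels.
J = torch.autograd.functional.jacobian(features, x)  # (feat_dim, 3072)

# Per the abstract, optimal perturbations relate to the right singular
# vectors of J: the input directions that change the features the most
# per unit of L2 perturbation.
_, S, Vh = torch.linalg.svd(J, full_matrices=False)

eps = 0.01
candidates = [
    (x + sign * eps * v).clamp(0.0, 1.0)
    for v in Vh[:5]            # top right singular vectors
    for sign in (1.0, -1.0)
]
# Each candidate would be sent as a query to the black-box model, keeping
# whichever perturbation most increases the black-box loss, then iterating.
```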
Related papers
- Efficient Black-box Adversarial Attacks via Bayesian Optimization Guided by a Function Prior [36.101904669291436]
This paper studies the challenging black-box adversarial attack that aims to generate adversarial examples against a black-box model using only the model's output feedback to input queries.
We propose a Prior-guided Bayesian Optimization (P-BO) algorithm that leverages the surrogate model as a global function prior in black-box adversarial attacks.
Our theoretical analysis on the regret bound indicates that the performance of P-BO may be affected by a bad prior.
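As a rough sketch of using a surrogate as a global function prior in Bayesian optimization (not the paper's exact P-BO algorithm, which also analyzes and adapts the prior's influence), one can treat the surrogate's loss as the GP prior mean and fit the GP to the residuals of queried points. The objective, surrogate, dimensions, and UCB acquisition below are illustrative stand-ins.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
d = 8  # low-dimensional search space for the perturbation

def black_box_loss(z):        # stand-in for the true attack objective
    return -np.sum((z - 0.3) ** 2)

def surrogate_prior(z):       # stand-in for the surrogate model's loss
    return -np.sum((z - 0.25) ** 2)   # informative but imperfect prior

# Model the residual (true loss minus prior) with a GP, so the surrogate
# effectively acts as the GP's prior mean function.
Z = rng.uniform(-1, 1, size=(5, d))
y = np.array([black_box_loss(z) - surrogate_prior(z) for z in Z])
gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0))

for _ in range(20):           # query budget
    gp.fit(Z, y)
    cand = rng.uniform(-1, 1, size=(256, d))
    mu, std = gp.predict(cand, return_std=True)
    prior = np.array([surrogate_prior(z) for z in cand])
    ucb = prior + mu + 1.0 * std      # acquisition: prior + residual UCB
    z_next = cand[np.argmax(ucb)]
    y_next = black_box_loss(z_next) - surrogate_prior(z_next)
    Z = np.vstack([Z, z_next])
    y = np.append(y, y_next)
```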
arXiv Detail & Related papers (2024-05-29T14:05:16Z)
- Understanding the Robustness of Randomized Feature Defense Against Query-Based Adversarial Attacks [23.010308600769545]
Deep neural networks are vulnerable to adversarial examples: samples close to the original image that nevertheless cause the model to misclassify.
We propose a simple and lightweight defense against black-box attacks by adding random noise to hidden features at intermediate layers of the model at inference time.
Our method effectively enhances the model's resilience against both score-based and decision-based black-box attacks.
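The mechanism is simple enough to sketch: a forward hook that perturbs an intermediate feature map with Gaussian noise at inference time. The toy model, the hooked layer, and the noise scale sigma below are illustrative choices, not the paper's tuned configuration.

```python
import torch
import torch.nn as nn

sigma = 0.05  # illustrative noise scale

def noisy_features(module, inputs, output):
    # Returning a tensor from a forward hook replaces the layer's output.
    return output + sigma * torch.randn_like(output)

model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10),
).eval()

# Randomize the second conv layer's features at inference time, so each
# attacker query sees a slightly different model.
handle = model[2].register_forward_hook(noisy_features)

x = torch.rand(1, 3, 32, 32)
with torch.no_grad():
    logits = model(x)
handle.remove()
```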
arXiv Detail & Related papers (2023-10-01T03:53:23Z)
- Reinforcement Learning-Based Black-Box Model Inversion Attacks [23.30144908939099]
Model inversion attacks reconstruct private data used to train a machine learning model.
White-box model inversion attacks leveraging Generative Adversarial Networks (GANs) to distill knowledge from public datasets have been receiving great attention.
We propose a reinforcement learning-based black-box model inversion attack.
arXiv Detail & Related papers (2023-04-10T14:41:16Z)
- Query Efficient Cross-Dataset Transferable Black-Box Attack on Action Recognition [99.29804193431823]
Black-box adversarial attacks present a realistic threat to action recognition systems.
We propose a new attack on action recognition that addresses the shortcomings of existing attacks through the perturbations it generates.
Our method achieves 8% and 12% higher deception rates than state-of-the-art query-based and transfer-based attacks, respectively.
arXiv Detail & Related papers (2022-11-23T17:47:49Z)
- T-SEA: Transfer-based Self-Ensemble Attack on Object Detection [9.794192858806905]
We propose a single-model transfer-based black-box attack on object detection, utilizing only one model to achieve a high-transferability adversarial attack on multiple black-box detectors.
We analogize patch optimization with regular model optimization, proposing a series of self-ensemble approaches on the input data, the attacked model, and the adversarial patch.
arXiv Detail & Related papers (2022-11-16T10:27:06Z)
- How to Robustify Black-Box ML Models? A Zeroth-Order Optimization Perspective [74.47093382436823]
We address the problem of black-box defense: How to robustify a black-box model using just input queries and output feedback?
We propose a general notion of defensive operation that can be applied to black-box models, and design it through the lens of denoised smoothing (DS).
We empirically show that ZO-AE-DS can achieve improved accuracy, certified robustness, and query complexity over existing baselines.
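The zeroth-order primitive underlying this line of work can be sketched as a random-direction finite-difference gradient estimator driven purely by queries. The objective and hyperparameters below are stand-ins, and the paper's full ZO-AE-DS pipeline (autoencoder plus denoised smoothing) is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)

def black_box_loss(x):                  # query-only objective (stand-in)
    return np.sum(np.sin(x))

def zo_gradient(f, x, num_dirs=20, mu=1e-3):
    """Zeroth-order gradient estimate via random-direction
    finite differences, using only function queries."""
    grad = np.zeros_like(x)
    for _ in range(num_dirs):
        u = rng.standard_normal(x.size)
        grad += (f(x + mu * u) - f(x)) / mu * u
    return grad / num_dirs

x = rng.standard_normal(16)
for _ in range(100):                    # ZO gradient descent
    x -= 0.01 * zo_gradient(black_box_loss, x)
```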
arXiv Detail & Related papers (2022-03-27T03:23:32Z)
- Meta Gradient Adversarial Attack [64.5070788261061]
This paper proposes a novel architecture called Meta Gradient Adversarial Attack (MGAA), which is plug-and-play and can be integrated with any existing gradient-based attack method.
Specifically, we randomly sample multiple models from a model zoo to compose different tasks and iteratively simulate a white-box attack and a black-box attack in each task.
By narrowing the gap between the gradient directions in white-box and black-box attacks, the transferability of adversarial examples in the black-box setting can be improved, as in the sketch below.
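A heavily simplified sketch of this loop, with stand-in linear models and illustrative step sizes rather than the paper's setup: per meta-iteration, a "white-box" subset and a held-out "black-box" model are sampled from a zoo, inner attack steps are taken on the subset, and the held-out model's gradient then corrects the perturbation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

def make_model():  # stand-in classifiers forming a model zoo
    return nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10)).eval()

zoo = [make_model() for _ in range(5)]
x, y = torch.rand(1, 3, 32, 32), torch.tensor([3])
delta = torch.zeros_like(x, requires_grad=True)
alpha, beta, eps = 0.01, 0.01, 0.03

def step(model, delta, lr):
    # One signed-gradient ascent step on the attack loss, clipped to eps.
    loss = F.cross_entropy(model(x + delta), y)
    g, = torch.autograd.grad(loss, delta)
    return (delta + lr * g.sign()).clamp(-eps, eps).detach().requires_grad_(True)

for _ in range(10):                          # meta-iterations
    idx = torch.randperm(len(zoo)).tolist()
    support, query = idx[:3], idx[3]
    for i in support:                        # simulated white-box attack
        delta = step(zoo[i], delta, alpha)
    delta = step(zoo[query], delta, beta)    # simulated black-box attack

x_adv = (x + delta).clamp(0, 1).detach()
```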
arXiv Detail & Related papers (2021-08-09T17:44:19Z)
- Delving into Data: Effectively Substitute Training for Black-box Attack [84.85798059317963]
We propose a novel perspective on substitute training that focuses on designing the distribution of data used in the knowledge-stealing process.
The combination of these two modules can further boost the consistency of the substitute model and target model, which greatly improves the effectiveness of adversarial attack.
arXiv Detail & Related papers (2021-04-26T07:26:29Z)
- Two Sides of the Same Coin: White-box and Black-box Attacks for Transfer Learning [60.784641458579124]
We show that fine-tuning effectively enhances model robustness under white-box FGSM attacks.
We also propose a black-box attack method for transfer learning models which attacks the target model with the adversarial examples produced by its source model.
To systematically measure the effect of both white-box and black-box attacks, we propose a new metric to evaluate how transferable the adversarial examples produced by a source model are to a target model.
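A generic version of such a transfer-rate measurement (the paper's exact metric may differ) can be sketched as the fraction of source-model adversarial examples that flip the target model's prediction. The toy model and the noise-based "adversarial" inputs below are stand-ins.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
target = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10)).eval()

def transfer_rate(target_model, clean, adv):
    """Fraction of source-model adversarial examples that change the
    target model's prediction relative to the clean input."""
    with torch.no_grad():
        p_clean = target_model(clean).argmax(1)
        p_adv = target_model(adv).argmax(1)
    return (p_adv != p_clean).float().mean().item()

clean = torch.rand(8, 3, 32, 32)
adv = (clean + 0.03 * torch.randn_like(clean)).clamp(0, 1)  # stand-in
print(transfer_rate(target, clean, adv))
```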
arXiv Detail & Related papers (2020-08-25T15:04:32Z)
- Spanning Attack: Reinforce Black-box Attacks with Unlabeled Data [96.92837098305898]
Black-box attacks aim to craft adversarial perturbations by querying input-output pairs of machine learning models.
Black-box attacks often suffer from the issue of query inefficiency due to the high dimensionality of the input space.
We propose a novel technique called the spanning attack, which constrains adversarial perturbations to a low-dimensional subspace spanned by an auxiliary unlabeled dataset.
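A minimal sketch of the subspace idea, assuming the span is taken from the top principal directions of the auxiliary data (the paper's exact construction may differ); the dimensions and subspace size are illustrative.

```python
import torch

torch.manual_seed(0)

# Auxiliary unlabeled images define a low-dimensional perturbation subspace.
aux = torch.rand(200, 3 * 32 * 32)            # stand-in unlabeled dataset
aux = aux - aux.mean(dim=0)
_, _, Vh = torch.linalg.svd(aux, full_matrices=False)
basis = Vh[:64]                               # (k, d) spanning directions

def lift(coeffs):
    # Map a low-dimensional search vector to an input-space perturbation.
    return coeffs @ basis                     # shape (d,)

coeffs = 0.01 * torch.randn(64)               # what the attack searches over
delta = lift(coeffs).view(3, 32, 32)
x = torch.rand(3, 32, 32)
x_adv = (x + delta).clamp(0, 1)               # candidate black-box query
```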
arXiv Detail & Related papers (2020-05-11T05:57:15Z)