Understanding the Robustness of Randomized Feature Defense Against
Query-Based Adversarial Attacks
- URL: http://arxiv.org/abs/2310.00567v1
- Date: Sun, 1 Oct 2023 03:53:23 GMT
- Title: Understanding the Robustness of Randomized Feature Defense Against
Query-Based Adversarial Attacks
- Authors: Quang H. Nguyen, Yingjie Lao, Tung Pham, Kok-Seng Wong, Khoa D. Doan
- Abstract summary: Deep neural networks are vulnerable to adversarial examples: samples close to the original image that nevertheless cause the model to misclassify.
We propose a simple and lightweight defense against black-box attacks by adding random noise to hidden features at intermediate layers of the model at inference time.
Our method effectively enhances the model's resilience against both score-based and decision-based black-box attacks.
- Score: 23.010308600769545
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent works have shown that deep neural networks are vulnerable to
adversarial examples: samples close to the original image that nevertheless cause
the model to misclassify. Even with access only to the model's output, an attacker
can employ black-box attacks to generate such adversarial examples. In this
work, we propose a simple and lightweight defense against black-box attacks by
adding random noise to hidden features at intermediate layers of the model at
inference time. Our theoretical analysis confirms that this method effectively
enhances the model's resilience against both score-based and decision-based
black-box attacks. Importantly, our defense does not necessitate adversarial
training and has minimal impact on accuracy, rendering it applicable to any
pre-trained model. Our analysis also reveals the significance of selectively
adding noise to different parts of the model based on the gradient of the
adversarial objective function, which can vary during the attack. We
demonstrate the robustness of our defense against multiple black-box attacks
through extensive empirical experiments involving diverse models with various
architectures.
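To make the mechanism concrete, here is a minimal sketch of the kind of randomized feature defense the abstract describes, assuming PyTorch and a torchvision ResNet-18; the choice of layer (`layer3`), the Gaussian noise model, and the scale `sigma` are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch (assumption, not the paper's released code): inject zero-mean
# Gaussian noise into one intermediate layer's output at inference time, while
# leaving the pre-trained weights untouched.
import torch
import torchvision.models as models

class NoisyFeatureHook:
    """Forward hook that adds Gaussian noise to a layer's output at eval time."""
    def __init__(self, sigma: float):
        self.sigma = sigma

    def __call__(self, module, inputs, output):
        if module.training:                      # keep training behavior unchanged
            return output
        return output + self.sigma * torch.randn_like(output)

model = models.resnet18(weights=None).eval()     # in practice, any pre-trained model
handle = model.layer3.register_forward_hook(NoisyFeatureHook(sigma=0.05))

x = torch.rand(1, 3, 224, 224)                   # placeholder input image
with torch.no_grad():
    logits_1 = model(x)
    logits_2 = model(x)

# Noise is resampled per query, so repeated queries on the same input return
# slightly different scores, which perturbs an attacker's gradient estimates.
print(torch.allclose(logits_1, logits_2))        # typically False

handle.remove()                                  # restore the deterministic model
```

Because the noise is resampled on every forward pass, two queries on the same image return slightly different scores; the abstract's point about layer selection corresponds to choosing where, and how strongly, to apply such a hook so that the attacker's objective gradient is disturbed while clean accuracy is barely affected.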
Related papers
- BruSLeAttack: A Query-Efficient Score-Based Black-Box Sparse Adversarial Attack [22.408968332454062]
We study the unique, less well-understood problem of generating sparse adversarial samples simply by observing the score-based replies to model queries.
We develop BruSLeAttack, a new, faster (more query-efficient) algorithm for the problem.
Our work facilitates faster evaluation of model vulnerabilities and raises our vigilance on the safety, security and reliability of deployed systems (a toy score-based query-attack loop is sketched after this list).
arXiv Detail & Related papers (2024-04-08T08:59:26Z) - Certifiable Black-Box Attacks with Randomized Adversarial Examples: Breaking Defenses with Provable Confidence [34.35162562625252]
Black-box adversarial attacks have demonstrated strong potential to compromise machine learning models.
We study a new paradigm of black-box attacks with provable guarantees.
This new black-box attack unveils significant vulnerabilities of machine learning models.
arXiv Detail & Related papers (2023-04-10T01:12:09Z) - Scoring Black-Box Models for Adversarial Robustness [4.416484585765028]
robustness of models to adversarial attacks has been analyzed.
We propose a simple scoring method for black-box models which indicates their robustness to adversarial input.
arXiv Detail & Related papers (2022-10-31T08:41:44Z) - Pixle: a fast and effective black-box attack based on rearranging pixels [15.705568893476947]
Black-box adversarial attacks can be performed without knowing the inner structure of the attacked model.
We propose a novel attack that is capable of correctly attacking a high percentage of samples by rearranging a small number of pixels within the attacked image.
We demonstrate that our attack works on a large number of datasets and models, that it requires a small number of iterations, and that the distance between the original sample and the adversarial one is negligible to the human eye.
arXiv Detail & Related papers (2022-02-04T17:03:32Z) - Towards A Conceptually Simple Defensive Approach for Few-shot
classifiers Against Adversarial Support Samples [107.38834819682315]
We study a conceptually simple approach to defend few-shot classifiers against adversarial attacks.
We propose a simple attack-agnostic detection method, using the concept of self-similarity and filtering.
Our evaluation on the miniImageNet (MI) and CUB datasets exhibits good attack detection performance.
arXiv Detail & Related papers (2021-10-24T05:46:03Z) - "What's in the box?!": Deflecting Adversarial Attacks by Randomly
Deploying Adversarially-Disjoint Models [71.91835408379602]
adversarial examples have been long considered a real threat to machine learning models.
We propose an alternative deployment-based defense paradigm that goes beyond the traditional white-box and black-box threat models.
arXiv Detail & Related papers (2021-02-09T20:07:13Z) - Improving Query Efficiency of Black-box Adversarial Attack [75.71530208862319]
We propose a Neural Process based black-box adversarial attack (NP-Attack)
NP-Attack could greatly decrease the query counts under the black-box setting.
arXiv Detail & Related papers (2020-09-24T06:22:56Z) - Two Sides of the Same Coin: White-box and Black-box Attacks for Transfer
Learning [60.784641458579124]
We show that fine-tuning effectively enhances model robustness under white-box FGSM attacks.
We also propose a black-box attack method for transfer learning models which attacks the target model with the adversarial examples produced by its source model.
To systematically measure the effect of both white-box and black-box attacks, we propose a new metric to evaluate how transferable the adversarial examples produced by a source model are to a target model.
arXiv Detail & Related papers (2020-08-25T15:04:32Z)
- Orthogonal Deep Models As Defense Against Black-Box Attacks [71.23669614195195]
We study the inherent weakness of deep models in black-box settings where the attacker may develop the attack using a model similar to the targeted model.
We introduce a novel gradient regularization scheme that encourages the internal representation of a deep model to be orthogonal to that of another model.
We verify the effectiveness of our technique on a variety of large-scale models.
arXiv Detail & Related papers (2020-06-26T08:29:05Z) - Boosting Black-Box Attack with Partially Transferred Conditional
Adversarial Distribution [83.02632136860976]
We study black-box adversarial attacks against deep neural networks (DNNs)
We develop a novel mechanism of adversarial transferability, which is robust to the surrogate biases.
Experiments on benchmark datasets and attacking against real-world API demonstrate the superior attack performance of the proposed method.
arXiv Detail & Related papers (2020-06-15T16:45:27Z) - Luring of transferable adversarial perturbations in the black-box
paradigm [0.0]
We present a new approach to improve the robustness of a model against black-box transfer attacks.
A removable additional neural network is included in the target model, and is designed to induce the luring effect.
Our deception-based method only needs to have access to the predictions of the target model and does not require a labeled data set.
arXiv Detail & Related papers (2020-04-10T06:48:36Z)