Orthogonal Deep Models As Defense Against Black-Box Attacks
- URL: http://arxiv.org/abs/2006.14856v1
- Date: Fri, 26 Jun 2020 08:29:05 GMT
- Title: Orthogonal Deep Models As Defense Against Black-Box Attacks
- Authors: Mohammad A. A. K. Jalwana, Naveed Akhtar, Mohammed Bennamoun, Ajmal Mian
- Abstract summary: We study the inherent weakness of deep models in black-box settings where the attacker may develop the attack using a model similar to the targeted model.
We introduce a novel gradient regularization scheme that encourages the internal representation of a deep model to be orthogonal to that of another model.
We verify the effectiveness of our technique on a variety of large-scale models.
- Score: 71.23669614195195
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning has demonstrated state-of-the-art performance for a variety of
challenging computer vision tasks. On one hand, this has enabled deep visual
models to pave the way for a plethora of critical applications like disease
prognostics and smart surveillance. On the other, deep learning has also been
found vulnerable to adversarial attacks, which calls for new techniques to
defend deep models against these attacks. Among the attack algorithms, the
black-box schemes are of serious practical concern since they only need
publicly available knowledge of the targeted model. We carefully analyze the
inherent weakness of deep models in black-box settings where the attacker may
develop the attack using a model similar to the targeted model. Based on our
analysis, we introduce a novel gradient regularization scheme that encourages
the internal representation of a deep model to be orthogonal to that of
another model, even if the two architectures are similar. Our constraint
allows a model to pursue higher accuracy while maintaining near-orthogonal
alignment of its gradients with respect to a reference model.
Detailed empirical study verifies that controlled misalignment of gradients
under our orthogonality objective significantly boosts a model's robustness
against transferable black-box adversarial attacks. In comparison to regular
models, the orthogonal models are significantly more robust to a range of $l_p$
norm bounded perturbations. We verify the effectiveness of our technique on a
variety of large-scale models.
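A minimal sketch of how such a gradient-orthogonality penalty could be realized is given below. It is an illustration under our own assumptions (PyTorch, a cross-entropy task loss, and a squared cosine-similarity penalty on input gradients against a frozen reference model), not the authors' released implementation.
```python
import torch
import torch.nn.functional as F

def orthogonal_training_step(model, reference_model, optimizer, x, y, lam=1.0):
    """One step: task loss plus a penalty on gradient alignment with a frozen
    reference model (the weighting `lam` is an illustrative assumption)."""
    x = x.clone().requires_grad_(True)

    # Input gradient of the frozen reference model's loss (its graph is not
    # kept; only the gradient direction is needed).
    ref_loss = F.cross_entropy(reference_model(x), y)
    ref_grad, = torch.autograd.grad(ref_loss, x)

    # Input gradient of the model being trained, kept in the graph so the
    # penalty itself can be back-propagated.
    task_loss = F.cross_entropy(model(x), y)
    grad, = torch.autograd.grad(task_loss, x, create_graph=True)

    # Squared cosine similarity between flattened per-sample gradients;
    # it vanishes when the two gradient directions are orthogonal.
    cos = F.cosine_similarity(grad.flatten(1), ref_grad.detach().flatten(1), dim=1)
    penalty = (cos ** 2).mean()

    total = task_loss + lam * penalty
    optimizer.zero_grad()
    total.backward()
    optimizer.step()
    return task_loss.item(), penalty.item()
```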
Related papers
- On Evaluating Adversarial Robustness of Volumetric Medical Segmentation Models [59.45628259925441]
Volumetric medical segmentation models have achieved significant success on organ and tumor-based segmentation tasks.
Their vulnerability to adversarial attacks remains largely unexplored.
This underscores the importance of investigating the robustness of existing models.
arXiv Detail & Related papers (2024-06-12T17:59:42Z) - Understanding Deep Learning defenses Against Adversarial Examples
Through Visualizations for Dynamic Risk Assessment [0.0]
Adversarial training, dimensionality reduction and prediction similarity were selected as defenses against adversarial example attacks.
For each defense, the behavior of the original model is compared with that of the defended model, with the target model represented as a graph in a visualization.
arXiv Detail & Related papers (2024-02-12T09:05:01Z) - Defense Against Model Extraction Attacks on Recommender Systems [53.127820987326295]
We introduce Gradient-based Ranking Optimization (GRO) to defend against model extraction attacks on recommender systems.
GRO aims to minimize the loss of the protected target model while maximizing the loss of the attacker's surrogate model.
Results show GRO's superior effectiveness in defending against model extraction attacks.
arXiv Detail & Related papers (2023-10-25T03:30:42Z) - OMG-ATTACK: Self-Supervised On-Manifold Generation of Transferable
Evasion Attacks [17.584752814352502]
Evasion Attacks (EA) are used to test the robustness of trained neural networks by distorting input data.
We introduce a self-supervised, computationally economical method for generating adversarial examples.
Our experiments consistently demonstrate the method is effective across various models, unseen data categories, and even defended models.
arXiv Detail & Related papers (2023-10-05T17:34:47Z) - Understanding the Robustness of Randomized Feature Defense Against
Query-Based Adversarial Attacks [23.010308600769545]
Deep neural networks are vulnerable to adversarial examples: samples close to the original image that nonetheless cause the model to misclassify.
We propose a simple and lightweight defense against black-box attacks by adding random noise to hidden features at intermediate layers of the model at inference time (a minimal sketch of this idea appears after this list).
Our method effectively enhances the model's resilience against both score-based and decision-based black-box attacks.
arXiv Detail & Related papers (2023-10-01T03:53:23Z) - Minimizing Maximum Model Discrepancy for Transferable Black-box Targeted
Attacks [30.863450425927613]
We study the black-box targeted attack problem from the model discrepancy perspective.
We present a generalization error bound for black-box targeted attacks, which gives a rigorous theoretical analysis for guaranteeing the success of the attack.
We derive a new algorithm for black-box targeted attacks based on our theoretical analysis.
arXiv Detail & Related papers (2022-12-18T08:19:08Z) - Scoring Black-Box Models for Adversarial Robustness [4.416484585765028]
The robustness of models to adversarial attacks has been analyzed.
We propose a simple scoring method for black-box models which indicates their robustness to adversarial input.
arXiv Detail & Related papers (2022-10-31T08:41:44Z) - Training Meta-Surrogate Model for Transferable Adversarial Attack [98.13178217557193]
We consider adversarial attacks to a black-box model when no queries are allowed.
In this setting, many methods directly attack surrogate models and transfer the obtained adversarial examples to fool the target model.
We show we can obtain a Meta-Surrogate Model (MSM) such that attacks on this model transfer more easily to other models.
arXiv Detail & Related papers (2021-09-05T03:27:46Z) - "What's in the box?!": Deflecting Adversarial Attacks by Randomly
Deploying Adversarially-Disjoint Models [71.91835408379602]
Adversarial examples have long been considered a real threat to machine learning models.
We propose an alternative deployment-based defense paradigm that goes beyond the traditional white-box and black-box threat models.
arXiv Detail & Related papers (2021-02-09T20:07:13Z) - Two Sides of the Same Coin: White-box and Black-box Attacks for Transfer
Learning [60.784641458579124]
We show that fine-tuning effectively enhances model robustness under white-box FGSM attacks.
We also propose a black-box attack method for transfer learning models which attacks the target model with the adversarial examples produced by its source model.
To systematically measure the effect of both white-box and black-box attacks, we propose a new metric to evaluate how transferable the adversarial examples produced by a source model are to a target model.
arXiv Detail & Related papers (2020-08-25T15:04:32Z)
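As a concrete illustration of the randomized feature defense summarized above (the entry on "Understanding the Robustness of Randomized Feature Defense"), the sketch below injects Gaussian noise into one intermediate layer's activations at inference time via a PyTorch forward hook. The backbone, chosen layer, and noise scale are illustrative assumptions, not that paper's settings.
```python
import torch
import torchvision

# Untrained backbone just to keep the sketch self-contained; in practice this
# would be the deployed, trained model.
model = torchvision.models.resnet50(weights=None).eval()

def add_feature_noise(sigma=0.05):
    def hook(module, inputs, output):
        # Perturb hidden features only at inference; scaling by the feature
        # magnitude keeps the noise comparable across layers.
        return output + sigma * output.std() * torch.randn_like(output)
    return hook

# Attach the hook to an intermediate block (the choice of layer is illustrative).
handle = model.layer3.register_forward_hook(add_feature_noise(sigma=0.05))

with torch.no_grad():
    x = torch.randn(1, 3, 224, 224)  # placeholder input batch
    logits = model(x)                # every forward pass sees fresh noise

handle.remove()  # detach the hook when the defense is no longer needed
```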