Understanding Deep Learning defenses Against Adversarial Examples Through Visualizations for Dynamic Risk Assessment
- URL: http://arxiv.org/abs/2402.07496v1
- Date: Mon, 12 Feb 2024 09:05:01 GMT
- Title: Understanding Deep Learning defenses Against Adversarial Examples Through Visualizations for Dynamic Risk Assessment
- Authors: Xabier Echeberria-Barrio, Amaia Gil-Lerchundi, Jon Egana-Zubia, Raul Orduna-Urrutia
- Abstract summary: Adversarial training, dimensionality reduction and prediction similarity were selected as defenses against the adversarial example attack.
For each defense, the behavior of the original model has been compared with that of the defended model, representing the target model as a graph in a visualization.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, Deep Neural Network models have been developed in
different fields, where they have brought many advances. However, they have also
started to be used in tasks where risk is critical. A misdiagnosis by these models
can lead to serious accidents or even death. This concern has led researchers to
study possible attacks on these models, discovering a long list of vulnerabilities
against which every model should be defended. The adversarial example attack is
widely known among researchers, who have developed several defenses to counter this
threat. However, these defenses are as opaque as the deep neural network model
itself; how they work is still unknown. This is why visualizing how they change the
behavior of the target model is interesting, in order to understand more precisely
how the performance of the defended model is being modified. For this work, several
defenses against the adversarial example attack have been selected in order to
visualize how each of them modifies the behavior of the defended model. Adversarial
training, dimensionality reduction and prediction similarity were the selected
defenses, which have been applied to a model composed of convolutional neural
network layers and dense neural network layers. For each defense, the behavior of
the original model has been compared with the behavior of the defended model,
representing the target model as a graph in a visualization.
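The abstract names adversarial training as one of the three defenses applied to a model built from convolutional and dense layers. As a rough, hedged illustration only (the authors' exact architecture, attack parameters and training procedure are not given here, so the model, the epsilon value and the FGSM formulation below are assumptions), a minimal PyTorch sketch of FGSM-based adversarial training might look like this:

```python
# Minimal sketch of FGSM-based adversarial training. Illustrative only: the
# architecture, epsilon and loss mixing are assumptions, not the paper's setup.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallCNN(nn.Module):
    """A small convolutional + dense classifier, e.g. for 28x28 grayscale images."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.fc = nn.Sequential(
            nn.Flatten(), nn.Linear(64 * 7 * 7, 128), nn.ReLU(),
            nn.Linear(128, num_classes),
        )

    def forward(self, x):
        return self.fc(self.conv(x))

def fgsm_example(model, x, y, epsilon=0.1):
    """Craft an FGSM adversarial example: x + epsilon * sign(grad_x loss)."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    grad = torch.autograd.grad(loss, x_adv)[0]
    return (x_adv + epsilon * grad.sign()).clamp(0, 1).detach()

def adversarial_training_step(model, optimizer, x, y, epsilon=0.1):
    """One training step on a 50/50 mix of clean and adversarial samples."""
    model.train()
    x_adv = fgsm_example(model, x, y, epsilon)
    optimizer.zero_grad()
    loss = 0.5 * F.cross_entropy(model(x), y) + 0.5 * F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

A model trained this way could then be compared against the original one with the paper's graph-based visualization to see how the defense reshapes the neuron activations of the defended model.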
Related papers
- Model X-ray: Detecting Backdoored Models via Decision Boundary [62.675297418960355]
Backdoor attacks pose a significant security vulnerability for deep neural networks (DNNs).
We propose Model X-ray, a novel backdoor detection approach based on the analysis of illustrated two-dimensional (2D) decision boundaries.
Our approach includes two strategies focused on the decision areas dominated by clean samples and the concentration of label distribution.
arXiv Detail & Related papers (2024-02-27T12:42:07Z)
- Topological safeguard for evasion attack interpreting the neural networks' behavior [0.0]
In this work, a novel detector of evasion attacks is developed.
It focuses on the information provided by the activations of the model's neurons when an input sample is injected.
For this purpose, a large amount of data preprocessing is required to introduce all this information into the detector.
arXiv Detail & Related papers (2024-02-12T08:39:40Z)
- Investigating Human-Identifiable Features Hidden in Adversarial Perturbations [54.39726653562144]
Our study explores up to five attack algorithms across three datasets.
We identify human-identifiable features in adversarial perturbations.
Using pixel-level annotations, we extract such features and demonstrate their ability to compromise target models.
arXiv Detail & Related papers (2023-09-28T22:31:29Z)
- Deviations in Representations Induced by Adversarial Attacks [0.0]
Research has shown that deep learning models are vulnerable to adversarial attacks.
This finding brought about a new direction in research, whereby algorithms were developed to attack and defend vulnerable networks.
We present a method for measuring and analyzing the deviations in representations induced by adversarial attacks.
arXiv Detail & Related papers (2022-11-07T17:40:08Z)
- Untargeted Backdoor Attack against Object Detection [69.63097724439886]
We design a poison-only backdoor attack in an untargeted manner, based on task characteristics.
We show that, once the backdoor is embedded into the target model by our attack, it can trick the model into losing detection of any object stamped with our trigger patterns.
arXiv Detail & Related papers (2022-11-02T17:05:45Z)
- Recent improvements of ASR models in the face of adversarial attacks [28.934863462633636]
Speech Recognition models are vulnerable to adversarial attacks.
We show that the relative strengths of different attack algorithms vary considerably when changing the model architecture.
We release our source code as a package that should help future researchers evaluate their attacks and defenses.
arXiv Detail & Related papers (2022-03-29T22:40:37Z)
- Explainable Adversarial Attacks in Deep Neural Networks Using Activation Profiles [69.9674326582747]
This paper presents a visual framework to investigate neural network models subjected to adversarial examples.
We show how observing these activation profiles can quickly pinpoint exploited areas in a model.
arXiv Detail & Related papers (2021-03-18T13:04:21Z)
- On the Transferability of Adversarial Attacks against Neural Text Classifier [121.6758865857686]
We investigate the transferability of adversarial examples for text classification models.
We propose a genetic algorithm to find an ensemble of models that can induce adversarial examples to fool almost all existing models.
We derive word replacement rules that can be used for model diagnostics from these adversarial examples.
arXiv Detail & Related papers (2020-11-17T10:45:05Z)
- Detection Defense Against Adversarial Attacks with Saliency Map [7.736844355705379]
It is well established that neural networks are vulnerable to adversarial examples, which are almost imperceptible to human vision.
Existing defenses tend to harden the robustness of models against adversarial attacks.
We propose a novel method, combined with additional noise, that utilizes an inconsistency strategy to detect adversarial examples.
arXiv Detail & Related papers (2020-09-06T13:57:17Z)
- Orthogonal Deep Models As Defense Against Black-Box Attacks [71.23669614195195]
We study the inherent weakness of deep models in black-box settings where the attacker may develop the attack using a model similar to the targeted model.
We introduce a novel gradient regularization scheme that encourages the internal representation of a deep model to be orthogonal to that of another model.
We verify the effectiveness of our technique on a variety of large-scale models.
arXiv Detail & Related papers (2020-06-26T08:29:05Z)
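The last entry above describes a gradient regularization scheme that pushes the internal representation of a deep model to be orthogonal to that of another model. The summary does not give the exact loss, so the penalty sketched below (mean squared cosine similarity between paired feature vectors, in PyTorch) is only an assumed, illustrative formulation of such an orthogonality constraint:

```python
# Illustrative sketch only: one plausible orthogonality penalty between the
# internal representations of two models. The exact regularizer used in
# "Orthogonal Deep Models As Defense Against Black-Box Attacks" is not
# specified in the summary above, so this formulation is an assumption.
import torch
import torch.nn.functional as F

def orthogonality_penalty(feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
    """Penalize alignment between two batches of feature vectors of shape [batch, dim].

    The penalty is the mean squared cosine similarity between paired features;
    it is zero when the two representations are orthogonal.
    """
    cos = F.cosine_similarity(feat_a, feat_b, dim=1)
    return (cos ** 2).mean()

def defended_loss(logits, labels, feat_defended, feat_reference, lam=0.1):
    """Task loss plus a term that pushes the defended model's features away
    from (orthogonal to) a fixed reference model's features."""
    return F.cross_entropy(logits, labels) + lam * orthogonality_penalty(
        feat_defended, feat_reference.detach())
```

Intuitively, if a surrogate model trained by the attacker shares less representation geometry with the defended model, adversarial examples crafted on the surrogate transfer less effectively, which is the black-box threat model that entry targets.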