Understanding Deep Learning defenses Against Adversarial Examples Through Visualizations for Dynamic Risk Assessment
- URL: http://arxiv.org/abs/2402.07496v1
- Date: Mon, 12 Feb 2024 09:05:01 GMT
- Title: Understanding Deep Learning defenses Against Adversarial Examples Through Visualizations for Dynamic Risk Assessment
- Authors: Xabier Echeberria-Barrio, Amaia Gil-Lerchundi, Jon Egana-Zubia, Raul Orduna-Urrutia
- Abstract summary: Adversarial training, dimensionality reduction and prediction similarity were selected as defenses against the adversarial example attack.
For each defense, the behavior of the original model has been compared with that of the defended model, representing the target model as a graph in a visualization.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, Deep Neural Network models have been developed in
different fields, where they have brought many advances. However, they have also
started to be used in tasks where risk is critical. A misdiagnosis by these models
can lead to serious accidents or even death. This concern has led researchers to
study possible attacks on these models, discovering a long list of vulnerabilities
against which every model should be defended. The adversarial example attack is
widely known among researchers, who have developed several defenses to counter this
threat. However, these defenses are as opaque as the deep neural network model
itself; how they work is still unknown. This is why visualizing how they change the
behavior of the target model is interesting, in order to understand more precisely
how the performance of the defended model is being modified. For this work, several
defenses against the adversarial example attack have been selected in order to
visualize how each of them modifies the behavior of the defended model. Adversarial
training, dimensionality reduction and prediction similarity were the selected
defenses, which have been applied to a model composed of convolutional neural
network layers and dense neural network layers. For each defense, the behavior of
the original model has been compared with the behavior of the defended model,
representing the target model as a graph in a visualization.
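The abstract names adversarial training as one of the three defenses applied to a model built from convolutional and dense layers. As a rough, hedged illustration only (the authors' exact architecture, attack parameters and training procedure are not given here, so the model, the epsilon value and the FGSM formulation below are assumptions), a minimal PyTorch sketch of FGSM-based adversarial training might look like this:

```python
# Minimal sketch of FGSM-based adversarial training. Illustrative only: the
# architecture, epsilon and loss mixing are assumptions, not the paper's setup.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallCNN(nn.Module):
    """A small convolutional + dense classifier, e.g. for 28x28 grayscale images."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.fc = nn.Sequential(
            nn.Flatten(), nn.Linear(64 * 7 * 7, 128), nn.ReLU(),
            nn.Linear(128, num_classes),
        )

    def forward(self, x):
        return self.fc(self.conv(x))

def fgsm_example(model, x, y, epsilon=0.1):
    """Craft an FGSM adversarial example: x + epsilon * sign(grad_x loss)."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    grad = torch.autograd.grad(loss, x_adv)[0]
    return (x_adv + epsilon * grad.sign()).clamp(0, 1).detach()

def adversarial_training_step(model, optimizer, x, y, epsilon=0.1):
    """One training step on a 50/50 mix of clean and adversarial samples."""
    model.train()
    x_adv = fgsm_example(model, x, y, epsilon)
    optimizer.zero_grad()
    loss = 0.5 * F.cross_entropy(model(x), y) + 0.5 * F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

A model trained this way could then be compared against the original one with the paper's graph-based visualization to see how the defense reshapes the neuron activations of the defended model.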
Related papers
- Model X-ray: Detecting Backdoored Models via Decision Boundary [62.675297418960355]
Backdoor attacks pose a significant security vulnerability for deep neural networks (DNNs).
We propose Model X-ray, a novel backdoor detection approach based on the analysis of illustrated two-dimensional (2D) decision boundaries.
Our approach includes two strategies focused on the decision areas dominated by clean samples and the concentration of label distribution.
arXiv Detail & Related papers (2024-02-27T12:42:07Z)
- Topological safeguard for evasion attack interpreting the neural networks' behavior [0.0]
In this work, a novel detector of evasion attacks is developed.
It focuses on the information provided by the activations of the model's neurons when an input sample is injected.
For this purpose, a large amount of data preprocessing is required to introduce all this information into the detector.
arXiv Detail & Related papers (2024-02-12T08:39:40Z)
- Investigating Human-Identifiable Features Hidden in Adversarial Perturbations [54.39726653562144]
Our study explores up to five attack algorithms across three datasets.
We identify human-identifiable features in adversarial perturbations.
Using pixel-level annotations, we extract such features and demonstrate their ability to compromise target models.
arXiv Detail & Related papers (2023-09-28T22:31:29Z)
- Deviations in Representations Induced by Adversarial Attacks [0.0]
Research has shown that deep learning models are vulnerable to adversarial attacks.
This finding brought about a new direction in research, whereby algorithms were developed to attack and defend vulnerable networks.
We present a method for measuring and analyzing the deviations in representations induced by adversarial attacks.
arXiv Detail & Related papers (2022-11-07T17:40:08Z)
- Untargeted Backdoor Attack against Object Detection [69.63097724439886]
We design a poison-only backdoor attack in an untargeted manner, based on task characteristics.
We show that, once the backdoor is embedded into the target model by our attack, it can trick the model into losing detection of any object stamped with our trigger patterns.
arXiv Detail & Related papers (2022-11-02T17:05:45Z)
- Recent improvements of ASR models in the face of adversarial attacks [28.934863462633636]
Speech Recognition models are vulnerable to adversarial attacks.
We show that the relative strengths of different attack algorithms vary considerably when changing the model architecture.
We release our source code as a package that should help future researchers evaluate their attacks and defenses.
arXiv Detail & Related papers (2022-03-29T22:40:37Z)
- Explainable Adversarial Attacks in Deep Neural Networks Using Activation Profiles [69.9674326582747]
This paper presents a visual framework to investigate neural network models subjected to adversarial examples.
We show how observing these activation profiles can quickly pinpoint exploited areas in a model.
arXiv Detail & Related papers (2021-03-18T13:04:21Z)
- On the Transferability of Adversarial Attacks against Neural Text Classifier [121.6758865857686]
We investigate the transferability of adversarial examples for text classification models.
We propose a genetic algorithm to find an ensemble of models that can induce adversarial examples to fool almost all existing models.
We derive word replacement rules that can be used for model diagnostics from these adversarial examples.
arXiv Detail & Related papers (2020-11-17T10:45:05Z)
- Detection Defense Against Adversarial Attacks with Saliency Map [7.736844355705379]
It is well established that neural networks are vulnerable to adversarial examples, which are almost imperceptible to human vision.
Existing defenses tend to harden the robustness of models against adversarial attacks.
We propose a novel method, combined with additional noise, that utilizes an inconsistency strategy to detect adversarial examples.
arXiv Detail & Related papers (2020-09-06T13:57:17Z)
- Orthogonal Deep Models As Defense Against Black-Box Attacks [71.23669614195195]
We study the inherent weakness of deep models in black-box settings where the attacker may develop the attack using a model similar to the targeted model.
We introduce a novel gradient regularization scheme that encourages the internal representation of a deep model to be orthogonal to that of another model.
We verify the effectiveness of our technique on a variety of large-scale models.
arXiv Detail & Related papers (2020-06-26T08:29:05Z)
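The last entry above describes a gradient regularization scheme that pushes the internal representation of a deep model to be orthogonal to that of another model. The summary does not give the exact loss, so the penalty sketched below (mean squared cosine similarity between paired feature vectors, in PyTorch) is only an assumed, illustrative formulation of such an orthogonality constraint:

```python
# Illustrative sketch only: one plausible orthogonality penalty between the
# internal representations of two models. The exact regularizer used in
# "Orthogonal Deep Models As Defense Against Black-Box Attacks" is not
# specified in the summary above, so this formulation is an assumption.
import torch
import torch.nn.functional as F

def orthogonality_penalty(feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
    """Penalize alignment between two batches of feature vectors of shape [batch, dim].

    The penalty is the mean squared cosine similarity between paired features;
    it is zero when the two representations are orthogonal.
    """
    cos = F.cosine_similarity(feat_a, feat_b, dim=1)
    return (cos ** 2).mean()

def defended_loss(logits, labels, feat_defended, feat_reference, lam=0.1):
    """Task loss plus a term that pushes the defended model's features away
    from (orthogonal to) a fixed reference model's features."""
    return F.cross_entropy(logits, labels) + lam * orthogonality_penalty(
        feat_defended, feat_reference.detach())
```

Intuitively, if a surrogate model trained by the attacker shares less representation geometry with the defended model, adversarial examples crafted on the surrogate transfer less effectively, which is the black-box threat model that entry targets.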