On the Properties of Adversarially-Trained CNNs
- URL: http://arxiv.org/abs/2203.09243v1
- Date: Thu, 17 Mar 2022 11:11:52 GMT
- Title: On the Properties of Adversarially-Trained CNNs
- Authors: Mattia Carletti, Matteo Terzi, Gian Antonio Susto
- Abstract summary: Adversarial Training has proved to be an effective training paradigm to enforce robustness against adversarial examples in modern neural network architectures.
We describe surprising properties of adversarially-trained models, shedding light on mechanisms through which robustness against adversarial attacks is implemented.
- Score: 4.769747792846005
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Adversarial Training has proved to be an effective training paradigm to
enforce robustness against adversarial examples in modern neural network
architectures. Despite many efforts, explanations of the foundational
principles underpinning the effectiveness of Adversarial Training are limited
and far from being widely accepted by the Deep Learning community. In this
paper, we describe surprising properties of adversarially-trained models,
shedding light on mechanisms through which robustness against adversarial
attacks is implemented. Moreover, we highlight limitations and failure modes
affecting these models that were not discussed by prior works. We conduct
extensive analyses on a wide range of architectures and datasets, performing a
deep comparison between robust and natural models.
Related papers
- Adversarial Training Can Provably Improve Robustness: Theoretical Analysis of Feature Learning Process Under Structured Data [38.44734564565478]
We provide a theoretical understanding of adversarial examples and adversarial training algorithms from the perspective of feature learning theory.
We show that the adversarial training method can provably strengthen the robust feature learning and suppress the non-robust feature learning.
arXiv Detail & Related papers (2024-10-11T03:59:49Z) - Does Saliency-Based Training bring Robustness for Deep Neural Networks
in Image Classification? [0.0]
Black-box nature of Deep Neural Networks impedes a complete understanding of their inner workings.
Online saliency-guided training methods try to highlight the prominent features in the model's output to alleviate this problem.
We quantify the robustness and conclude that despite the well-explained visualizations in the model's output, the salient models suffer from the lower performance against adversarial examples attacks.
arXiv Detail & Related papers (2023-06-28T22:20:19Z) - A Comprehensive Study on Robustness of Image Classification Models:
Benchmarking and Rethinking [54.89987482509155]
robustness of deep neural networks is usually lacking under adversarial examples, common corruptions, and distribution shifts.
We establish a comprehensive benchmark robustness called textbfARES-Bench on the image classification task.
By designing the training settings accordingly, we achieve the new state-of-the-art adversarial robustness.
arXiv Detail & Related papers (2023-02-28T04:26:20Z) - Robust Graph Representation Learning via Predictive Coding [46.22695915912123]
Predictive coding is a message-passing framework initially developed to model information processing in the brain.
In this work, we build models that rely on the message-passing rule of predictive coding.
We show that the proposed models are comparable to standard ones in terms of performance in both inductive and transductive tasks.
arXiv Detail & Related papers (2022-12-09T03:58:22Z) - Robust Transferable Feature Extractors: Learning to Defend Pre-Trained
Networks Against White Box Adversaries [69.53730499849023]
We show that adversarial examples can be successfully transferred to another independently trained model to induce prediction errors.
We propose a deep learning-based pre-processing mechanism, which we refer to as a robust transferable feature extractor (RTFE)
arXiv Detail & Related papers (2022-09-14T21:09:34Z) - Latent Boundary-guided Adversarial Training [61.43040235982727]
Adrial training is proved to be the most effective strategy that injects adversarial examples into model training.
We propose a novel adversarial training framework called LAtent bounDary-guided aDvErsarial tRaining.
arXiv Detail & Related papers (2022-06-08T07:40:55Z) - Improving robustness of jet tagging algorithms with adversarial training [56.79800815519762]
We investigate the vulnerability of flavor tagging algorithms via application of adversarial attacks.
We present an adversarial training strategy that mitigates the impact of such simulated attacks.
arXiv Detail & Related papers (2022-03-25T19:57:19Z) - Self-Ensemble Adversarial Training for Improved Robustness [14.244311026737666]
Adversarial training is the strongest strategy against various adversarial attacks among all sorts of defense methods.
Recent works mainly focus on developing new loss functions or regularizers, attempting to find the unique optimal point in the weight space.
We devise a simple but powerful emphSelf-Ensemble Adversarial Training (SEAT) method for yielding a robust classifier by averaging weights of history models.
arXiv Detail & Related papers (2022-03-18T01:12:18Z) - Explainable Adversarial Attacks in Deep Neural Networks Using Activation
Profiles [69.9674326582747]
This paper presents a visual framework to investigate neural network models subjected to adversarial examples.
We show how observing these elements can quickly pinpoint exploited areas in a model.
arXiv Detail & Related papers (2021-03-18T13:04:21Z) - Improving Adversarial Robustness by Enforcing Local and Global
Compactness [19.8818435601131]
Adversary training is the most successful method that consistently resists a wide range of attacks.
We propose the Adversary Divergence Reduction Network which enforces local/global compactness and the clustering assumption.
The experimental results demonstrate that augmenting adversarial training with our proposed components can further improve the robustness of the network.
arXiv Detail & Related papers (2020-07-10T00:43:06Z) - Adversarial Distributional Training for Robust Deep Learning [53.300984501078126]
Adversarial training (AT) is among the most effective techniques to improve model robustness by augmenting training data with adversarial examples.
Most existing AT methods adopt a specific attack to craft adversarial examples, leading to the unreliable robustness against other unseen attacks.
In this paper, we introduce adversarial distributional training (ADT), a novel framework for learning robust models.
arXiv Detail & Related papers (2020-02-14T12:36:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.