Does Saliency-Based Training bring Robustness for Deep Neural Networks in Image Classification?
- URL: http://arxiv.org/abs/2306.16581v1
- Date: Wed, 28 Jun 2023 22:20:19 GMT
- Title: Does Saliency-Based Training bring Robustness for Deep Neural Networks in Image Classification?
- Authors: Ali Karkehabadi
- Abstract summary: The black-box nature of Deep Neural Networks impedes a complete understanding of their inner workings.
Online saliency-guided training methods try to highlight the prominent features in the model's output to alleviate this problem.
We quantify the robustness and conclude that, despite the well-explained visualizations in the model's output, saliency-trained models suffer lower performance against adversarial example attacks.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Deep Neural Networks are powerful tools for understanding complex
patterns and making decisions. However, their black-box nature impedes a
complete understanding of their inner workings. While online saliency-guided
training methods try to highlight the prominent features in the model's output
to alleviate this problem, it remains unclear whether visually explainable
features align with the model's robustness against adversarial examples. In
this paper, we investigate saliency-trained models' vulnerability to
adversarial example attacks. Models are trained using an online
saliency-guided training method and evaluated against popular adversarial
example algorithms. We quantify the robustness and conclude that, despite the
well-explained visualizations in the model's output, saliency-trained models
suffer lower performance against adversarial example attacks.
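The evaluation protocol described above, attacking a trained model with gradient-based adversarial examples and measuring the performance drop, can be sketched with a minimal FGSM-style example. This is an illustrative assumption, not the paper's actual setup: the toy logistic-regression "model", its random weights, and the epsilon value are all hypothetical stand-ins.

```python
import numpy as np

def sigmoid(z):
    """Logistic function, p(y=1 | x)."""
    return 1.0 / (1.0 + np.exp(-z))

def input_gradient(x, y, w, b):
    """Gradient of the cross-entropy loss w.r.t. the input x
    for a linear model p = sigmoid(w.x + b): dL/dx = (p - y) * w."""
    p = sigmoid(w @ x + b)
    return (p - y) * w

def fgsm_perturb(x, grad, epsilon):
    """FGSM: move each feature by epsilon in the sign of the loss
    gradient, then clip back into the valid input range [0, 1]."""
    return np.clip(x + epsilon * np.sign(grad), 0.0, 1.0)

rng = np.random.default_rng(0)
w, b = rng.normal(size=4), 0.0          # hypothetical trained weights
x, y = rng.uniform(0.0, 1.0, size=4), 1.0  # one clean example

x_adv = fgsm_perturb(x, input_gradient(x, y, w, b), epsilon=0.1)
# The attack can only lower (or preserve) the model's confidence in
# the true label; comparing accuracy on clean vs. perturbed inputs
# quantifies the robustness gap studied in the paper.
```

In practice the same loop would run over a full test set with a deep network and an autograd framework, but the mechanics of the attack are as above.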
Related papers
- Interpretable Computer Vision Models through Adversarial Training: Unveiling the Robustness-Interpretability Connection
Interpretability is as essential as robustness when we deploy models in the real world.
Standard models are more susceptible to adversarial attacks than robust ones, and their learned representations are less meaningful to humans.
arXiv Detail & Related papers (2023-07-04T13:51:55Z)
- On the Properties of Adversarially-Trained CNNs
Adversarial Training has proved to be an effective training paradigm to enforce robustness against adversarial examples in modern neural network architectures.
We describe surprising properties of adversarially-trained models, shedding light on mechanisms through which robustness against adversarial attacks is implemented.
arXiv Detail & Related papers (2022-03-17T11:11:52Z)
- Unsupervised Detection of Adversarial Examples with Model Explanations
We propose a simple yet effective method to detect adversarial examples using methods developed to explain the model's behavior.
Our evaluations on the MNIST handwritten digits dataset show that our method is capable of detecting adversarial examples with high confidence.
arXiv Detail & Related papers (2021-07-22T06:54:18Z)
- Paired Examples as Indirect Supervision in Latent Decision Models
We introduce a way to leverage paired examples that provide stronger cues for learning latent decisions.
We apply our method to improve compositional question answering using neural module networks on the DROP dataset.
arXiv Detail & Related papers (2021-04-05T03:58:30Z)
- Explainable Adversarial Attacks in Deep Neural Networks Using Activation Profiles
This paper presents a visual framework to investigate neural network models subjected to adversarial examples.
We show how observing these elements can quickly pinpoint exploited areas in a model.
arXiv Detail & Related papers (2021-03-18T13:04:21Z)
- On the Transferability of Adversarial Attacks against Neural Text Classifier
We investigate the transferability of adversarial examples for text classification models.
We propose a genetic algorithm to find an ensemble of models that can induce adversarial examples to fool almost all existing models.
We derive word replacement rules that can be used for model diagnostics from these adversarial examples.
arXiv Detail & Related papers (2020-11-17T10:45:05Z)
- Stylized Adversarial Defense
Adversarial training creates perturbation patterns and includes them in the training set to robustify the model.
We propose to exploit additional information from the feature space to craft stronger adversaries.
Our adversarial training approach demonstrates strong robustness compared to state-of-the-art defenses.
arXiv Detail & Related papers (2020-07-29T08:38:10Z)
- On the Benefits of Models with Perceptually-Aligned Gradients
We show that interpretable and perceptually aligned gradients are present even in models that do not show high robustness to adversarial attacks.
We leverage models with interpretable perceptually-aligned features and show that adversarial training with low max-perturbation bound can improve the performance of models for zero-shot and weakly supervised localization tasks.
arXiv Detail & Related papers (2020-05-04T14:05:38Z)
- Towards Achieving Adversarial Robustness by Enforcing Feature Consistency Across Bit Planes
We train networks to form coarse impressions based on the information in higher bit planes, and use the lower bit planes only to refine their prediction.
We demonstrate that, by imposing consistency on the representations learned across differently quantized images, the adversarial robustness of networks improves significantly.
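The coarse-to-fine idea above rests on bit-plane decomposition of an 8-bit image: the higher planes carry the coarse structure, the lower planes the fine detail. A minimal NumPy sketch of keeping only the high-order planes is below; the helper name and the choice of keeping 4 planes are illustrative assumptions, not the paper's code.

```python
import numpy as np

def keep_high_bit_planes(img: np.ndarray, n_bits: int) -> np.ndarray:
    """Zero out the lowest (8 - n_bits) bit planes of an 8-bit image,
    leaving only the coarse, high-order intensity information."""
    mask = np.uint8(0xFF & ~((1 << (8 - n_bits)) - 1))
    return img & mask

# A toy 8-bit "image" covering every intensity value once.
img = np.arange(256, dtype=np.uint8).reshape(16, 16)

# Coarse impression from the top 4 bit planes; the discarded low
# planes are what the network would use only for refinement.
coarse = keep_high_bit_planes(img, 4)
```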
arXiv Detail & Related papers (2020-04-01T09:31:10Z)
- Regularizers for Single-step Adversarial Training
We propose three types of regularizers that help to learn robust models using single-step adversarial training methods.
The regularizers mitigate the effect of gradient masking by exploiting properties that differentiate a robust model from a pseudo-robust one.
arXiv Detail & Related papers (2020-02-03T09:21:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.