Interpretable Computer Vision Models through Adversarial Training:
Unveiling the Robustness-Interpretability Connection
- URL: http://arxiv.org/abs/2307.02500v2
- Date: Sun, 19 Nov 2023 15:38:50 GMT
- Title: Interpretable Computer Vision Models through Adversarial Training:
Unveiling the Robustness-Interpretability Connection
- Authors: Delyan Boychev
- Abstract summary: Interpretability is as essential as robustness when we deploy models in the real world.
Standard models, compared to robust ones, are more susceptible to adversarial attacks, and their learned representations are less meaningful to humans.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As state-of-the-art deep neural networks grow ever more complex,
maintaining their interpretability becomes an increasingly challenging task. Our
work evaluates the effects of adversarial training, which is used to produce
robust models that are less vulnerable to adversarial attacks and has been shown
to make computer vision models more interpretable. Interpretability is as
essential as robustness when models are deployed in the real world. To
demonstrate the correlation between these two properties, we extensively examine
the models using local feature-importance methods (SHAP, Integrated Gradients)
and feature visualization techniques (Representation Inversion, Class Specific
Image Generation). Compared to robust models, standard models are more
susceptible to adversarial attacks, and their learned representations are less
meaningful to humans. Robust models, by contrast, focus on the distinctive
regions of the images that support their predictions, and the features they
learn are closer to the real ones.
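The evaluation described above leans on local feature-importance methods such as Integrated Gradients. As a rough illustration (not the authors' code), the following is a minimal PyTorch sketch of Integrated Gradients; the model, image tensor, and label names are hypothetical, and in practice a library such as Captum provides a tested implementation.

    import torch

    def integrated_gradients(model, x, target, baseline=None, steps=50):
        # Average the input gradients along the straight-line path from a baseline
        # image (default: all zeros) to the input, then scale by (input - baseline).
        model.eval()
        if baseline is None:
            baseline = torch.zeros_like(x)
        total_grad = torch.zeros_like(x)
        for alpha in torch.linspace(0.0, 1.0, steps):
            point = (baseline + alpha * (x - baseline)).clone().requires_grad_(True)
            logit = model(point.unsqueeze(0))[0, target]  # score of the target class
            logit.backward()
            total_grad += point.grad
        return (x - baseline) * total_grad / steps

    # Hypothetical usage: compare attribution maps of a standard and a robust model
    # on the same image to see which regions each model relies on.
    # standard_attr = integrated_gradients(standard_model, image, label)
    # robust_attr   = integrated_gradients(robust_model, image, label)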
Related papers
- Towards Evaluating the Robustness of Visual State Space Models [63.14954591606638]
Vision State Space Models (VSSMs) have demonstrated remarkable performance in visual perception tasks.
However, their robustness under natural and adversarial perturbations remains a critical concern.
We present a comprehensive evaluation of VSSMs' robustness under various perturbation scenarios.
arXiv Detail & Related papers (2024-06-13T17:59:44Z)
- Does Saliency-Based Training bring Robustness for Deep Neural Networks in Image Classification? [0.0]
The black-box nature of Deep Neural Networks impedes a complete understanding of their inner workings.
Online saliency-guided training methods try to highlight the prominent features in the model's output to alleviate this problem.
We quantify the robustness and conclude that, despite the well-explained visualizations in the model's output, the salient models suffer from lower performance against adversarial example attacks.
arXiv Detail & Related papers (2023-06-28T22:20:19Z)
- Robust Models are less Over-Confident [10.42820615166362]
Adversarial training (AT) aims to achieve robustness against adversarial attacks.
We empirically analyze a variety of adversarially trained models that achieve high robust accuracies.
AT has an interesting side-effect: it leads to models that are significantly less overconfident with their decisions.
arXiv Detail & Related papers (2022-10-12T06:14:55Z)
- Robustness in Deep Learning for Computer Vision: Mind the gap? [13.576376492050185]
We identify, analyze, and summarize current definitions and progress towards non-adversarial robustness in deep learning for computer vision.
We find that this area of research has received disproportionately little attention relative to adversarial machine learning.
arXiv Detail & Related papers (2021-12-01T16:42:38Z)
- Harnessing Perceptual Adversarial Patches for Crowd Counting [92.79051296850405]
Crowd counting is vulnerable to adversarial examples in the physical world.
This paper proposes the Perceptual Adversarial Patch (PAP) generation framework to learn the shared perceptual features between models.
arXiv Detail & Related papers (2021-09-16T13:51:39Z)
- Impact of Attention on Adversarial Robustness of Image Classification Models [0.9176056742068814]
Adversarial attacks against deep learning models have gained significant attention.
Recent works have proposed explanations for the existence of adversarial examples and techniques to defend the models against these attacks.
This work aims at a general understanding of the impact of attention on adversarial robustness.
arXiv Detail & Related papers (2021-09-02T13:26:32Z)
- Explainable Adversarial Attacks in Deep Neural Networks Using Activation Profiles [69.9674326582747]
This paper presents a visual framework to investigate neural network models subjected to adversarial examples.
We show how observing activation profiles can quickly pinpoint exploited areas in a model.
arXiv Detail & Related papers (2021-03-18T13:04:21Z)
- Proactive Pseudo-Intervention: Causally Informed Contrastive Learning For Interpretable Vision Models [103.64435911083432]
We present a novel contrastive learning strategy called Proactive Pseudo-Intervention (PPI).
PPI leverages proactive interventions to guard against image features with no causal relevance.
We also devise a novel causally informed salience mapping module to identify key image pixels to intervene, and show it greatly facilitates model interpretability.
arXiv Detail & Related papers (2020-12-06T20:30:26Z)
- Stereopagnosia: Fooling Stereo Networks with Adversarial Perturbations [71.00754846434744]
We show that imperceptible additive perturbations can significantly alter the disparity map.
We show that, when used for adversarial data augmentation, our perturbations result in trained models that are more robust.
arXiv Detail & Related papers (2020-09-21T19:20:09Z)
- Stylized Adversarial Defense [105.88250594033053]
Adversarial training creates perturbation patterns and includes them in the training set to robustify the model (see the PGD-style sketch after this list).
We propose to exploit additional information from the feature space to craft stronger adversaries.
Our adversarial training approach demonstrates strong robustness compared to state-of-the-art defenses.
arXiv Detail & Related papers (2020-07-29T08:38:10Z)
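Several of the entries above, like the main paper, rely on adversarial training to robustify models. As a hedged sketch (illustrative defaults, not any listed paper's exact recipe), a PGD-style adversarial training step for a PyTorch classifier with inputs in [0, 1] could look like this:

    import torch
    import torch.nn.functional as F

    def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
        # Search for a worst-case perturbation inside an L-infinity ball of
        # radius eps around x (inputs assumed to lie in [0, 1]).
        x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)  # random start
        for _ in range(steps):
            x_adv.requires_grad_(True)
            loss = F.cross_entropy(model(x_adv), y)
            grad = torch.autograd.grad(loss, x_adv)[0]
            with torch.no_grad():
                x_adv = x_adv + alpha * grad.sign()                    # ascend the loss
                x_adv = torch.max(torch.min(x_adv, x + eps), x - eps)  # project into the eps-ball
                x_adv = x_adv.clamp(0, 1)
        return x_adv.detach()

    def adversarial_training_step(model, optimizer, x, y):
        # One optimizer step on adversarial examples instead of clean inputs.
        model.eval()                    # keep batch-norm statistics fixed while attacking
        x_adv = pgd_attack(model, x, y)
        model.train()
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x_adv), y)
        loss.backward()
        optimizer.step()
        return loss.item()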