Sparse Visual Counterfactual Explanations in Image Space
- URL: http://arxiv.org/abs/2205.07972v1
- Date: Mon, 16 May 2022 20:23:11 GMT
- Title: Sparse Visual Counterfactual Explanations in Image Space
- Authors: Valentyn Boreiko, Maximilian Augustin, Francesco Croce, Philipp
Berens, Matthias Hein
- Abstract summary: We present a novel perturbation model for visual counterfactual explanations (VCEs) in image space.
We show that it can be used to detect undesired behavior of ImageNet classifiers due to spurious features in the ImageNet dataset.
- Score: 50.768119964318494
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual counterfactual explanations (VCEs) in image space are an important
tool to understand decisions of image classifiers as they show under which
changes of the image the decision of the classifier would change. Their
generation in image space is challenging and requires robust models due to the
problem of adversarial examples. Existing techniques to generate VCEs in image
space suffer from spurious changes in the background. Our novel perturbation
model for VCEs together with its efficient optimization via our novel
Auto-Frank-Wolfe scheme yields sparse VCEs which are significantly more
object-centric. Moreover, we show that VCEs can be used to detect undesired
behavior of ImageNet classifiers due to spurious features in the ImageNet
dataset and discuss how estimates of the data-generating distribution can be
used for VCEs.
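As a rough illustration of the idea (a minimal sketch, not the authors' released implementation), the code below generates a counterfactual by running Frank-Wolfe-style updates that maximize the target-class probability of a classifier while keeping the perturbation inside an L_p ball around the original image. The names `model`, `x`, `target_class`, and all hyperparameter values are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def lp_ball_lmo(grad, eps, p=1.5):
    # Linear maximization oracle: argmax_{||s||_p <= eps} <grad, s>.
    # By Hoelder's inequality the maximizer is eps * sign(g) * (|g| / ||g||_q)^(q-1),
    # where q is the dual exponent with 1/p + 1/q = 1.
    q = p / (p - 1.0)
    g = grad.flatten()
    g_qnorm = g.abs().pow(q).sum().pow(1.0 / q).clamp_min(1e-12)
    s = eps * g.sign() * (g.abs() / g_qnorm).pow(q - 1.0)
    return s.view_as(grad)

def frank_wolfe_vce(model, x, target_class, eps=30.0, p=1.5, steps=75):
    # x: original image tensor of shape (1, C, H, W) with values in [0, 1].
    delta = torch.zeros_like(x)
    for t in range(steps):
        delta.requires_grad_(True)
        logits = model(torch.clamp(x + delta, 0.0, 1.0))
        # Objective: log-probability of the desired (counterfactual) class.
        obj = F.log_softmax(logits, dim=1)[:, target_class].sum()
        grad, = torch.autograd.grad(obj, delta)
        with torch.no_grad():
            s = lp_ball_lmo(grad, eps, p)   # vertex of the L_p ball in the ascent direction
            gamma = 2.0 / (t + 2.0)         # classical Frank-Wolfe step size
            delta = ((1.0 - gamma) * delta + gamma * s).detach()
    return torch.clamp(x + delta, 0.0, 1.0)
```

The paper's Auto-Frank-Wolfe scheme additionally adapts the step size automatically; the fixed 2/(t+2) schedule above is only the textbook choice. As the abstract notes, meaningful VCEs also require an adversarially robust classifier, since a non-robust model would simply produce adversarial examples.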
Related papers
- Efficient Visual State Space Model for Image Deblurring [83.57239834238035]
Convolutional neural networks (CNNs) and Vision Transformers (ViTs) have achieved excellent performance in image restoration.
We propose a simple yet effective visual state space model (EVSSM) for image deblurring.
arXiv Detail & Related papers (2024-05-23T09:13:36Z)
- DiG-IN: Diffusion Guidance for Investigating Networks -- Uncovering Classifier Differences, Neuron Visualisations, and Visual Counterfactual Explanations [35.458709912618176]
Deep learning has led to huge progress in complex image classification tasks like ImageNet, yet unexpected failure modes, e.g. via spurious features, remain.
For safety-critical tasks the black-box nature of their decisions is problematic, and explanations or at least methods which make decisions plausible are needed urgently.
We address these problems by generating images that optimize a classifier-derived objective using a framework for guided image generation.
arXiv Detail & Related papers (2023-11-29T17:35:29Z)
- Diffusion Visual Counterfactual Explanations [51.077318228247925]
Visual Counterfactual Explanations (VCEs) are an important tool to understand the decisions of an image classifier.
Current approaches for the generation of VCEs are restricted to adversarially robust models and often contain non-realistic artefacts.
In this paper, we overcome this by generating Diffusion Visual Counterfactual Explanations (DVCEs) for arbitrary ImageNet classifiers.
arXiv Detail & Related papers (2022-10-21T09:35:47Z)
- ViewFool: Evaluating the Robustness of Visual Recognition to Adversarial Viewpoints [42.64942578228025]
We propose a novel method called ViewFool to find adversarial viewpoints that mislead visual recognition models.
By encoding real-world objects as neural radiance fields (NeRF), ViewFool characterizes a distribution of diverse adversarial viewpoints.
arXiv Detail & Related papers (2022-10-08T03:06:49Z)
- Learning Enriched Features for Fast Image Restoration and Enhancement [166.17296369600774]
This paper pursues the holistic goal of maintaining spatially precise, high-resolution representations through the entire network.
We learn an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details.
Our approach achieves state-of-the-art results for a variety of image processing tasks, including defocus deblurring, image denoising, super-resolution, and image enhancement.
arXiv Detail & Related papers (2022-04-19T17:59:45Z)
- Two-stage Visual Cues Enhancement Network for Referring Image Segmentation [89.49412325699537]
Referring Image Segmentation (RIS) aims at segmenting the target object from an image referred to by a given natural language expression.
In this paper, we tackle this problem by devising a Two-stage Visual cues enhancement Network (TV-Net).
Through the two-stage enhancement, our proposed TV-Net achieves better performance in learning fine-grained matching between the natural language expression and the image.
arXiv Detail & Related papers (2021-10-09T02:53:39Z)
- Scalable Visual Transformers with Hierarchical Pooling [61.05787583247392]
We propose a Hierarchical Visual Transformer (HVT) which progressively pools visual tokens to shrink the sequence length.
It brings a great benefit by scaling dimensions of depth/width/resolution/patch size without introducing extra computational complexity.
Our HVT outperforms the competitive baselines on ImageNet and CIFAR-100 datasets.
arXiv Detail & Related papers (2021-03-19T03:55:58Z)
- IntroVAC: Introspective Variational Classifiers for Learning Interpretable Latent Subspaces [6.574517227976925]
IntroVAC learns interpretable latent subspaces by exploiting information from an additional label.
We show that IntroVAC is able to learn meaningful directions in the latent space enabling fine manipulation of image attributes.
arXiv Detail & Related papers (2020-08-03T10:21:41Z)