Related papers: Towards Counterfactual and Contrastive Explainability and Transparency of DCNN Image Classifiers

URL: http://arxiv.org/abs/2501.06831v1
Date: Sun, 12 Jan 2025 14:54:02 GMT
Title: Towards Counterfactual and Contrastive Explainability and Transparency of DCNN Image Classifiers
Authors: Syed Ali Tariq, Tehseen Zia, Mubeen Ghafoor
Abstract summary: We propose a novel method for generating interpretable counterfactual and contrastive explanations for DCNN models. The proposed method is model-intrusive: it probes the internal workings of a DCNN instead of altering the input image. One interesting application of this method is misclassification analysis, where we compare the concepts identified for a particular input image with class-specific concepts to establish the validity of the model's decisions.
Score: 0.9831489366502298
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Explainability of deep convolutional neural networks (DCNNs) is an important research topic that tries to uncover the reasons behind a DCNN model's decisions and to improve the understandability and reliability of such models in high-risk environments. In this regard, we propose a novel method for generating interpretable counterfactual and contrastive explanations for DCNN models. The proposed method is model-intrusive: it probes the internal workings of a DCNN instead of altering the input image to generate explanations. Given an input image, we provide contrastive explanations by identifying the most important filters in the DCNN, which represent the features and concepts that separate the model's decision between classifying the image as the originally inferred class or as some other specified alter class. We provide counterfactual explanations by specifying the minimal changes necessary in such filters so that a contrastive output is obtained. Using these identified filters and concepts, our method can provide contrastive and counterfactual reasons behind a model's decisions and make the model more transparent. One interesting application of this method is misclassification analysis, where we compare the concepts identified for a particular input image with class-specific concepts to establish the validity of the model's decisions. The proposed method is compared with state-of-the-art methods and evaluated on the Caltech-UCSD Birds (CUB) 2011 dataset to show the usefulness of the explanations provided.
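Illustrative sketch (not the authors' implementation): the abstract describes ranking filters that separate the inferred class from an alter class, then finding a small change to those filters that flips the output. The toy below assumes a linear classification head over pooled filter activations and uses a simple greedy switch-off heuristic in place of whatever procedure the paper actually uses. All function names, weights, and activation values are hypothetical.

```python
# Toy stand-in for a DCNN: pooled filter activations feed a linear head.
# Everything here is an assumption made for illustration only.

def logits(activations, weights):
    """Class scores: sum over filters of weights[c][i] * activations[i]."""
    return [sum(w * a for w, a in zip(row, activations)) for row in weights]

def predict(activations, weights):
    """Index of the highest-scoring class (first one wins on ties)."""
    scores = logits(activations, weights)
    return max(range(len(scores)), key=scores.__getitem__)

def contributions(activations, weights, inferred, alter):
    """Per-filter contribution to the margin logit[inferred] - logit[alter]."""
    return [a * (weights[inferred][i] - weights[alter][i])
            for i, a in enumerate(activations)]

def contrastive_filters(activations, weights, inferred, alter):
    """Filters ranked by how strongly they favour the inferred class."""
    contrib = contributions(activations, weights, inferred, alter)
    return sorted(range(len(contrib)), key=lambda i: contrib[i], reverse=True)

def counterfactual_filters(activations, weights, alter):
    """Greedily zero the most contrastive filters until the alter class wins.

    Returns the disabled filter indices and whether the flip succeeded.
    """
    inferred = predict(activations, weights)
    contrib = contributions(activations, weights, inferred, alter)
    modified = list(activations)
    disabled = []
    for i in contrastive_filters(activations, weights, inferred, alter):
        # Stop once flipped, or when remaining filters no longer favour
        # the inferred class (zeroing them would not help).
        if predict(modified, weights) == alter or contrib[i] <= 0:
            break
        modified[i] = 0.0
        disabled.append(i)
    return disabled, predict(modified, weights) == alter

# Hypothetical example: 4 filters, 2 classes.
activations = [2.0, 1.0, 0.5, 1.5]
weights = [[1.0, 0.2, 0.1, 0.3],   # class 0 (inferred)
           [0.1, 0.9, 0.2, 0.6]]   # class 1 (alter)
print(contrastive_filters(activations, weights, 0, 1))
print(counterfactual_filters(activations, weights, 1))
```

In this toy setting the ranked list plays the role of a contrastive explanation, and the set of switched-off filters plays the role of a counterfactual one. With more than two classes the greedy loop may fail to reach the alter class, which is why it reports success separately.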

Related papers

From Visual Explanations to Counterfactual Explanations with Latent Diffusion [11.433402357922414]
We propose a new approach to tackle two key challenges in recent prominent works. First, we determine which specific counterfactual features are crucial for distinguishing the "concept" of the target class from the original class. Second, we provide valuable explanations for the non-robust classifier without relying on the support of an adversarially robust model.
arXiv Detail & Related papers (2025-04-12T13:04:00Z)
A Meaningful Perturbation Metric for Evaluating Explainability Methods [55.09730499143998]
We introduce a novel approach, which harnesses image generation models to perform targeted perturbation. Specifically, we focus on inpainting only the high-relevance pixels of an input image to modify the model's predictions while preserving image fidelity. This is in contrast to existing approaches, which often produce out-of-distribution modifications, leading to unreliable results.
arXiv Detail & Related papers (2025-04-09T11:46:41Z)
P-TAME: Explain Any Image Classifier with Trained Perturbations [14.31574090533474]
P-TAME (Perturbation-based Trainable Attention Mechanism for Explanations) is a model-agnostic method for explaining Deep Neural Networks (DNNs). It generates high-resolution explanations in a single forward pass during inference. We apply P-TAME to explain the decisions of VGG-16, ResNet-50, and ViT-B-16, three distinct and widely used image classifiers.
arXiv Detail & Related papers (2025-01-29T18:06:08Z)
Manipulating Feature Visualizations with Gradient Slingshots [54.31109240020007]
We introduce a novel method for manipulating Feature Visualization (FV) without significantly impacting the model's decision-making process. We evaluate the effectiveness of our method on several neural network models and demonstrate its capabilities to hide the functionality of arbitrarily chosen neurons.
arXiv Detail & Related papers (2024-01-11T18:57:17Z)
DiG-IN: Diffusion Guidance for Investigating Networks -- Uncovering Classifier Differences Neuron Visualisations and Visual Counterfactual Explanations [35.458709912618176]
Deep learning has led to huge progress in complex image classification tasks like ImageNet, but unexpected failure modes, e.g. via spurious features, persist. For safety-critical tasks the black-box nature of their decisions is problematic, and explanations or at least methods which make decisions plausible are needed urgently. We address these problems by generating images that optimize a classifier-derived objective using a framework for guided image generation.
arXiv Detail & Related papers (2023-11-29T17:35:29Z)
Masked Images Are Counterfactual Samples for Robust Fine-tuning [77.82348472169335]
Fine-tuning deep learning models can lead to a trade-off between in-distribution (ID) performance and out-of-distribution (OOD) robustness. We propose a novel fine-tuning method, which uses masked images as counterfactual samples that help improve the robustness of the fine-tuning model.
arXiv Detail & Related papers (2023-03-06T11:51:28Z)
Diffusion Visual Counterfactual Explanations [51.077318228247925]
Visual Counterfactual Explanations (VCEs) are an important tool to understand the decisions of an image classifier. Current approaches for the generation of VCEs are restricted to adversarially robust models and often contain non-realistic artefacts. In this paper, we overcome this by generating Diffusion Visual Counterfactual Explanations (DVCEs) for arbitrary ImageNet classifiers.
arXiv Detail & Related papers (2022-10-21T09:35:47Z)
ADVISE: ADaptive Feature Relevance and VISual Explanations for Convolutional Neural Networks [0.745554610293091]
We introduce ADVISE, a new explainability method that quantifies and leverages the relevance of each unit of the feature map to provide better visual explanations. We extensively evaluate our idea in the image classification task using AlexNet, VGG16, ResNet50, and Xception pretrained on ImageNet. Our experiments further show that ADVISE fulfils the sensitivity and implementation independence axioms while passing the sanity checks.
arXiv Detail & Related papers (2022-03-02T18:16:57Z)
Designing Counterfactual Generators using Deep Model Inversion [31.1607056675927]
We develop a deep inversion approach to generate counterfactual explanations for a given query image. We find that, in addition to producing visually meaningful explanations, the counterfactuals from DISC are effective at learning decision boundaries and are robust to unknown test-time corruptions.
arXiv Detail & Related papers (2021-09-29T08:40:50Z)
Beyond Trivial Counterfactual Explanations with Diverse Valuable Explanations [64.85696493596821]
In computer vision applications, generative counterfactual methods indicate how to perturb a model's input to change its prediction. We propose a counterfactual method that learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss. Our model improves the success rate of producing high-quality valuable explanations when compared to previous state-of-the-art methods.
arXiv Detail & Related papers (2021-03-18T12:57:34Z)
Generative Counterfactuals for Neural Networks via Attribute-Informed Perturbation [51.29486247405601]
We design a framework to generate counterfactuals for raw data instances with the proposed Attribute-Informed Perturbation (AIP). By utilizing generative models conditioned with different attributes, counterfactuals with desired labels can be obtained effectively and efficiently. Experimental results on real-world texts and images demonstrate the effectiveness, sample quality as well as efficiency of our designed framework.
arXiv Detail & Related papers (2021-01-18T08:37:13Z)
Explainable Image Classification with Evidence Counterfactual [0.0]
We introduce SEDC as a model-agnostic instance-level explanation method for image classification. For a given image, SEDC searches for a small set of segments that, if removed, alters the classification. We compare SEDC(-T) with popular feature importance methods such as LRP, LIME and SHAP, and we describe how the mentioned importance ranking issues are addressed.
arXiv Detail & Related papers (2020-04-16T08:02:48Z)
Explainable Deep Classification Models for Domain Generalization [94.43131722655617]
Explanations are defined as regions of visual evidence upon which a deep classification network makes a decision. Our training strategy enforces a periodic saliency-based feedback to encourage the model to focus on the image regions that directly correspond to the ground-truth object.
arXiv Detail & Related papers (2020-03-13T22:22:15Z)

This list is automatically generated from the titles and abstracts of the papers on this site.