Activation-Deactivation: A General Framework for Robust Post-hoc Explainable AI
- URL: http://arxiv.org/abs/2510.01038v1
- Date: Wed, 01 Oct 2025 15:42:58 GMT
- Title: Activation-Deactivation: A General Framework for Robust Post-hoc Explainable AI
- Authors: Akchunya Chanchal, David A. Kelly, Hana Chockler
- Abstract summary: Activation-Deactivation (AD) removes the effects of occluded input features from the model's decision-making. We introduce ConvAD, a drop-in mechanism that can be easily added to any trained Convolutional Neural Network (CNN). We prove that the ConvAD mechanism does not change the decision-making process of the network.
- Score: 4.3331379059769395
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Black-box explainability methods are popular tools for explaining the decisions of image classifiers. A major drawback of these tools is their reliance on mutants obtained by occluding parts of the input, leading to out-of-distribution images. This raises doubts about the quality of the explanations. Moreover, choosing an appropriate occlusion value often requires domain knowledge. In this paper we introduce a novel forward-pass paradigm, Activation-Deactivation (AD), which removes the effects of occluded input features from the model's decision-making by switching off the parts of the model that correspond to the occlusions. We introduce ConvAD, a drop-in mechanism that can be easily added to any trained Convolutional Neural Network (CNN), and which implements the AD paradigm. This leads to more robust explanations without any additional training or fine-tuning. We prove that the ConvAD mechanism does not change the decision-making process of the network. We provide experimental evaluation across several datasets and model architectures. We compare the quality of AD-explanations with explanations achieved using a set of masking values, using the proxies of robustness, size, and confidence drop-off. We observe a consistent improvement in robustness of AD explanations (up to 62.5%) compared to explanations obtained with occlusions, demonstrating that ConvAD extracts more robust explanations without the need for domain knowledge.
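The core contrast in the abstract can be sketched on a toy model: an occlusion baseline overwrites input pixels with a masking value, whereas an AD-style forward pass keeps the input intact and switches off the hidden units whose receptive fields cover the occluded patch. This is an illustrative sketch of the general idea only, not the authors' ConvAD implementation; the toy one-layer network, kernel, and average-pool head are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-layer "CNN": 3x3 valid convolution -> ReLU -> global average pool -> score.
W = rng.normal(size=(3, 3))   # conv kernel (hypothetical)
v = rng.normal(size=())       # scalar head weight (hypothetical)
x = rng.normal(size=(8, 8))   # input "image"

def conv_relu(img):
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            out[i, j] = max(0.0, np.sum(img[i:i + 3, j:j + 3] * W))
    return out

def score(acts, keep=None):
    # Average over the *active* units only; `keep` marks units not deactivated.
    if keep is not None:
        acts = acts[keep]
    return float(v) * float(acts.mean())

# Occlude a 4x4 patch at the top-left corner.
# (a) Occlusion baseline: overwrite the patch with a masking value (here 0),
#     producing a possibly out-of-distribution input.
x_occ = x.copy()
x_occ[0:4, 0:4] = 0.0
score_occlusion = score(conv_relu(x_occ))

# (b) AD-style pass: forward the original input, but deactivate every unit
#     whose 3x3 receptive field overlaps the occluded patch, and renormalise
#     the pooling over the surviving units.
acts = conv_relu(x)
keep = np.ones(acts.shape, dtype=bool)
for i in range(acts.shape[0]):
    for j in range(acts.shape[1]):
        if i < 4 and j < 4:  # receptive field [i:i+3, j:j+3] overlaps [0:4, 0:4]
            keep[i, j] = False
score_ad = score(acts, keep)
```

The two scores generally differ: the baseline feeds the network a synthetic input, while the AD-style pass never shows the model anything it was not trained on; it only excludes the affected units from the decision.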
Related papers
- Pathwise Explanation of ReLU Neural Networks [20.848391252661074]
We introduce a novel approach that considers subsets of the hidden units involved in the decision making path. This pathwise explanation provides a clearer and more consistent understanding of the relationship between the input and the decision-making process.
arXiv Detail & Related papers (2025-06-22T13:41:42Z) - Towards Counterfactual and Contrastive Explainability and Transparency of DCNN Image Classifiers [0.9831489366502298]
We propose a novel method for generating interpretable counterfactual and contrastive explanations for DCNN models. The proposed method is model-intrusive: it probes the internal workings of a DCNN instead of altering the input image. One interesting application of this method is misclassification analysis, where we compare the concepts identified for a particular input image with class-specific concepts to establish the validity of the model's decisions.
arXiv Detail & Related papers (2025-01-12T14:54:02Z) - Explainable Image Recognition via Enhanced Slot-attention Based Classifier [28.259040737540797]
We introduce ESCOUTER, a visually explainable classifier based on the modified slot attention mechanism.
ESCOUTER distinguishes itself by not only delivering high classification accuracy but also offering more transparent insights into the reasoning behind its decisions.
A novel loss function specifically for ESCOUTER is designed to fine-tune the model's behavior, enabling it to toggle between positive and negative explanations.
arXiv Detail & Related papers (2024-07-08T05:05:43Z) - Manipulating Feature Visualizations with Gradient Slingshots [53.94925202421929]
Feature Visualization (FV) is a widely used technique for interpreting the concepts learned by Deep Neural Networks (DNNs). We introduce a novel method, Gradient Slingshots, that enables manipulation of FV without modifying the model architecture or significantly degrading its performance.
arXiv Detail & Related papers (2024-01-11T18:57:17Z) - Towards Better Visualizing the Decision Basis of Networks via Unfold and Conquer Attribution Guidance [29.016425469068587]
We propose a novel framework, Unfold and Conquer Guidance (UCAG), which enhances the explainability of the network decision. UCAG sequentially follows the confidence of slices of the image, yielding an abundant and clear interpretation. We conduct numerous evaluations to validate the performance in several metrics.
arXiv Detail & Related papers (2023-12-21T03:43:19Z) - Learning Transferable Conceptual Prototypes for Interpretable Unsupervised Domain Adaptation [79.22678026708134]
In this paper, we propose an inherently interpretable method, named Transferable Prototype Learning (TCPL).
To achieve this goal, we design a hierarchically prototypical module that transfers categorical basic concepts from the source domain to the target domain and learns domain-shared prototypes for explaining the underlying reasoning process.
Comprehensive experiments show that the proposed method can not only provide effective and intuitive explanations but also outperform previous state-of-the-art methods.
arXiv Detail & Related papers (2023-10-12T06:36:41Z) - Explaining Cross-Domain Recognition with Interpretable Deep Classifier [100.63114424262234]
Interpretable Deep Classifier (IDC) learns the nearest source samples of a target sample as evidence upon which the classifier makes the decision.
Our IDC leads to a more explainable model with almost no accuracy degradation and effectively calibrates classification for optimum reject options.
arXiv Detail & Related papers (2022-11-15T15:58:56Z) - ADVISE: ADaptive Feature Relevance and VISual Explanations for Convolutional Neural Networks [0.745554610293091]
We introduce ADVISE, a new explainability method that quantifies and leverages the relevance of each unit of the feature map to provide better visual explanations.
We extensively evaluate our idea in the image classification task using AlexNet, VGG16, ResNet50, and Xception pretrained on ImageNet.
Our experiments further show that ADVISE fulfils the sensitivity and implementation independence axioms while passing the sanity checks.
arXiv Detail & Related papers (2022-03-02T18:16:57Z) - KAT: A Knowledge Augmented Transformer for Vision-and-Language [56.716531169609915]
We propose a novel model - Knowledge Augmented Transformer (KAT) - which achieves a strong state-of-the-art result on the open-domain multimodal task of OK-VQA.
Our approach integrates implicit and explicit knowledge in an end-to-end encoder-decoder architecture, while still jointly reasoning over both knowledge sources during answer generation.
An additional benefit of explicit knowledge integration is seen in improved interpretability of model predictions in our analysis.
arXiv Detail & Related papers (2021-12-16T04:37:10Z) - Beyond Trivial Counterfactual Explanations with Diverse Valuable Explanations [64.85696493596821]
In computer vision applications, generative counterfactual methods indicate how to perturb a model's input to change its prediction.
We propose a counterfactual method that learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss.
Our model improves the success rate of producing high-quality valuable explanations when compared to previous state-of-the-art methods.
arXiv Detail & Related papers (2021-03-18T12:57:34Z) - Explainable Deep Classification Models for Domain Generalization [94.43131722655617]
Explanations are defined as regions of visual evidence upon which a deep classification network makes a decision.
Our training strategy enforces a periodic saliency-based feedback to encourage the model to focus on the image regions that directly correspond to the ground-truth object.
arXiv Detail & Related papers (2020-03-13T22:22:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.