How to Probe: Simple Yet Effective Techniques for Improving Post-hoc Explanations
- URL: http://arxiv.org/abs/2503.00641v1
- Date: Sat, 01 Mar 2025 22:25:11 GMT
- Title: How to Probe: Simple Yet Effective Techniques for Improving Post-hoc Explanations
- Authors: Siddhartha Gairola, Moritz Böhle, Francesco Locatello, Bernt Schiele
- Abstract summary: Post-hoc importance attribution methods are a popular tool for "explaining" Deep Neural Networks (DNNs) and are typically assumed to apply independently of how the models were trained. In this work we bring forward empirical evidence that challenges this very notion: we demonstrate that the training details of a pre-trained model's classification layer play a crucial role.
- Score: 69.72654127617058
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Post-hoc importance attribution methods are a popular tool for "explaining" Deep Neural Networks (DNNs) and are inherently based on the assumption that the explanations can be applied independently of how the models were trained. In contrast, in this work we bring forward empirical evidence that challenges this very notion. Surprisingly, we discover a strong dependency on, and demonstrate the crucial role of, the training details of a pre-trained model's classification layer (less than 10 percent of model parameters), much more than the pre-training scheme itself. This is of high practical relevance: (1) as techniques for pre-training models are becoming increasingly diverse, understanding the interplay between these techniques and attribution methods is critical; (2) it sheds light on an important yet overlooked assumption of post-hoc attribution methods which can drastically impact model explanations and how they are eventually interpreted. With this finding we also present simple yet effective adjustments to the classification layer that can significantly enhance the quality of model explanations. We validate our findings across several visual pre-training frameworks (fully-supervised, self-supervised, contrastive vision-language training) and analyse how they impact explanations for a wide range of attribution methods on a diverse set of evaluation metrics.
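The core observation, that post-hoc attributions depend heavily on how the classification layer was trained, can be illustrated with a small experiment. The sketch below is a minimal illustration, not the paper's proposed adjustment: it assumes a frozen torchvision ResNet-50 backbone (torchvision ≥ 0.13), uses two randomly initialised linear heads as stand-ins for probes trained with different recipes, and picks plain input-gradient attribution as the example method; all of these choices are assumptions for the sketch.

```python
# Minimal sketch (assumptions as stated above): swapping the classification head
# while keeping the backbone frozen changes a simple gradient-based attribution.
import torch
import torch.nn as nn
from torchvision.models import resnet50, ResNet50_Weights

# Frozen pre-trained backbone exposing 2048-d features.
backbone = resnet50(weights=ResNet50_Weights.IMAGENET1K_V2)
backbone.fc = nn.Identity()
backbone.eval()
for p in backbone.parameters():
    p.requires_grad_(False)

def input_gradient_map(head: nn.Module, x: torch.Tensor, target: int) -> torch.Tensor:
    """Plain |d logit / d input| attribution for one classification head."""
    x = x.clone().requires_grad_(True)
    logits = head(backbone(x))
    logits[0, target].backward()
    return x.grad.abs().sum(dim=1)  # (1, H, W) importance map

# Two heads over the *same* frozen features; in a real experiment each would be
# a linear probe trained with a different recipe (hypothetical stand-ins here).
head_a, head_b = nn.Linear(2048, 1000), nn.Linear(2048, 1000)

x = torch.randn(1, 3, 224, 224)  # stand-in for a preprocessed image
map_a = input_gradient_map(head_a, x, target=207)
map_b = input_gradient_map(head_b, x, target=207)

# If explanations were independent of the head, these maps would coincide;
# the paper's finding is that in practice they can differ substantially.
agreement = torch.nn.functional.cosine_similarity(
    map_a.flatten(), map_b.flatten(), dim=0)
print(f"Cosine similarity between attribution maps: {agreement.item():.3f}")
```

The same probe-then-attribute pattern can be repeated with other attribution methods and pre-training schemes; the point of the sketch is only that the head, not just the backbone, shapes the resulting explanation.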
Related papers
- Leveraging counterfactual concepts for debugging and improving CNN model performance [1.1049608786515839]
We propose leveraging counterfactual concepts to enhance the performance of CNN models on image classification tasks.
Our approach uses counterfactual reasoning to identify the crucial filters involved in the decision-making process.
By incorporating counterfactual explanations, we validate unseen model predictions and identify misclassifications.
arXiv Detail & Related papers (2025-01-19T15:50:33Z)
- Explaining the Unexplained: Revealing Hidden Correlations for Better Interpretability [1.8274323268621635]
Real Explainer (RealExp) is an interpretability method that decouples the Shapley Value into individual feature importance and feature correlation importance.
RealExp enhances interpretability by precisely quantifying both individual feature contributions and their interactions.
arXiv Detail & Related papers (2024-12-02T10:50:50Z)
- A Probabilistic Model Behind Self-Supervised Learning [53.64989127914936]
In self-supervised learning (SSL), representations are learned via an auxiliary task without annotated labels.
We present a generative latent variable model for self-supervised learning.
We show that several families of discriminative SSL, including contrastive methods, induce a comparable distribution over representations.
arXiv Detail & Related papers (2024-02-02T13:31:17Z)
- Explaining Explainability: Towards Deeper Actionable Insights into Deep Learning through Second-order Explainability [70.60433013657693]
Second-order explainable AI (SOXAI) was recently proposed to extend explainable AI (XAI) from the instance level to the dataset level.
We demonstrate for the first time, via example classification and segmentation cases, that eliminating irrelevant concepts from the training set based on actionable insights from SOXAI can enhance a model's performance.
arXiv Detail & Related papers (2023-06-14T23:24:01Z)
- Simple Control Baselines for Evaluating Transfer Learning [1.0499611180329802]
We share an evaluation standard that aims to quantify and communicate transfer learning performance.
We provide an example empirical study investigating a few basic questions about self-supervised learning.
arXiv Detail & Related papers (2022-02-07T17:26:26Z)
- DIVINE: Diverse Influential Training Points for Data Visualization and Model Refinement [32.045420977032926]
We propose a method to select a set of DIVerse INfluEntial (DIVINE) training points as a useful explanation of model behavior.
Our method can identify unfairness-inducing training points, which can be removed to improve fairness outcomes.
arXiv Detail & Related papers (2021-07-13T10:50:58Z)
- Beyond Trivial Counterfactual Explanations with Diverse Valuable Explanations [64.85696493596821]
In computer vision applications, generative counterfactual methods indicate how to perturb a model's input to change its prediction.
We propose a counterfactual method that learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss.
Our model improves the success rate of producing high-quality valuable explanations when compared to previous state-of-the-art methods.
arXiv Detail & Related papers (2021-03-18T12:57:34Z)
- Reflective-Net: Learning from Explanations [2.6879708041086796]
We find that combining explanations with traditional labeled data leads to significant improvements in classification accuracy and training efficiency.
During training, we used explanations not only for the correct or predicted class, but also for other classes.
arXiv Detail & Related papers (2020-11-27T20:40:45Z)
- A Diagnostic Study of Explainability Techniques for Text Classification [52.879658637466605]
We develop a list of diagnostic properties for evaluating existing explainability techniques.
We compare the saliency scores assigned by the explainability techniques with human annotations of salient input regions to find relations between a model's performance and the agreement of its rationales with human ones.
arXiv Detail & Related papers (2020-09-25T12:01:53Z)
- Explainable Recommender Systems via Resolving Learning Representations [57.24565012731325]
Explanations can help improve the user experience and reveal system defects.
We propose a novel explainable recommendation model through improving the transparency of the representation learning process.
arXiv Detail & Related papers (2020-08-21T05:30:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.