Are We Merely Justifying Results ex Post Facto? Quantifying Explanatory Inversion in Post-Hoc Model Explanations
- URL: http://arxiv.org/abs/2504.08919v1
- Date: Fri, 11 Apr 2025 19:00:12 GMT
- Title: Are We Merely Justifying Results ex Post Facto? Quantifying Explanatory Inversion in Post-Hoc Model Explanations
- Authors: Zhen Tan, Song Wang, Yifan Li, Yu Kong, Jundong Li, Tianlong Chen, Huan Liu
- Abstract summary: Post-hoc explanation methods provide interpretation by attributing predictions to input features. Do these explanations unintentionally reverse the natural relationship between inputs and outputs? We propose Inversion Quantification (IQ), a framework that quantifies the degree to which explanations rely on outputs and deviate from faithful input-output relationships.
- Score: 87.68633031231924
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Post-hoc explanation methods provide interpretation by attributing predictions to input features. Natural explanations are expected to interpret how the inputs lead to the predictions. Thus, a fundamental question arises: Do these explanations unintentionally reverse the natural relationship between inputs and outputs? Specifically, are the explanations rationalizing predictions from the output rather than reflecting the true decision process? To investigate such explanatory inversion, we propose Inversion Quantification (IQ), a framework that quantifies the degree to which explanations rely on outputs and deviate from faithful input-output relationships. Using the framework, we demonstrate on synthetic datasets that widely used methods such as LIME and SHAP are prone to such inversion, particularly in the presence of spurious correlations, across tabular, image, and text domains. Finally, we propose Reproduce-by-Poking (RBP), a simple and model-agnostic enhancement to post-hoc explanation methods that integrates forward perturbation checks. We further show that under the IQ framework, RBP theoretically guarantees the mitigation of explanatory inversion. Empirically, for example, on the synthesized data, RBP can reduce the inversion by 1.8% on average across iconic post-hoc explanation approaches and domains.
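To make the forward-perturbation idea behind Reproduce-by-Poking concrete, here is a minimal sketch of such a check, assuming a scalar-output model and a per-feature attribution vector. The function and parameter names (`forward_perturbation_check`, `model_fn`, `eps`) are illustrative assumptions, not the paper's actual IQ or RBP formulation: the sketch "pokes" the most-attributed features and measures how often the model's output actually moves in the direction the attribution implies.

```python
import numpy as np

def forward_perturbation_check(model_fn, x, attributions, k=3, eps=0.1,
                               n_trials=20, seed=0):
    """Illustrative forward-perturbation ("poking") check for an attribution.

    model_fn:     callable mapping a 1-D feature vector to a scalar score
    x:            input instance (1-D numpy array)
    attributions: per-feature attribution scores for model_fn(x)
    Returns the fraction of random pokes whose observed effect on the output
    agrees in sign with the effect the attribution predicts.
    """
    rng = np.random.default_rng(seed)
    top = np.argsort(-np.abs(attributions))[:k]  # most influential features per the explanation
    base = float(model_fn(x))
    agree = 0
    for _ in range(n_trials):
        j = rng.choice(top)
        delta = eps * rng.standard_normal()
        x_poked = x.astype(float).copy()
        x_poked[j] += delta
        observed = float(model_fn(x_poked)) - base  # actual change in the model output
        predicted = attributions[j] * delta         # change a locally linear attribution implies
        agree += int(np.sign(observed) == np.sign(predicted))
    return agree / n_trials
```

An explanation that merely rationalizes the prediction from the output would tend to fail such pokes, whereas a faithful, forward-looking explanation should agree with most of them.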
Related papers
- Selective Explanations [14.312717332216073]
Amortized explainers train a machine learning model to predict feature attribution scores in a single inference pass.
Despite their efficiency, amortized explainers can produce inaccurate predictions and misleading explanations.
We propose selective explanations, a novel feature attribution method that detects when amortized explainers generate low-quality explanations.
arXiv Detail & Related papers (2024-05-29T23:08:31Z)
- Explaining Text Similarity in Transformer Models [52.571158418102584]
Recent advances in explainable AI have made it possible to mitigate limitations by leveraging improved explanations for Transformers.
We use BiLRP, an extension developed for computing second-order explanations in bilinear similarity models, to investigate which feature interactions drive similarity in NLP models.
Our findings contribute to a deeper understanding of different semantic similarity tasks and models, highlighting how novel explainable AI methods enable in-depth analyses and corpus-level insights.
arXiv Detail & Related papers (2024-05-10T17:11:31Z)
- Uncertainty Quantification for Gradient-based Explanations in Neural Networks [6.9060054915724]
We propose a pipeline to ascertain the explanation uncertainty of neural networks.
We use this pipeline to produce explanation distributions for the CIFAR-10, FER+, and California Housing datasets.
We compute modified pixel insertion/deletion metrics to evaluate the quality of the generated explanations (a generic sketch of a deletion-style metric appears after this list).
arXiv Detail & Related papers (2024-03-25T21:56:02Z)
- Explaining Predictive Uncertainty by Exposing Second-Order Effects [13.83164409095901]
We present a new method for explaining predictive uncertainty based on second-order effects.
Our method is generally applicable, allowing for turning common attribution techniques into powerful second-order uncertainty explainers.
arXiv Detail & Related papers (2024-01-30T21:02:21Z)
- Counterfactuals of Counterfactuals: a back-translation-inspired approach to analyse counterfactual editors [3.4253416336476246]
We focus on the analysis of counterfactual, contrastive explanations.
We propose a new back-translation-inspired evaluation methodology.
We show that by iteratively feeding the counterfactual to the explainer we can obtain valuable insights into the behaviour of both the predictor and the explainer models.
arXiv Detail & Related papers (2023-05-26T16:04:28Z)
- Abductive Commonsense Reasoning Exploiting Mutually Exclusive Explanations [118.0818807474809]
Abductive reasoning aims to find plausible explanations for an event.
Existing approaches for abductive reasoning in natural language processing often rely on manually generated annotations for supervision.
This work proposes an approach for abductive commonsense reasoning that exploits the fact that only a subset of explanations is correct for a given context.
arXiv Detail & Related papers (2023-05-24T01:35:10Z)
- Logical Satisfiability of Counterfactuals for Faithful Explanations in NLI [60.142926537264714]
We introduce the methodology of Faithfulness-through-Counterfactuals.
It generates a counterfactual hypothesis based on the logical predicates expressed in the explanation.
It then evaluates if the model's prediction on the counterfactual is consistent with that expressed logic.
arXiv Detail & Related papers (2022-05-25T03:40:59Z)
- Argumentative Explanations for Pattern-Based Text Classifiers [15.81939090849456]
We focus on explanations for a specific interpretable model, namely pattern-based logistic regression (PLR) for binary text classification.
We propose AXPLR, a novel explanation method using (forms of) computational argumentation to generate explanations.
arXiv Detail & Related papers (2022-05-22T21:16:49Z)
- Explainability in Process Outcome Prediction: Guidelines to Obtain Interpretable and Faithful Models [77.34726150561087]
In the field of process outcome prediction, we define explainability through the interpretability of the explanations and the faithfulness of the explainability model.
This paper contributes a set of guidelines named X-MOP that allow selecting the appropriate model based on the event log specifications.
arXiv Detail & Related papers (2022-03-30T05:59:50Z)
- Latent Causal Invariant Model [128.7508609492542]
Current supervised learning can learn spurious correlations during the data-fitting process.
We propose a Latent Causal Invariance Model (LaCIM) which pursues causal prediction.
arXiv Detail & Related papers (2020-11-04T10:00:27Z)
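For background on the pixel insertion/deletion metrics mentioned in the uncertainty-quantification entry above, the following is a generic sketch of a deletion-style faithfulness score, not the modified variant used in that paper; `model_fn`, `baseline`, and `steps` are illustrative assumptions.

```python
import numpy as np

def deletion_score(model_fn, x, attributions, baseline=0.0, steps=20):
    """Progressively 'delete' the most-attributed features by overwriting
    them with a baseline value, recording the model's score for the
    original prediction after each step."""
    order = np.argsort(-attributions)        # most important features first
    x_cur = x.astype(float).copy()
    scores = [float(model_fn(x_cur))]
    for chunk in np.array_split(order, steps):
        x_cur[chunk] = baseline              # remove this group of features
        scores.append(float(model_fn(x_cur)))
    # Lower average score along the deletion curve means the explanation
    # pointed at features the model genuinely relies on.
    return float(np.mean(scores))
```

A steep drop in the model's score as the top-attributed features are removed indicates that the attribution identified features the model actually uses.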
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.