Fixed Point Explainability
- URL: http://arxiv.org/abs/2505.12421v2
- Date: Sat, 14 Jun 2025 10:59:13 GMT
- Title: Fixed Point Explainability
- Authors: Emanuele La Malfa, Jon Vadillo, Marco Molinari, Michael Wooldridge
- Abstract summary: This paper introduces a formal notion of fixed point explanations, inspired by the "why regress" principle. We show that fixed point explanations satisfy properties like minimality, stability, and faithfulness, revealing hidden model behaviours and explanatory weaknesses.
- Score: 3.7023628947782
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This paper introduces a formal notion of fixed point explanations, inspired by the "why regress" principle, to assess, through recursive applications, the stability of the interplay between a model and its explainer. Fixed point explanations satisfy properties like minimality, stability, and faithfulness, revealing hidden model behaviours and explanatory weaknesses. We define convergence conditions for several classes of explainers, from feature-based to mechanistic tools like Sparse AutoEncoders, and we report quantitative and qualitative results.
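The recursion at the heart of the abstract lends itself to a compact sketch. The following is a minimal illustration under assumed interfaces, not the authors' implementation: `explainer` is taken to return the indices of the features it deems sufficient for the prediction, and `mask` is a zero-baseline masking helper introduced here purely for illustration.

```python
import numpy as np

def mask(x, keep):
    """Zero out every feature outside `keep` (one possible baseline choice)."""
    out = np.zeros_like(x)
    idx = list(keep)
    out[idx] = x[idx]
    return out

def fixed_point_explanation(model, explainer, x, max_iters=20):
    """Re-apply the explainer to its own masked output until it stabilises."""
    features = frozenset(explainer(model, x))
    for _ in range(max_iters):
        next_features = frozenset(explainer(model, mask(x, features)))
        if next_features == features:   # fixed point: the explanation explains itself
            return features, True
        features = next_features
    return features, False              # no fixed point within the iteration budget
```

If the loop converges, the surviving feature set is stable under further "why" questions, matching the stability property claimed in the abstract; failure to converge exposes a mismatch between the model and its explainer.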
Related papers
- On Generating Monolithic and Model Reconciling Explanations in Probabilistic Scenarios [46.752418052725126]
We propose a novel framework for generating probabilistic monolithic explanations and model reconciling explanations.
For monolithic explanations, our approach integrates uncertainty by utilizing probabilistic logic to increase the probability of the explanandum.
For model reconciling explanations, we propose a framework that extends the logic-based variant of the model reconciliation problem to account for probabilistic human models.
arXiv Detail & Related papers (2024-05-29T16:07:31Z)
- Nonparametric Partial Disentanglement via Mechanism Sparsity: Sparse Actions, Interventions and Sparse Temporal Dependencies [58.179981892921056]
This work introduces a novel principle for disentanglement we call mechanism sparsity regularization.
We propose a representation learning method that induces disentanglement by simultaneously learning the latent factors.
We show that the latent factors can be recovered by regularizing the learned causal graph to be sparse (illustrated below).
arXiv Detail & Related papers (2024-01-10T02:38:21Z)
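A hedged sketch of the sparsity idea in the entry above; the objective and names are assumptions for illustration, not the paper's code. The learned causal graph is parameterized by edge logits, and the penalty is the expected number of active edges.

```python
import torch

def mechanism_sparsity_objective(recon_loss: torch.Tensor,
                                 graph_logits: torch.Tensor,
                                 lam: float = 1e-2) -> torch.Tensor:
    """Fit the data while pushing the learned mechanism graph towards sparsity.

    `graph_logits` parameterises edge probabilities of the learned causal
    graph (a hypothetical name); the penalty is a soft count of its edges.
    """
    expected_edges = torch.sigmoid(graph_logits).sum()
    return recon_loss + lam * expected_edges
```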
- On the stability, correctness and plausibility of visual explanation methods based on feature importance [0.0]
We study the articulation between the stability, correctness and plausibility of explanations based on feature importance for image classifiers.
We show that the existing metrics for evaluating these properties do not always agree, raising the issue of what constitutes a good evaluation metric for explanations.
arXiv Detail & Related papers (2023-10-25T08:59:21Z)
- From Robustness to Explainability and Back Again [3.7950144463212134]
This paper addresses the poor scalability of formal explainability and proposes novel, efficient algorithms for computing formal explanations. The proposed algorithms compute an explanation by answering a number of robustness queries, and the number of such queries is at most linear in the number of features (a sketch follows the entry).
arXiv Detail & Related papers (2023-06-05T17:21:05Z)
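The linear bound in the entry above admits a standard deletion-style sketch. A hedged illustration under an assumed interface: `is_robust(model, x, fixed)` is a hypothetical oracle answering whether the prediction cannot change while the features in `fixed` keep their values from `x`; this is not the paper's algorithm verbatim.

```python
def explanation_from_robustness_queries(model, x, n_features, is_robust):
    """Compute a formal explanation with one robustness query per feature."""
    fixed = set(range(n_features))          # fixing every feature is trivially robust
    for i in range(n_features):             # exactly n queries: linear in the features
        fixed.discard(i)                    # tentatively free feature i
        if not is_robust(model, x, fixed):  # freeing i admits a prediction change
            fixed.add(i)                    # so feature i is necessary: re-fix it
    return fixed                            # features that jointly entail the prediction
```

The returned set is subset-minimal: every feature left in it was shown necessary by the query that tried to drop it.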
- Towards Faithful Model Explanation in NLP: A Survey [48.690624266879155]
End-to-end neural Natural Language Processing (NLP) models are notoriously difficult to understand.
One desideratum of model explanation is faithfulness, i.e. an explanation should accurately represent the reasoning process behind the model's prediction.
We review over 110 model explanation methods in NLP through the lens of faithfulness.
arXiv Detail & Related papers (2022-09-22T21:40:51Z)
- Logical Satisfiability of Counterfactuals for Faithful Explanations in NLI [60.142926537264714]
We introduce the methodology of Faithfulness-through-Counterfactuals.
It generates a counterfactual hypothesis based on the logical predicates expressed in the explanation.
It then evaluates whether the model's prediction on the counterfactual is consistent with the logic expressed in the explanation (a minimal check is sketched below).
arXiv Detail & Related papers (2022-05-25T03:40:59Z)
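The two-step procedure in the entry above reduces to a small predicate. A minimal sketch, assuming a hypothetical `nli_model(premise, hypothesis) -> label` classifier and that the explanation's logical predicates have already produced the counterfactual hypothesis together with the label they entail:

```python
def counterfactual_faithfulness(nli_model, premise, cf_hypothesis, entailed_label):
    """Deem the explanation faithful iff the model's prediction on the
    counterfactual agrees with the label entailed by the explanation's logic."""
    return nli_model(premise, cf_hypothesis) == entailed_label
```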
- Explainability in Process Outcome Prediction: Guidelines to Obtain Interpretable and Faithful Models [77.34726150561087]
In the field of process outcome prediction, we define explainability through the interpretability of the explanations and the faithfulness of the explainability model.
This paper contributes a set of guidelines named X-MOP which allows selecting the appropriate model based on the event log specifications.
arXiv Detail & Related papers (2022-03-30T05:59:50Z)
- Rethinking Stability for Attribution-based Explanations [20.215505482157255]
We introduce metrics to quantify the stability of an explanation and show that several popular explanation methods are unstable.
In particular, we propose new Relative Stability metrics that measure the change in the output explanation with respect to changes in the input, the model representation, or the output of the underlying predictor (the input variant is sketched below).
arXiv Detail & Related papers (2022-03-14T06:19:27Z)
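As a concrete instance of the entry above, relative input stability compares the relative change of the explanation against the relative change of the input. The notation below is an assumption for illustration, not the paper's reference code.

```python
import numpy as np

def relative_input_stability(e_x, e_xp, x, xp, eps=1e-6):
    """Ratio of explanation change to input change for one perturbed pair.

    `e_x`/`e_xp` are attributions for the input `x` and its perturbation `xp`.
    Large values flag explanations that move far more than the input did;
    `eps` guards the elementwise percent changes against division by zero.
    """
    expl_change = np.linalg.norm((e_x - e_xp) / (e_x + eps))
    input_change = max(np.linalg.norm((x - xp) / (x + eps)), eps)
    return expl_change / input_change
```

The paper's metric takes a worst case over a neighbourhood of perturbations; maximising this ratio over sampled pairs recovers that reading.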
- Evaluations and Methods for Explanation through Robustness Analysis [117.7235152610957]
We establish a novel set of evaluation criteria for feature-based explanations via robustness analysis.
We obtain new explanations that are loosely necessary and sufficient for a prediction.
We extend the explanation to extract the set of features that would move the current prediction to a target class (a simplified sketch follows the entry).
arXiv Detail & Related papers (2020-05-31T05:52:05Z)
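The targeted variant in the last line can be sketched as a greedy search; this is a simplification under assumed `model` and `perturb` interfaces, whereas the paper derives its feature sets through robustness analysis rather than a fixed ordering.

```python
def features_toward_target(model, x, target, perturb, n_features):
    """Greedily grow a feature set whose perturbation moves the prediction
    to `target`; a simplified stand-in for the robustness-based search."""
    chosen = set()
    for i in range(n_features):
        chosen.add(i)
        if model(perturb(x, chosen)) == target:
            return chosen               # smallest prefix reaching the target class
    return chosen                       # target not reached under this ordering
```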
This list is automatically generated from the titles and abstracts of the papers on this site.