Counterfactual Explanations as Plans
- URL: http://arxiv.org/abs/2502.09205v1
- Date: Thu, 13 Feb 2025 11:45:54 GMT
- Title: Counterfactual Explanations as Plans
- Authors: Vaishak Belle
- Abstract summary: We look to provide a formal account of "counterfactual explanations" in terms of action sequences.
We then show that this naturally leads to an account of model reconciliation, which might take the form of the user correcting the agent's model, or suggesting actions to the agent's plan.
- Score: 6.445239204595516
- License:
- Abstract: There has been considerable recent interest in explainability in AI, especially with black-box machine learning models. As correctly observed by the planning community, when the application at hand is not a single-shot decision or prediction, but a sequence of actions that depend on observations, a richer notion of explanation is desirable. In this paper, we look to provide a formal account of "counterfactual explanations" in terms of action sequences. We then show that this naturally leads to an account of model reconciliation, which might take the form of the user correcting the agent's model, or suggesting actions to the agent's plan. For this, we will need to articulate what is true versus what is known, and we appeal to a modal fragment of the situation calculus to formalise these intuitions. We consider various settings: the agent knowing partial truths, weakened truths and having false beliefs, and show that our definitions easily generalize to these different settings.
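As a rough illustration of the core idea (a toy deterministic domain of my own, not the paper's situation-calculus formalisation), a counterfactual explanation for "why did the agent not reach the goal?" can be read off as an alternative action sequence, i.e. a plan, that would have achieved it:

```python
from collections import deque

# Toy deterministic planning domain (hypothetical example, not from the paper).
# States are tuples (location, has_key); actions are named state transformers.
ACTIONS = {
    "move_to_door": lambda s: ("door", s[1]),
    "pick_up_key":  lambda s: (s[0], True) if s[0] == "key_room" else s,
    "move_to_key":  lambda s: ("key_room", s[1]),
    "open_door":    lambda s: ("inside", s[1]) if s[0] == "door" and s[1] else s,
}

def counterfactual_plan(init, goal, max_len=6):
    """Breadth-first search for a shortest action sequence reaching the goal.

    Read as a counterfactual explanation: 'had the agent done these actions
    instead, the goal would have held.'"""
    frontier = deque([(init, [])])
    seen = {init}
    while frontier:
        state, plan = frontier.popleft()
        if goal(state):
            return plan
        if len(plan) >= max_len:
            continue
        for name, act in ACTIONS.items():
            nxt = act(state)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, plan + [name]))
    return None

actual_plan = ["move_to_door", "open_door"]   # the agent's plan; fails because it has no key
explanation = counterfactual_plan(("start", False), lambda s: s[0] == "inside")
print("actual (failing) plan          :", actual_plan)
print("counterfactual explanation plan:", explanation)
```

The failing plan and the returned counterfactual differ only in the key-fetching prefix, which is the kind of action-sequence contrast the abstract describes.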
Related papers
- ExpProof : Operationalizing Explanations for Confidential Models with ZKPs [33.47144717983562]
We take a step towards operationalizing explanations in adversarial scenarios with Zero-Knowledge Proofs (ZKPs).
Specifically, we explore ZKP-amenable versions of the popular explainability algorithm LIME and evaluate their performance on Neural Networks and Random Forests.
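For orientation, a bare-bones LIME-style local surrogate in NumPy (only the vanilla explanation step; the ZKP machinery that ExpProof adds is not sketched here, and the black-box model below is a made-up stand-in):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical black-box model: probability from a fixed nonlinear score.
def black_box(X):
    return 1.0 / (1.0 + np.exp(-(2.0 * X[:, 0] - 3.0 * X[:, 1] + X[:, 0] * X[:, 1])))

def lime_style_explanation(x, n_samples=500, width=1.0):
    """Fit a proximity-weighted linear surrogate around x; its coefficients are the explanation."""
    # Perturb the instance with Gaussian noise and query the black box.
    Z = x + rng.normal(scale=0.5, size=(n_samples, x.shape[0]))
    y = black_box(Z)
    # Weight samples by proximity to x (RBF kernel), as LIME does.
    w = np.exp(-np.sum((Z - x) ** 2, axis=1) / width ** 2)
    # Weighted least squares with an intercept column.
    A = np.hstack([Z, np.ones((n_samples, 1))])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return coef[:-1]  # per-feature local weights

x = np.array([1.0, -0.5])
print("local feature weights:", lime_style_explanation(x))
```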
arXiv Detail & Related papers (2025-02-06T04:24:29Z)
- Limitations of Agents Simulated by Predictive Models [1.6649383443094403]
We outline two structural reasons for why predictive models can fail when turned into agents.
We show that both of those failures are fixed by including a feedback loop from the environment.
Our treatment provides a unifying view of those failure modes, and informs the question of why fine-tuning offline learned policies with online learning makes them more effective.
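A loose illustration of the feedback-loop point, on a toy tracking task of my own rather than the paper's setting: a policy that rolls forward on its own predicted state drifts away from the true state, while feeding environment observations back in keeps it anchored.

```python
import numpy as np

# Toy environment: state follows s' = s + a + noise; the goal is to track zero.
def step(state, action, rng):
    return state + action + rng.normal(scale=0.1)

def policy(predicted_state):
    # Predictive "model-as-agent": acts on whatever state it believes it is in.
    return -0.5 * predicted_state

def rollout(closed_loop, T=50, seed=0):
    rng = np.random.default_rng(seed)
    true_state, believed_state = 2.0, 2.0
    errors = []
    for _ in range(T):
        action = policy(believed_state)
        true_state = step(true_state, action, rng)
        if closed_loop:
            believed_state = true_state               # feedback from the environment
        else:
            believed_state = believed_state + action  # self-predicted, no feedback
        errors.append(abs(true_state))
    return np.mean(errors)

print("open-loop mean |error|  :", rollout(closed_loop=False))
print("closed-loop mean |error|:", rollout(closed_loop=True))
```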
arXiv Detail & Related papers (2024-02-08T17:08:08Z)
- Dissenting Explanations: Leveraging Disagreement to Reduce Model Overreliance [4.962171160815189]
We introduce the notion of dissenting explanations: conflicting predictions with accompanying explanations.
We first explore the advantage of dissenting explanations in the setting of model multiplicity.
We demonstrate that dissenting explanations reduce overreliance on model predictions, without reducing overall accuracy.
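A hedged sketch of the setup (hand-made linear "models" and contribution-style attributions, purely for illustration): two comparably accurate models are queried on the same instance, and where their predictions conflict, both explanations are surfaced together.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data where two correlated features carry similar signal,
# so different models can be similarly accurate yet disagree per instance.
n = 400
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.3, size=n)
X = np.column_stack([x1, x2])
y = (x1 + x2 + rng.normal(scale=0.5, size=n) > 0).astype(int)

# Two simple "models": each leans on a different feature (model multiplicity).
models = {"model_A": np.array([2.0, 0.1]), "model_B": np.array([0.1, 2.0])}

def predict(w, X):
    return (X @ w > 0).astype(int)

def explanation(w, x):
    # Contribution-style explanation: weight times feature value.
    return w * x

for name, w in models.items():
    print(name, "accuracy:", (predict(w, X) == y).mean())

# Dissenting explanations: instances where the predictions conflict,
# shown with each model's (conflicting) attribution.
disagree = np.where(predict(models["model_A"], X) != predict(models["model_B"], X))[0]
for i in disagree[:1]:
    print("instance", i, "features:", np.round(X[i], 3))
    for name, w in models.items():
        print(" ", name, "prediction:", predict(w, X[i:i+1])[0],
              "attribution:", np.round(explanation(w, X[i]), 3))
```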
arXiv Detail & Related papers (2023-07-14T21:27:00Z)
- Logical Satisfiability of Counterfactuals for Faithful Explanations in NLI [60.142926537264714]
We introduce the methodology of Faithfulness-through-Counterfactuals.
It generates a counterfactual hypothesis based on the logical predicates expressed in the explanation.
It then evaluates if the model's prediction on the counterfactual is consistent with that expressed logic.
arXiv Detail & Related papers (2022-05-25T03:40:59Z)
- Explainability in Process Outcome Prediction: Guidelines to Obtain Interpretable and Faithful Models [77.34726150561087]
We define explainability through the interpretability of the explanations and the faithfulness of the explainability model in the field of process outcome prediction.
This paper contributes a set of guidelines named X-MOP which allows selecting the appropriate model based on the event log specifications.
arXiv Detail & Related papers (2022-03-30T05:59:50Z)
- Causal Explanations and XAI [8.909115457491522]
An important goal of Explainable Artificial Intelligence (XAI) is to compensate for mismatches by offering explanations.
I take a step further by formally defining the causal notions of sufficient explanations and counterfactual explanations.
I also touch upon the significance of this work for fairness in AI by showing how actual causation can be used to improve the idea of path-specific counterfactual fairness.
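For reference, the standard Pearl-style counterfactual that such causal definitions build on can be written as follows (a textbook rendering in structural-causal-model notation, not necessarily the paper's exact definitions):

```latex
% In an SCM with context u, setting X = x' is written as the intervention X \leftarrow x'.
% X = x is a counterfactual (but-for) explanation of the outcome Y = y when:
X(u) = x, \qquad Y(u) = y, \qquad \exists\, x' \neq x \;:\; Y_{X \leftarrow x'}(u) \neq y .
```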
arXiv Detail & Related papers (2022-01-31T12:32:10Z)
- Do not explain without context: addressing the blind spot of model explanations [2.280298858971133]
This paper highlights a blind spot which is often overlooked when monitoring and auditing machine learning models.
We discuss that many model explanations depend directly or indirectly on the choice of the referenced data distribution.
We showcase examples where small changes in the distribution lead to drastic changes in the explanations, such as a change in trend or, alarmingly, in the conclusion drawn.
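A small numeric illustration of that dependence (a made-up additive model and a crude mean-baseline attribution, not any particular explainer): the same prediction receives very different attributions depending on which reference distribution supplies the baseline.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical additive model; attribution = change in f when a feature is
# reset to the mean of the chosen reference distribution.
def f(X):
    return 3.0 * X[..., 0] - 2.0 * X[..., 1]

def baseline_attribution(x, reference):
    base = reference.mean(axis=0)
    attrib = []
    for j in range(len(x)):
        x_reset = x.copy()
        x_reset[j] = base[j]
        attrib.append(f(x) - f(x_reset))  # effect of restoring feature j from its baseline
    return np.round(attrib, 3)

x = np.array([1.0, 1.0])
ref_population = rng.normal(loc=[0.0, 0.0], size=(1000, 2))  # population-wide reference
ref_local      = rng.normal(loc=[1.0, 1.0], size=(1000, 2))  # reference centred near the instance

print("attribution vs. population reference:", baseline_attribution(x, ref_population))
print("attribution vs. local reference     :", baseline_attribution(x, ref_local))
```

Against the population-wide reference both features receive large attributions; against a reference centred near the instance the attributions collapse towards zero.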
arXiv Detail & Related papers (2021-05-28T12:48:40Z)
- Beyond Trivial Counterfactual Explanations with Diverse Valuable Explanations [64.85696493596821]
In computer vision applications, generative counterfactual methods indicate how to perturb a model's input to change its prediction.
We propose a counterfactual method that learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss.
Our model improves the success rate of producing high-quality valuable explanations when compared to previous state-of-the-art methods.
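A rough sketch of what a diversity-enforcing term over a set of latent perturbations can look like (my own minimal NumPy rendering, not the paper's loss or latent space):

```python
import numpy as np

def diversity_loss(perturbations, eps=1e-8):
    """Penalize pairwise similarity between latent perturbations so that the
    resulting counterfactuals differ from one another (one common recipe)."""
    Z = perturbations / (np.linalg.norm(perturbations, axis=1, keepdims=True) + eps)
    sim = Z @ Z.T                      # pairwise cosine similarities
    off_diag = sim - np.eye(len(Z))    # ignore self-similarity
    return np.square(off_diag).sum() / (len(Z) * (len(Z) - 1))

# Three latent perturbations: two nearly identical, one distinct.
P = np.array([[1.0, 0.0, 0.0],
              [0.9, 0.1, 0.0],
              [0.0, 0.0, 1.0]])
print("diversity penalty:", diversity_loss(P))           # higher when directions overlap
print("orthogonal set   :", diversity_loss(np.eye(3)))   # 0 for fully diverse directions
```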
arXiv Detail & Related papers (2021-03-18T12:57:34Z)
- Contrastive Explanations for Model Interpretability [77.92370750072831]
We propose a methodology to produce contrastive explanations for classification models.
Our method is based on projecting model representation to a latent space.
Our findings shed light on the ability of label-contrastive explanations to provide a more accurate and finer-grained interpretability of a model's decision.
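A heavily reduced sketch of the projection idea (synthetic representations and a difference-of-means direction as a stand-in for the learned projection): a representation is split into its component along a label-contrastive direction and a label-irrelevant remainder.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical penultimate-layer representations for two classes.
reps_a = rng.normal(loc=[2.0, 0.0, 0.0], size=(200, 3))
reps_b = rng.normal(loc=[-2.0, 0.0, 0.0], size=(200, 3))

# Label-contrastive direction: difference of class means (a crude stand-in
# for projecting the model representation to a contrastive latent space).
direction = reps_a.mean(axis=0) - reps_b.mean(axis=0)
direction /= np.linalg.norm(direction)

def split_representation(h):
    """Split h into the part along the contrastive direction and the remainder."""
    along = (h @ direction) * direction
    return along, h - along

h = reps_a[0]
along, residual = split_representation(h)
print("contrastive component:", np.round(along, 3))
print("label-irrelevant part:", np.round(residual, 3))
```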
arXiv Detail & Related papers (2021-03-02T00:36:45Z)
- The Struggles of Feature-Based Explanations: Shapley Values vs. Minimal Sufficient Subsets [61.66584140190247]
We show that feature-based explanations pose problems even for explaining trivial models.
We show that two popular classes of explainers, Shapley explainers and minimal sufficient subsets explainers, target fundamentally different types of ground-truth explanations.
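To make the contrast concrete on a toy Boolean model of my own (not an example from the paper): exact Shapley values spread credit across all features involved, while a minimal sufficient subset names the smallest set of observed values that already forces the prediction.

```python
import itertools
import numpy as np

# Toy model on three binary features: predicts 1 iff x0 OR (x1 AND x2).
def model(x):
    return int(x[0] or (x[1] and x[2]))

x = (1, 1, 1)         # instance to explain; prediction is 1
baseline = (0, 0, 0)  # "feature absent" means falling back to the baseline value

def value(S):
    """Model output when only the features in S keep their observed values."""
    z = [x[i] if i in S else baseline[i] for i in range(3)]
    return model(z)

def shapley():
    """Exact Shapley values by averaging marginal contributions over all orderings."""
    phi = np.zeros(3)
    perms = list(itertools.permutations(range(3)))
    for perm in perms:
        S = set()
        for i in perm:
            before = value(S)
            S.add(i)
            phi[i] += value(S) - before
    return phi / len(perms)

def minimal_sufficient_subsets():
    """Smallest subsets whose observed values already force the model's prediction."""
    full = value(set(range(3)))
    for k in range(1, 4):
        hits = [S for S in itertools.combinations(range(3), k) if value(set(S)) == full]
        if hits:
            return hits
    return []

print("Shapley values            :", np.round(shapley(), 3))
print("minimal sufficient subsets:", minimal_sufficient_subsets())
```

Here feature 0 alone is a sufficient subset, yet the Shapley values still assign non-zero credit to features 1 and 2.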
arXiv Detail & Related papers (2020-09-23T09:45:23Z)
- What can I do here? A Theory of Affordances in Reinforcement Learning [65.70524105802156]
We develop a theory of affordances for agents who learn and plan in Markov Decision Processes.
Affordances play a dual role in this setting, in part by reducing the number of actions available in any given situation.
We propose an approach to learn affordances and use it to estimate transition models that are simpler and generalize better.
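A compact sketch of the action-pruning role (a hypothetical chain MDP with a hand-coded affordance function, not the learned affordances of the paper): value iteration is run over only the actions the affordance function deems relevant in each state.

```python
import numpy as np

# Tiny deterministic chain MDP: states 0..4, reward 1 for reaching state 4.
N_STATES, GAMMA = 5, 0.9
ACTIONS = {"left": -1, "right": +1, "stay": 0}

def transition(s, a):
    return int(np.clip(s + ACTIONS[a], 0, N_STATES - 1))

def reward(s, a):
    return 1.0 if transition(s, a) == N_STATES - 1 else 0.0

# Affordance function: which actions are worth considering in each state.
# Here hand-coded: never consider "left", since the goal is always to the right.
def afforded(s):
    return [a for a in ACTIONS if a != "left"]

def value_iteration(action_filter, iters=50):
    V = np.zeros(N_STATES)
    for _ in range(iters):
        V = np.array([max(reward(s, a) + GAMMA * V[transition(s, a)]
                          for a in action_filter(s))
                      for s in range(N_STATES)])
    return np.round(V, 3)

print("values with all actions     :", value_iteration(lambda s: list(ACTIONS)))
print("values with afforded actions:", value_iteration(afforded))
```

In this toy chain, dropping the useless action leaves the optimal values unchanged while shrinking the search per state.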
arXiv Detail & Related papers (2020-06-26T16:34:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented (including all of the above) and is not responsible for any consequences of its use.