When Stability meets Sufficiency: Informative Explanations that do not Overwhelm
- URL: http://arxiv.org/abs/2109.06181v2
- Date: Thu, 05 Dec 2024 13:50:59 GMT
- Title: When Stability meets Sufficiency: Informative Explanations that do not Overwhelm
- Authors: Ronny Luss, Amit Dhurandhar,
- Abstract summary: We consider features-based attribution methods that highlight what should be minimally sufficient to justify the classification of an input.<n>While minimal sufficiency is an attractive property akin to comprehensibility, the resulting explanations are often too sparse for a human to understand and evaluate the local behavior of the model.<n>We propose a novel method called Path-Sufficient Explanations Method (PSEM) that outputs a sequence of stable and sufficient explanations for a given input.
- Score: 15.897648942908747
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent studies evaluating various criteria for explainable artificial intelligence (XAI) suggest that fidelity, stability, and comprehensibility are among the most important metrics considered by users of AI across a diverse collection of usage contexts. We consider these criteria as applied to feature-based attribution methods, which are amongst the most prevalent in XAI literature. Going beyond standard correlation, methods have been proposed that highlight what should be minimally sufficient to justify the classification of an input (viz. pertinent positives). While minimal sufficiency is an attractive property akin to comprehensibility, the resulting explanations are often too sparse for a human to understand and evaluate the local behavior of the model. To overcome these limitations, we incorporate the criteria of stability and fidelity and propose a novel method called Path-Sufficient Explanations Method (PSEM) that outputs a sequence of stable and sufficient explanations for a given input of strictly decreasing size (or value) -- from original input to a minimally sufficient explanation -- which can be thought to trace the local boundary of the model in a stable manner, thus providing better intuition about the local model behavior for the specific input. We validate these claims, both qualitatively and quantitatively, with experiments that show the benefit of PSEM across three modalities (image, tabular and text) as well as versus other path explanations. A user study depicts the strength of the method in communicating the local behavior, where (many) users are able to correctly determine the prediction made by a model.
Related papers
- Explaining the Unexplained: Revealing Hidden Correlations for Better Interpretability [1.8274323268621635]
Real Explainer (RealExp) is an interpretability method that decouples the Shapley Value into individual feature importance and feature correlation importance.
RealExp enhances interpretability by precisely quantifying both individual feature contributions and their interactions.
arXiv Detail & Related papers (2024-12-02T10:50:50Z) - MASALA: Model-Agnostic Surrogate Explanations by Locality Adaptation [3.587367153279351]
Existing local Explainable AI (XAI) methods select a region of the input space in the vicinity of a given input instance, for which they approximate the behaviour of a model using a simpler and more interpretable surrogate model.
We propose a novel method, MASALA, for generating explanations, which automatically determines the appropriate local region of impactful model behaviour for each individual instance being explained.
arXiv Detail & Related papers (2024-08-19T15:26:45Z) - Evaluating Human Alignment and Model Faithfulness of LLM Rationale [66.75309523854476]
We study how well large language models (LLMs) explain their generations through rationales.
We show that prompting-based methods are less "faithful" than attribution-based explanations.
arXiv Detail & Related papers (2024-06-28T20:06:30Z) - Towards a Unified Framework for Evaluating Explanations [0.6138671548064356]
We argue that explanations serve as mediators between models and stakeholders, whether for intrinsically interpretable models or opaque black-box models.
We illustrate these criteria, as well as specific evaluation methods, using examples from an ongoing study of an interpretable neural network for predicting a particular learner behavior.
arXiv Detail & Related papers (2024-05-22T21:49:28Z) - Log Probabilities Are a Reliable Estimate of Semantic Plausibility in Base and Instruction-Tuned Language Models [50.15455336684986]
We evaluate the effectiveness of LogProbs and basic prompting to measure semantic plausibility.
We find that LogProbs offers a more reliable measure of semantic plausibility than direct zero-shot prompting.
We conclude that, even in the era of prompt-based evaluations, LogProbs constitute a useful metric of semantic plausibility.
arXiv Detail & Related papers (2024-03-21T22:08:44Z) - On the stability, correctness and plausibility of visual explanation
methods based on feature importance [0.0]
We study the articulation between the stability, correctness and plausibility of explanations based on feature importance for image classifiers.
We show that the existing metrics for evaluating these properties do not always agree, raising the issue of what constitutes a good evaluation metric for explanations.
arXiv Detail & Related papers (2023-10-25T08:59:21Z) - Understanding prompt engineering may not require rethinking
generalization [56.38207873589642]
We show that the discrete nature of prompts, combined with a PAC-Bayes prior given by a language model, results in generalization bounds that are remarkably tight by the standards of the literature.
This work provides a possible justification for the widespread practice of prompt engineering.
arXiv Detail & Related papers (2023-10-06T00:52:48Z) - Sampling Based On Natural Image Statistics Improves Local Surrogate
Explainers [111.31448606885672]
Surrogate explainers are a popular post-hoc interpretability method to further understand how a model arrives at a prediction.
We propose two approaches to do so, namely (1) altering the method for sampling the local neighbourhood and (2) using perceptual metrics to convey some of the properties of the distribution of natural images.
arXiv Detail & Related papers (2022-08-08T08:10:13Z) - MACE: An Efficient Model-Agnostic Framework for Counterfactual
Explanation [132.77005365032468]
We propose a novel framework of Model-Agnostic Counterfactual Explanation (MACE)
In our MACE approach, we propose a novel RL-based method for finding good counterfactual examples and a gradient-less descent method for improving proximity.
Experiments on public datasets validate the effectiveness with better validity, sparsity and proximity.
arXiv Detail & Related papers (2022-05-31T04:57:06Z) - Logical Satisfiability of Counterfactuals for Faithful Explanations in
NLI [60.142926537264714]
We introduce the methodology of Faithfulness-through-Counterfactuals.
It generates a counterfactual hypothesis based on the logical predicates expressed in the explanation.
It then evaluates if the model's prediction on the counterfactual is consistent with that expressed logic.
arXiv Detail & Related papers (2022-05-25T03:40:59Z) - Locally Invariant Explanations: Towards Stable and Unidirectional
Explanations through Local Invariant Learning [15.886405745163234]
We propose a model agnostic local explanation method inspired by the invariant risk minimization principle.
Our algorithm is simple and efficient to train, and can ascertain stable input features for local decisions of a black-box without access to side information.
arXiv Detail & Related papers (2022-01-28T14:29:25Z) - A Survey on the Robustness of Feature Importance and Counterfactual
Explanations [12.599872913953238]
We present a survey of the works that analysed the robustness of two classes of local explanations.
The survey aims to unify existing definitions of robustness, introduces a taxonomy to classify different robustness approaches, and discusses some interesting results.
arXiv Detail & Related papers (2021-10-30T22:48:04Z) - Logic Constraints to Feature Importances [17.234442722611803]
"Black box" nature of AI models is often a limit for a reliable application in high-stakes fields like diagnostic techniques, autonomous guide, etc.
Recent works have shown that an adequate level of interpretability could enforce the more general concept of model trustworthiness.
The basic idea of this paper is to exploit the human prior knowledge of the features' importance for a specific task, in order to coherently aid the phase of the model's fitting.
arXiv Detail & Related papers (2021-10-13T09:28:38Z) - Evaluation of Local Model-Agnostic Explanations Using Ground Truth [4.278336455989584]
Explanation techniques are commonly evaluated using human-grounded methods.
We propose a functionally-grounded evaluation procedure for local model-agnostic explanation techniques.
arXiv Detail & Related papers (2021-06-04T13:47:31Z) - Search Methods for Sufficient, Socially-Aligned Feature Importance
Explanations with In-Distribution Counterfactuals [72.00815192668193]
Feature importance (FI) estimates are a popular form of explanation, and they are commonly created and evaluated by computing the change in model confidence caused by removing certain input features at test time.
We study several under-explored dimensions of FI-based explanations, providing conceptual and empirical improvements for this form of explanation.
arXiv Detail & Related papers (2021-06-01T20:36:48Z) - Building Reliable Explanations of Unreliable Neural Networks: Locally
Smoothing Perspective of Model Interpretation [0.0]
We present a novel method for reliably explaining the predictions of neural networks.
Our method is built on top of the assumption of smooth landscape in a loss function of the model prediction.
arXiv Detail & Related papers (2021-03-26T08:52:11Z) - Goal-directed Generation of Discrete Structures with Conditional
Generative Models [85.51463588099556]
We introduce a novel approach to directly optimize a reinforcement learning objective, maximizing an expected reward.
We test our methodology on two tasks: generating molecules with user-defined properties and identifying short python expressions which evaluate to a given target value.
arXiv Detail & Related papers (2020-10-05T20:03:13Z) - What Do You See? Evaluation of Explainable Artificial Intelligence (XAI)
Interpretability through Neural Backdoors [15.211935029680879]
EXplainable AI (XAI) methods have been proposed to interpret how a deep neural network predicts inputs.
Current evaluation approaches either require subjective input from humans or incur high computation cost with automated evaluation.
We propose backdoor trigger patterns--hidden malicious functionalities that cause misclassification--to automate the evaluation of saliency explanations.
arXiv Detail & Related papers (2020-09-22T15:53:19Z) - Evaluations and Methods for Explanation through Robustness Analysis [117.7235152610957]
We establish a novel set of evaluation criteria for such feature based explanations by analysis.
We obtain new explanations that are loosely necessary and sufficient for a prediction.
We extend the explanation to extract the set of features that would move the current prediction to a target class.
arXiv Detail & Related papers (2020-05-31T05:52:05Z) - Guided Uncertainty-Aware Policy Optimization: Combining Learning and
Model-Based Strategies for Sample-Efficient Policy Learning [75.56839075060819]
Traditional robotic approaches rely on an accurate model of the environment, a detailed description of how to perform the task, and a robust perception system to keep track of the current state.
reinforcement learning approaches can operate directly from raw sensory inputs with only a reward signal to describe the task, but are extremely sample-inefficient and brittle.
In this work, we combine the strengths of model-based methods with the flexibility of learning-based methods to obtain a general method that is able to overcome inaccuracies in the robotics perception/actuation pipeline.
arXiv Detail & Related papers (2020-05-21T19:47:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.