Towards Better Model Understanding with Path-Sufficient Explanations
- URL: http://arxiv.org/abs/2109.06181v1
- Date: Mon, 13 Sep 2021 16:06:10 GMT
- Title: Towards Better Model Understanding with Path-Sufficient Explanations
- Authors: Ronny Luss, Amit Dhurandhar
- Abstract summary: The Path-Sufficient Explanations Method (PSEM) outputs a sequence of sufficient explanations of strictly decreasing size for a given input.
PSEM can be thought of as tracing the local boundary of the model in a smooth manner, thus providing better intuition about the local model behavior for the specific input.
A user study demonstrates the strength of the method in communicating local behavior, with many users able to correctly determine the prediction made by the model.
- Score: 11.517059323883444
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Feature-based local attribution methods are amongst the most prevalent in
the explainable artificial intelligence (XAI) literature. Going beyond standard
correlation, methods have recently been proposed that highlight what should be
minimally sufficient to justify the classification of an input (viz. pertinent
positives). While minimal sufficiency is an attractive property, the resulting
explanations are often too sparse for a human to understand and evaluate the
local behavior of the model, thus making it difficult to judge its overall
quality. To overcome these limitations, we propose a novel method called
Path-Sufficient Explanations Method (PSEM) that outputs a sequence of
sufficient explanations for a given input of strictly decreasing size (or
value) -- from original input to a minimally sufficient explanation -- which
can be thought to trace the local boundary of the model in a smooth manner,
thus providing better intuition about the local model behavior for the specific
input. We validate these claims, both qualitatively and quantitatively, with
experiments that show the benefit of PSEM across all three modalities (image,
tabular and text). A user study depicts the strength of the method in
communicating the local behavior, where (many) users are able to correctly
determine the prediction made by a model.
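To make the idea concrete, below is a minimal sketch of one way a path of increasingly sparse sufficient explanations could be produced for a tabular input: greedily zero out features while the model's predicted class is preserved. This is an illustrative approximation under assumed choices (a `predict_fn` returning class probabilities, a zero baseline for removed features, and a greedy one-feature-at-a-time removal order), not the paper's actual optimization.

```python
import numpy as np

def path_sufficient_explanations(x, predict_fn, n_levels=4):
    """Illustrative sketch: build a path of feature masks of strictly
    decreasing size, each of which still yields the original prediction
    when removed features are replaced by a zero baseline (an assumed
    baseline choice, not necessarily the paper's)."""
    x = np.asarray(x, dtype=float)
    target = np.argmax(predict_fn(x))      # class to preserve along the path
    keep = np.ones_like(x, dtype=bool)     # start from the full input
    path = [keep.copy()]

    for _ in range(n_levels):
        best_drop, best_conf = None, -np.inf
        # try dropping each currently kept feature; take the drop that
        # preserves the target class with the highest confidence
        for i in np.flatnonzero(keep):
            trial = keep.copy()
            trial[i] = False
            probs = predict_fn(np.where(trial, x, 0.0))
            if np.argmax(probs) == target and probs[target] > best_conf:
                best_drop, best_conf = i, probs[target]
        if best_drop is None:              # no single drop keeps the prediction
            break
        keep[best_drop] = False
        path.append(keep.copy())           # next, strictly smaller explanation
    return path                            # from full input toward a minimal one
```

Each successive mask in the returned path keeps fewer features yet still yields the original class, giving a coarse-to-fine view of what the model relies on locally; PSEM enforces this sufficiency structure through its own formulation rather than a greedy heuristic like the one above.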
Related papers
- MASALA: Model-Agnostic Surrogate Explanations by Locality Adaptation [3.587367153279351]
Existing local Explainable AI (XAI) methods select a region of the input space in the vicinity of a given input instance, for which they approximate the behaviour of a model using a simpler and more interpretable surrogate model.
We propose a novel method, MASALA, for generating explanations, which automatically determines the appropriate local region of impactful model behaviour for each individual instance being explained.
arXiv Detail & Related papers (2024-08-19T15:26:45Z) - Evaluating Human Alignment and Model Faithfulness of LLM Rationale [66.75309523854476]
We study how well large language models (LLMs) explain their generations through rationales.
We show that prompting-based methods are less "faithful" than attribution-based explanations.
arXiv Detail & Related papers (2024-06-28T20:06:30Z) - Understanding prompt engineering may not require rethinking
generalization [56.38207873589642]
We show that the discrete nature of prompts, combined with a PAC-Bayes prior given by a language model, results in generalization bounds that are remarkably tight by the standards of the literature.
This work provides a possible justification for the widespread practice of prompt engineering.
arXiv Detail & Related papers (2023-10-06T00:52:48Z) - Sampling Based On Natural Image Statistics Improves Local Surrogate
Explainers [111.31448606885672]
Surrogate explainers are a popular post-hoc interpretability method to further understand how a model arrives at a prediction.
We propose two approaches to incorporate natural image statistics into surrogate explainers, namely (1) altering the method for sampling the local neighbourhood and (2) using perceptual metrics to convey some of the properties of the distribution of natural images.
arXiv Detail & Related papers (2022-08-08T08:10:13Z) - MACE: An Efficient Model-Agnostic Framework for Counterfactual
Explanation [132.77005365032468]
We propose a novel framework of Model-Agnostic Counterfactual Explanation (MACE).
In our MACE approach, we propose a novel RL-based method for finding good counterfactual examples and a gradient-less descent method for improving proximity.
Experiments on public datasets validate the effectiveness of MACE, showing better validity, sparsity and proximity.
arXiv Detail & Related papers (2022-05-31T04:57:06Z) - Locally Invariant Explanations: Towards Stable and Unidirectional
Explanations through Local Invariant Learning [15.886405745163234]
We propose a model agnostic local explanation method inspired by the invariant risk minimization principle.
Our algorithm is simple and efficient to train, and can ascertain stable input features for local decisions of a black-box without access to side information.
arXiv Detail & Related papers (2022-01-28T14:29:25Z) - Evaluation of Local Model-Agnostic Explanations Using Ground Truth [4.278336455989584]
Explanation techniques are commonly evaluated using human-grounded methods.
We propose a functionally-grounded evaluation procedure for local model-agnostic explanation techniques.
arXiv Detail & Related papers (2021-06-04T13:47:31Z) - Search Methods for Sufficient, Socially-Aligned Feature Importance
Explanations with In-Distribution Counterfactuals [72.00815192668193]
Feature importance (FI) estimates are a popular form of explanation, and they are commonly created and evaluated by computing the change in model confidence caused by removing certain input features at test time.
We study several under-explored dimensions of FI-based explanations, providing conceptual and empirical improvements for this form of explanation.
arXiv Detail & Related papers (2021-06-01T20:36:48Z) - Building Reliable Explanations of Unreliable Neural Networks: Locally
Smoothing Perspective of Model Interpretation [0.0]
We present a novel method for reliably explaining the predictions of neural networks.
Our method is built on the assumption of a smooth landscape in the loss function of the model prediction.
arXiv Detail & Related papers (2021-03-26T08:52:11Z) - Goal-directed Generation of Discrete Structures with Conditional
Generative Models [85.51463588099556]
We introduce a novel approach to directly optimize a reinforcement learning objective, maximizing an expected reward.
We test our methodology on two tasks: generating molecules with user-defined properties and identifying short Python expressions which evaluate to a given target value.
arXiv Detail & Related papers (2020-10-05T20:03:13Z) - What Do You See? Evaluation of Explainable Artificial Intelligence (XAI)
Interpretability through Neural Backdoors [15.211935029680879]
Explainable AI (XAI) methods have been proposed to interpret how a deep neural network arrives at its predictions.
Current evaluation approaches either require subjective input from humans or incur high computation cost with automated evaluation.
We propose backdoor trigger patterns--hidden malicious functionalities that cause misclassification--to automate the evaluation of saliency explanations.
arXiv Detail & Related papers (2020-09-22T15:53:19Z)