A Survey on the Robustness of Feature Importance and Counterfactual
Explanations
- URL: http://arxiv.org/abs/2111.00358v1
- Date: Sat, 30 Oct 2021 22:48:04 GMT
- Title: A Survey on the Robustness of Feature Importance and Counterfactual
Explanations
- Authors: Saumitra Mishra, Sanghamitra Dutta, Jason Long, Daniele Magazzeni
- Abstract summary: We present a survey of the works that analysed the robustness of two classes of local explanations.
The survey aims to unify existing definitions of robustness, introduce a taxonomy to classify different robustness approaches, and discuss some interesting results.
- Score: 12.599872913953238
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There exist several methods that aim to address the crucial task of
understanding the behaviour of AI/ML models. Arguably, the most popular among
them are local explanations, which focus on investigating model behaviour for
individual instances. Several methods have been proposed for local analysis,
but relatively little effort has gone into understanding whether the
explanations are robust and accurately reflect the behaviour of the underlying
models. In this work, we present a survey of the works that analysed the
robustness of two classes of local explanations (feature importance and
counterfactual explanations) that are widely used to analyse AI/ML models in
finance. The survey aims to unify existing definitions of robustness, introduce
a taxonomy to classify different robustness approaches, and discuss some
interesting results. Finally, the survey offers some pointers on extending
current robustness analysis approaches so as to identify reliable
explainability methods.
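To make the notion of robustness concrete, the sketch below illustrates one common way such analyses quantify the stability of a feature-importance explainer: estimate a local Lipschitz-style ratio by perturbing an instance within a small neighbourhood and measuring how much the explanation changes relative to the change in the input. This is a minimal illustrative sketch, not a method taken from the survey; `explain_fn`, the perturbation radius `eps`, and the toy linear model at the end are assumptions made for the example.

```python
import numpy as np

def local_lipschitz_robustness(explain_fn, x, eps=0.1, n_samples=50, seed=0):
    """Estimate a local Lipschitz-style ratio for a feature-importance explainer
    around an instance x: large values mean small input perturbations can change
    the explanation substantially, i.e. the explanation is not robust."""
    rng = np.random.default_rng(seed)
    base_expl = np.asarray(explain_fn(x), dtype=float)
    worst_ratio = 0.0
    for _ in range(n_samples):
        # Sample a nearby point inside an L-infinity ball of radius eps around x.
        x_pert = x + rng.uniform(-eps, eps, size=x.shape)
        expl_pert = np.asarray(explain_fn(x_pert), dtype=float)
        ratio = np.linalg.norm(expl_pert - base_expl) / (np.linalg.norm(x_pert - x) + 1e-12)
        worst_ratio = max(worst_ratio, ratio)
    return worst_ratio

# Toy usage (hypothetical): the gradient "importance" of a linear model f(x) = w @ x
# is constant, so the estimated ratio is ~0 (a maximally robust explanation).
w = np.array([0.5, -1.2, 2.0])
print(local_lipschitz_robustness(lambda x: w, np.zeros(3)))
```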
Related papers
- Evaluating Human Alignment and Model Faithfulness of LLM Rationale [66.75309523854476]
We study how well large language models (LLMs) explain their generations through rationales.
We show that prompting-based methods are less "faithful" than attribution-based explanations.
arXiv Detail & Related papers (2024-06-28T20:06:30Z) - Towards a Unified Framework for Evaluating Explanations [0.6138671548064356]
We argue that explanations serve as mediators between models and stakeholders, whether for intrinsically interpretable models or opaque black-box models.
We illustrate these criteria, as well as specific evaluation methods, using examples from an ongoing study of an interpretable neural network for predicting a particular learner behavior.
arXiv Detail & Related papers (2024-05-22T21:49:28Z) - Evaluating the Utility of Model Explanations for Model Development [54.23538543168767]
We evaluate whether explanations can improve human decision-making in practical scenarios of machine learning model development.
To our surprise, we did not find evidence of significant improvement on tasks when users were provided with any of the saliency maps.
These findings suggest caution about the usefulness of saliency-based explanations and their potential to be misunderstood.
arXiv Detail & Related papers (2023-12-10T23:13:23Z) - Counterfactuals of Counterfactuals: a back-translation-inspired approach
to analyse counterfactual editors [3.4253416336476246]
We focus on the analysis of counterfactual, contrastive explanations.
We propose a new back-translation-inspired evaluation methodology.
We show that by iteratively feeding the counterfactual to the explainer we can obtain valuable insights into the behaviour of both the predictor and the explainer models (a minimal sketch of this feedback loop appears after this list).
arXiv Detail & Related papers (2023-05-26T16:04:28Z) - Interpretability in the Wild: a Circuit for Indirect Object
Identification in GPT-2 small [68.879023473838]
We present an explanation for how GPT-2 small performs a natural language task called indirect object identification (IOI).
To our knowledge, this investigation is the largest end-to-end attempt at reverse-engineering a natural behavior "in the wild" in a language model.
arXiv Detail & Related papers (2022-11-01T17:08:44Z) - Towards Faithful Model Explanation in NLP: A Survey [48.690624266879155]
End-to-end neural Natural Language Processing (NLP) models are notoriously difficult to understand.
One desideratum of model explanation is faithfulness, i.e. an explanation should accurately represent the reasoning process behind the model's prediction.
We review over 110 model explanation methods in NLP through the lens of faithfulness.
arXiv Detail & Related papers (2022-09-22T21:40:51Z) - ExSum: From Local Explanations to Model Understanding [6.23934576145261]
Interpretability methods are developed to understand the working mechanisms of black-box models.
Fulfilling this goal requires both that the explanations generated by these methods are correct and that people can easily and reliably understand them.
We introduce explanation summary (ExSum), a mathematical framework for quantifying model understanding.
arXiv Detail & Related papers (2022-04-30T02:07:20Z) - When Stability meets Sufficiency: Informative Explanations that do not Overwhelm [15.897648942908747]
We consider feature-based attribution methods that highlight what should be minimally sufficient to justify the classification of an input.
While minimal sufficiency is an attractive property akin to comprehensibility, the resulting explanations are often too sparse for a human to understand and evaluate the local behavior of the model.
We propose a novel method called Path-Sufficient Explanations Method (PSEM) that outputs a sequence of stable and sufficient explanations for a given input.
arXiv Detail & Related papers (2021-09-13T16:06:10Z) - Multilingual Multi-Aspect Explainability Analyses on Machine Reading Comprehension Models [76.48370548802464]
This paper conducts a series of analytical experiments to examine the relationship between multi-head self-attention and final MRC system performance.
We discover that passage-to-question and passage understanding attentions are the most important ones in the question answering process.
Through comprehensive visualizations and case studies, we also observe several general findings on the attention maps, which can be helpful to understand how these models solve the questions.
arXiv Detail & Related papers (2021-08-26T04:23:57Z) - Evaluations and Methods for Explanation through Robustness Analysis [117.7235152610957]
We establish a novel set of evaluation criteria for such feature-based explanations by robustness analysis.
We obtain new explanations that are loosely necessary and sufficient for a prediction.
We extend the explanation to extract the set of features that would move the current prediction to a target class.
arXiv Detail & Related papers (2020-05-31T05:52:05Z)
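As a rough illustration of the back-translation-style analysis described above for "Counterfactuals of Counterfactuals", the sketch below repeatedly feeds a generated counterfactual back into a counterfactual editor and records how the predicted class and the distance to the original instance evolve. This is an assumed, illustrative loop rather than that paper's actual methodology or metrics; `predict_fn` and `edit_fn` are hypothetical callables standing in for an arbitrary classifier and counterfactual editor.

```python
import numpy as np

def iterative_counterfactual_check(predict_fn, edit_fn, x, n_rounds=3):
    """Feed each generated counterfactual back to the editor and record how the
    predicted class and the distance to the original instance evolve; erratic
    drift or failed label flips can hint at predictor/editor inconsistencies."""
    x = np.asarray(x, dtype=float)
    trace = []
    current = x
    for t in range(n_rounds):
        current = np.asarray(edit_fn(current), dtype=float)   # counterfactual of the current input
        trace.append({
            "round": t + 1,
            "label": int(predict_fn(current)),                # class assigned to the new edit
            "dist_to_original": float(np.linalg.norm(current - x)),
        })
    return trace

# Toy usage (hypothetical): a threshold classifier on the first feature and an
# editor that flips that feature across the assumed decision boundary at 0.5.
def toggle_first_feature(z):
    z = z.copy()
    z[0] = 1.0 - z[0]
    return z

print(iterative_counterfactual_check(lambda z: int(z[0] > 0.5),
                                      toggle_first_feature,
                                      np.zeros(3)))
```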
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.