Understanding Post-hoc Explainers: The Case of Anchors
- URL: http://arxiv.org/abs/2303.08806v1
- Date: Wed, 15 Mar 2023 17:56:34 GMT
- Title: Understanding Post-hoc Explainers: The Case of Anchors
- Authors: Gianluigi Lopardo, Frederic Precioso, Damien Garreau
- Abstract summary: We present a theoretical analysis of a rule-based interpretability method that highlights a small set of words to explain a text classifier's decision.
After formalizing its algorithm and providing useful insights, we demonstrate mathematically that Anchors produces meaningful results.
- Score: 6.681943980068051
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In many scenarios, the interpretability of machine learning models is
highly desirable yet difficult to achieve. To explain the individual predictions of
such models, local model-agnostic approaches have been proposed. However, the
process generating the explanations can be, for a user, as mysterious as the
prediction to be explained. Furthermore, interpretability methods frequently
lack theoretical guarantees, and their behavior even on simple models is often
unknown. While it is difficult, if not impossible, to ensure that an explainer
behaves as expected on a cutting-edge model, we can at least verify that
everything works on simple, already interpretable models. In this paper, we
present a theoretical analysis of Anchors (Ribeiro et al., 2018): a popular
rule-based interpretability method that highlights a small set of words to
explain a text classifier's decision. After formalizing its algorithm and
providing useful insights, we demonstrate mathematically that Anchors produces
meaningful results when used with linear text classifiers on top of a TF-IDF
vectorization. We believe that our analysis framework can aid in the
development of new explainability methods based on solid theoretical
foundations.
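To make the analyzed setting concrete, here is a small sketch of an Anchors-style explanation for a linear classifier on top of a TF-IDF vectorization: an anchor is a small set of words whose presence (almost) always preserves the model's prediction when the remaining words are randomly perturbed. The toy corpus, the masking scheme, and the brute-force search below are illustrative assumptions only, not the paper's analysis or the reference algorithm of Ribeiro et al. (2018), which relies on a bandit-based sampling procedure.
```python
# Minimal, illustrative sketch (assumed setup, not the actual Anchors algorithm):
# a linear classifier on TF-IDF features, explained by an Anchors-style rule.
import itertools
import random

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy sentiment corpus (hypothetical).
docs = [
    "this movie was great and fun",
    "great plot and great acting",
    "terrible movie with a boring plot",
    "boring and terrible acting",
]
labels = [1, 1, 0, 0]

vectorizer = TfidfVectorizer()
model = LogisticRegression().fit(vectorizer.fit_transform(docs), labels)
rng = random.Random(0)


def predict(texts):
    return model.predict(vectorizer.transform(texts))


def precision(anchor, words, n_samples=200):
    """Estimate P(f(z) = f(x) | z keeps the anchor words), where z masks each
    non-anchor word independently with probability 1/2."""
    original = predict([" ".join(words)])[0]
    hits = 0
    for _ in range(n_samples):
        z = [w if (w in anchor or rng.random() < 0.5) else "UNK" for w in words]
        hits += predict([" ".join(z)])[0] == original
    return hits / n_samples


# Brute force: smallest set of words whose estimated precision clears the
# threshold. The real algorithm searches far more efficiently.
example = "great movie but boring acting"
words, threshold, best = example.split(), 0.95, None
for size in range(1, len(words) + 1):
    for candidate in itertools.combinations(words, size):
        if precision(set(candidate), words) >= threshold:
            best = candidate
            break
    if best:
        break
print("Anchor:", best)  # output depends on the fitted toy model
```
Under TF-IDF, masking a word with an out-of-vocabulary token behaves like removing it, which is roughly the kind of local perturbation Anchors applies to text inputs.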
Related papers
- Cycles of Thought: Measuring LLM Confidence through Stable Explanations [53.15438489398938]
Large language models (LLMs) can reach and even surpass human-level accuracy on a variety of benchmarks, but their overconfidence in incorrect responses is still a well-documented failure mode.
We propose a framework for measuring an LLM's uncertainty with respect to the distribution of generated explanations for an answer.
arXiv Detail & Related papers (2024-06-05T16:35:30Z)
- Even-if Explanations: Formal Foundations, Priorities and Complexity [18.126159829450028]
We show that both linear and tree-based models are strictly more interpretable than neural networks.
We introduce a preference-based framework that enables users to personalize explanations based on their preferences.
arXiv Detail & Related papers (2024-01-17T11:38:58Z)
- Explainability for Large Language Models: A Survey [59.67574757137078]
Large language models (LLMs) have demonstrated impressive capabilities in natural language processing.
This paper introduces a taxonomy of explainability techniques and provides a structured overview of methods for explaining Transformer-based language models.
arXiv Detail & Related papers (2023-09-02T22:14:26Z)
- Evaluating and Explaining Large Language Models for Code Using Syntactic Structures [74.93762031957883]
This paper introduces ASTxplainer, an explainability method specific to Large Language Models for code.
At its core, ASTxplainer provides an automated method for aligning token predictions with AST nodes.
We perform an empirical evaluation on 12 popular LLMs for code using a curated dataset of the most popular GitHub projects.
arXiv Detail & Related papers (2023-08-07T18:50:57Z)
- Local Interpretable Model Agnostic Shap Explanations for machine learning models [0.0]
We propose a methodology that we define as Local Interpretable Model Agnostic Shap Explanations (LIMASE).
The technique uses Shapley values under the LIME paradigm: it explains the prediction of any model by fitting a locally faithful, interpretable decision tree, on which the Tree Explainer is used to compute Shapley values and produce visually interpretable explanations.
arXiv Detail & Related papers (2022-10-10T10:07:27Z)
- Don't Explain Noise: Robust Counterfactuals for Randomized Ensembles [50.81061839052459]
We formalize the generation of robust counterfactual explanations as a probabilistic problem.
We show the link between the robustness of ensemble models and the robustness of base learners.
Our method achieves high robustness with only a small increase in the distance from counterfactual explanations to their initial observations.
arXiv Detail & Related papers (2022-05-27T17:28:54Z)
- ExSum: From Local Explanations to Model Understanding [6.23934576145261]
Interpretability methods are developed to understand the working mechanisms of black-box models.
Fulfilling this goal requires both that the explanations generated by these methods are correct and that people can easily and reliably understand them.
We introduce explanation summary (ExSum), a mathematical framework for quantifying model understanding.
arXiv Detail & Related papers (2022-04-30T02:07:20Z)
- Evaluating Explanations for Reading Comprehension with Realistic Counterfactuals [26.641834518599303]
We propose a methodology to evaluate explanations for machine reading comprehension tasks.
An explanation should allow us to understand the RC model's high-level behavior with respect to a set of realistic counterfactual input scenarios.
Our analysis suggests that pairwise explanation techniques are better suited to RC than token-level attributions.
arXiv Detail & Related papers (2021-04-09T17:55:21Z)
- Beyond Trivial Counterfactual Explanations with Diverse Valuable Explanations [64.85696493596821]
In computer vision applications, generative counterfactual methods indicate how to perturb a model's input to change its prediction.
We propose a counterfactual method that learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss.
Our model improves the success rate of producing high-quality valuable explanations when compared to previous state-of-the-art methods.
arXiv Detail & Related papers (2021-03-18T12:57:34Z)
- Have We Learned to Explain?: How Interpretability Methods Can Learn to Encode Predictions in their Interpretations [20.441578071446212]
We introduce EVAL-X as a method to quantitatively evaluate interpretations and REAL-X as an amortized explanation method.
We show EVAL-X can detect when predictions are encoded in interpretations and show the advantages of REAL-X through quantitative and radiologist evaluation.
arXiv Detail & Related papers (2021-03-02T17:42:33Z)
- Towards Interpretable Natural Language Understanding with Explanations as Latent Variables [146.83882632854485]
We develop a framework for interpretable natural language understanding that requires only a small set of human annotated explanations for training.
Our framework treats natural language explanations as latent variables that model the underlying reasoning process of a neural model.
arXiv Detail & Related papers (2020-10-24T02:05:56Z)