Have We Learned to Explain?: How Interpretability Methods Can Learn to
Encode Predictions in their Interpretations
- URL: http://arxiv.org/abs/2103.01890v1
- Date: Tue, 2 Mar 2021 17:42:33 GMT
- Title: Have We Learned to Explain?: How Interpretability Methods Can Learn to
Encode Predictions in their Interpretations
- Authors: Neil Jethani, Mukund Sudarshan, Yindalon Aphinyanaphongs, Rajesh
Ranganath
- Abstract summary: We introduce EVAL-X as a method to quantitatively evaluate interpretations and REAL-X as an amortized explanation method.
We show EVAL-X can detect when predictions are encoded in interpretations and show the advantages of REAL-X through quantitative and radiologist evaluation.
- Score: 20.441578071446212
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While the need for interpretable machine learning has been established, many
common approaches are slow, lack fidelity, or are hard to evaluate. Amortized
explanation methods reduce the cost of providing interpretations by learning a
global selector model that returns feature importances for a single instance of
data. The selector model is trained to optimize the fidelity of the
interpretations, as evaluated by a predictor model for the target. Popular
methods learn the selector and predictor model in concert, which we show allows
predictions to be encoded within interpretations. We introduce EVAL-X as a
method to quantitatively evaluate interpretations and REAL-X as an amortized
explanation method, both of which learn a predictor model that approximates the true
data generating distribution given any subset of the input. We show EVAL-X can
detect when predictions are encoded in interpretations and show the advantages
of REAL-X through quantitative and radiologist evaluation.
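The key distinction in the abstract, training the predictor on arbitrary feature subsets rather than jointly with the selector, can be made concrete with a short sketch. The following is a minimal, hypothetical PyTorch illustration, not the authors' code: the network sizes, the Bernoulli(0.5) subset distribution, the straight-through relaxation, and the sparsity penalty are all illustrative assumptions.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

D_IN, N_CLASSES = 20, 2  # toy dimensions (assumed)

def mlp(d_in, d_out):
    return nn.Sequential(nn.Linear(d_in, 128), nn.ReLU(), nn.Linear(128, d_out))

# Predictor sees the masked features plus the mask itself, so "missing" is explicit.
predictor = mlp(2 * D_IN, N_CLASSES)
# Selector outputs one selection logit per input feature.
selector = mlp(D_IN, D_IN)

def masked_input(x, mask):
    # Zero out unselected features and append the mask.
    return torch.cat([x * mask, mask], dim=-1)

def predictor_step(x, y, opt):
    # Stage 1: train the predictor on random subsets S ~ Bernoulli(0.5), so it
    # approximates p(y | x_S) for any subset, independently of any selector.
    mask = torch.bernoulli(torch.full_like(x, 0.5))
    loss = F.cross_entropy(predictor(masked_input(x, mask)), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

def selector_step(x, y, opt, sparsity=0.1):
    # Stage 2: train the selector against the now-fixed predictor. Because the
    # predictor was never co-trained with the selector, the mask pattern cannot
    # act as a side channel that encodes the prediction.
    probs = torch.sigmoid(selector(x))
    hard = torch.bernoulli(probs)
    mask = hard + probs - probs.detach()  # straight-through relaxation (assumed)
    loss = F.cross_entropy(predictor(masked_input(x, mask)), y) + sparsity * probs.mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```
In stage 2 one would typically freeze the predictor (e.g., predictor.requires_grad_(False)) and update only the selector. A subset-conditional predictor of this kind is also, roughly speaking, what an evaluation scheme such as EVAL-X needs in order to score a selected subset by how well it preserves the true conditional distribution over labels.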
Related papers
- XAL: EXplainable Active Learning Makes Classifiers Better Low-resource Learners [71.8257151788923]
We propose a novel Explainable Active Learning framework (XAL) for low-resource text classification.
XAL encourages classifiers to justify their inferences and delve into unlabeled data for which they cannot provide reasonable explanations.
Experiments on six datasets show that XAL achieves consistent improvement over 9 strong baselines.
arXiv Detail & Related papers (2023-10-09T08:07:04Z)
- Evaluating and Explaining Large Language Models for Code Using Syntactic Structures [74.93762031957883]
This paper introduces ASTxplainer, an explainability method specific to Large Language Models for code.
At its core, ASTxplainer provides an automated method for aligning token predictions with AST nodes.
We perform an empirical evaluation on 12 popular LLMs for code using a curated dataset of the most popular GitHub projects.
arXiv Detail & Related papers (2023-08-07T18:50:57Z)
- A Lightweight Generative Model for Interpretable Subject-level Prediction [0.07989135005592125]
We propose a technique for single-subject prediction that is inherently interpretable.
Experiments demonstrate that the resulting model can be efficiently inverted to make accurate subject-level predictions.
arXiv Detail & Related papers (2023-06-19T18:20:29Z)
- Understanding Post-hoc Explainers: The Case of Anchors [6.681943980068051]
We present a theoretical analysis of a rule-based interpretability method that highlights a small set of words to explain a text classifier's decision.
After formalizing its algorithm and providing useful insights, we demonstrate mathematically that Anchors produces meaningful results.
arXiv Detail & Related papers (2023-03-15T17:56:34Z)
- Personalized Interpretable Classification [8.806213269230057]
We make a first step towards formally introducing personalized interpretable classification as a new data mining problem.
We conduct a series of empirical studies on real data sets.
Our algorithm achieves the same level of predictive accuracy as state-of-the-art (SOTA) interpretable classifiers.
arXiv Detail & Related papers (2023-02-06T01:59:16Z)
- VCNet: A self-explaining model for realistic counterfactual generation [52.77024349608834]
Counterfactual explanation is a class of methods to make local explanations of machine learning decisions.
We present VCNet (Variational Counter Net), a model architecture that combines a predictor and a counterfactual generator.
We show that VCNet is able both to generate predictions and to generate counterfactual explanations without having to solve another minimisation problem.
arXiv Detail & Related papers (2022-12-21T08:45:32Z)
- Explainability as statistical inference [29.74336283497203]
We propose a general deep probabilistic model designed to produce interpretable predictions.
The model parameters can be learned via maximum likelihood, and the method can be adapted to any predictor network architecture.
We show experimentally that using multiple imputation provides more reasonable interpretations.
arXiv Detail & Related papers (2022-12-06T16:55:10Z)
- Pathologies of Pre-trained Language Models in Few-shot Fine-tuning [50.3686606679048]
We show that pre-trained language models given only a few examples exhibit strong prediction bias across labels.
Although few-shot fine-tuning can mitigate this prediction bias, our analysis shows that models gain performance improvements by capturing non-task-related features.
These observations warn that pursuing model performance with fewer examples may incur pathological prediction behavior.
arXiv Detail & Related papers (2022-04-17T15:55:18Z)
- Beyond Trivial Counterfactual Explanations with Diverse Valuable Explanations [64.85696493596821]
In computer vision applications, generative counterfactual methods indicate how to perturb a model's input to change its prediction.
We propose a counterfactual method that learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss.
Our model improves the success rate of producing high-quality valuable explanations when compared to previous state-of-the-art methods.
arXiv Detail & Related papers (2021-03-18T12:57:34Z)
- Distilling Interpretable Models into Human-Readable Code [71.11328360614479]
Human-readability is an important and desirable standard for machine-learned model interpretability.
We propose to train interpretable models using conventional methods, and then distill them into concise, human-readable code.
We describe a piecewise-linear curve-fitting algorithm that produces high-quality results efficiently and reliably across a broad range of use cases.
arXiv Detail & Related papers (2021-01-21T01:46:36Z)
- Adversarial Infidelity Learning for Model Interpretation [43.37354056251584]
We propose a Model-agnostic Effective Efficient Direct (MEED) IFS framework for model interpretation.
Our framework mitigates concerns about sanity, shortcuts, model identifiability, and information transmission.
Our AIL mechanism can help learn the desired conditional distribution between selected features and targets.
arXiv Detail & Related papers (2020-06-09T16:27:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.