Learning to Faithfully Rationalize by Construction
- URL: http://arxiv.org/abs/2005.00115v1
- Date: Thu, 30 Apr 2020 21:45:40 GMT
- Title: Learning to Faithfully Rationalize by Construction
- Authors: Sarthak Jain, Sarah Wiegreffe, Yuval Pinter, Byron C. Wallace
- Abstract summary: In many settings it is important to be able to understand why a model made a particular prediction.
We propose a simpler variant of this approach that provides faithful explanations by construction.
In both automatic and manual evaluations we find that variants of this simple framework yield predictive performance superior to `end-to-end' approaches.
- Score: 36.572594249534866
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In many settings it is important for one to be able to understand why a model
made a particular prediction. In NLP this often entails extracting snippets of
an input text `responsible for' corresponding model output; when such a snippet
comprises tokens that indeed informed the model's prediction, it is a faithful
explanation. In some settings, faithfulness may be critical to ensure
transparency. Lei et al. (2016) proposed a model to produce faithful rationales
for neural text classification by defining independent snippet extraction and
prediction modules. However, the discrete selection over input tokens performed
by this method complicates training, leading to high variance and requiring
careful hyperparameter tuning. We propose a simpler variant of this approach
that provides faithful explanations by construction. In our scheme, named
FRESH, arbitrary feature importance scores (e.g., gradients from a trained
model) are used to induce binary labels over token inputs, which an extractor
can be trained to predict. An independent classifier module is then trained
exclusively on snippets provided by the extractor; these snippets thus
constitute faithful explanations, even if the classifier is arbitrarily
complex. In both automatic and manual evaluations we find that variants of this
simple framework yield predictive performance superior to `end-to-end'
approaches, while being more general and easier to train. Code is available at
https://github.com/successar/FRESH
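The abstract describes FRESH as a simple pipeline: score token importance with any trained support model, binarize those scores into token-level rationale labels for an extractor, and train an independent classifier only on the extracted snippets. The sketch below is a toy illustration of that construction, not the released implementation: bag-of-words LogisticRegression models stand in for the neural support model, extractor, and classifier, and absolute linear coefficients stand in for gradient-based importance scores.
```python
# Toy sketch of the FRESH pipeline. Hypothetical stand-ins (not from the paper):
# bag-of-words LogisticRegression models replace the neural support model,
# extractor, and classifier; |linear coefficient| replaces gradient saliency;
# top-k binarization plays the role of the discretization strategy.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

texts = ["the movie was great and fun", "terrible plot and awful acting",
         "great acting and a fun story", "awful pacing and a terrible script"]
labels = np.array([1, 0, 1, 0])
k = 2  # tokens kept per document

# Stage 1: train a support model on full inputs and score token importance.
vec = CountVectorizer()
support = LogisticRegression().fit(vec.fit_transform(texts), labels)
coef = support.coef_[0]

def importance(tokens):
    # |coefficient| as a stand-in for gradient-based feature importance
    return np.array([abs(coef[vec.vocabulary_[t]]) if t in vec.vocabulary_ else 0.0
                     for t in tokens])

# Stage 2: binarize the top-k tokens per document into rationale labels and
# train an extractor (per-token logistic regression on token identity).
tok_feats, tok_labels = [], []
for text in texts:
    tokens = text.split()
    keep = set(np.argsort(-importance(tokens))[:k].tolist())
    tok_feats.extend(tokens)
    tok_labels.extend(int(i in keep) for i in range(len(tokens)))
tok_vec = CountVectorizer()
extractor = LogisticRegression(C=10.0).fit(tok_vec.fit_transform(tok_feats), tok_labels)

def extract(text):
    tokens = text.split()
    mask = extractor.predict(tok_vec.transform(tokens))
    return " ".join(t for t, m in zip(tokens, mask) if m == 1)

# Stage 3: train an independent classifier that only ever sees extracted snippets,
# so the snippet it receives is a faithful explanation of its prediction.
snippets = [extract(t) for t in texts]
clf_vec = CountVectorizer()
classifier = LogisticRegression().fit(clf_vec.fit_transform(snippets), labels)

test = "a fun and great story"
rationale = extract(test)
print(rationale, "->", classifier.predict(clf_vec.transform([rationale]))[0])
```
Because the classifier's input is exactly the extracted snippet, the faithfulness guarantee comes from the construction itself rather than from any property of the (arbitrarily complex) classifier.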
Related papers
- Plausible Extractive Rationalization through Semi-Supervised Entailment
Signal [33.35604728012685]
We take a semi-supervised approach to optimize for the plausibility of extracted rationales.
We adopt a pre-trained natural language inference (NLI) model and further fine-tune it on a small set of supervised rationales.
We show that, by enforcing the alignment agreement between the explanation and answer in a question-answering task, the performance can be improved without access to ground truth labels.
arXiv Detail & Related papers (2024-02-13T14:12:32Z) - You Only Forward Once: Prediction and Rationalization in A Single
Forward Pass [10.998983921416533]
Unsupervised rationale extraction aims to extract concise and contiguous text snippets that support model predictions without any rationale annotation.
Previous studies have used a two-phase framework known as the Rationalizing Neural Prediction (RNP) framework, which follows a generate-then-predict paradigm.
We propose a novel single-phase framework called You Only Forward Once (YOFO), derived from a relaxed definition of a rationale in which rationales aim to support model predictions rather than make the predictions themselves.
arXiv Detail & Related papers (2023-11-04T08:04:28Z) - An Attribution Method for Siamese Encoders [2.1163800956183776]
This paper derives a local attribution method for Siamese encoders by generalizing the principle of integrated gradients to models with multiple inputs.
A pilot study shows that, in a sentence transformer (ST), a few token pairs can often explain large fractions of predictions, and that the model focuses on nouns and verbs.
arXiv Detail & Related papers (2023-10-09T13:24:44Z) - Rationalizing Predictions by Adversarial Information Calibration [65.19407304154177]
We train two models jointly: one is a typical neural model that solves the task at hand in an accurate but black-box manner, and the other is a selector-predictor model that additionally produces a rationale for its prediction.
We use an adversarial technique to calibrate the information extracted by the two models such that the difference between them is an indicator of the missed or over-selected features.
arXiv Detail & Related papers (2023-01-15T03:13:09Z) - VCNet: A self-explaining model for realistic counterfactual generation [52.77024349608834]
Counterfactual explanation is a class of methods to make local explanations of machine learning decisions.
We present VCNet (Variational Counter Net), a model architecture that combines a predictor and a counterfactual generator.
We show that VCNet is able to both generate predictions, and to generate counterfactual explanations without having to solve another minimisation problem.
arXiv Detail & Related papers (2022-12-21T08:45:32Z) - An Additive Instance-Wise Approach to Multi-class Model Interpretation [53.87578024052922]
Interpretable machine learning offers insights into what factors drive a certain prediction of a black-box system.
Existing methods mainly focus on selecting explanatory input features, which follow either locally additive or instance-wise approaches.
This work exploits the strengths of both methods and proposes a global framework for learning local explanations simultaneously for multiple target classes.
arXiv Detail & Related papers (2022-07-07T06:50:27Z) - Explaining Reject Options of Learning Vector Quantization Classifiers [6.125017875330933]
We propose to use counterfactual explanations for explaining rejects in machine learning models.
We investigate how to efficiently compute counterfactual explanations of different reject options for an important class of models.
arXiv Detail & Related papers (2022-02-15T08:16:10Z) - Autoencoding Variational Autoencoder [56.05008520271406]
We study the implications of this behaviour on the learned representations and also the consequences of fixing it by introducing a notion of self consistency.
We show that encoders trained with our self-consistency approach lead to representations that are robust (insensitive) to perturbations in the input introduced by adversarial attacks.
arXiv Detail & Related papers (2020-12-07T14:16:14Z) - How do Decisions Emerge across Layers in Neural Models? Interpretation
with Differentiable Masking [70.92463223410225]
DiffMask learns to mask-out subsets of the input while maintaining differentiability.
The decision to include or disregard an input token is made with a simple model based on intermediate hidden layers.
This lets us not only plot attribution heatmaps but also analyze how decisions are formed across network layers.
arXiv Detail & Related papers (2020-04-30T17:36:14Z)
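The DiffMask entry above is concrete enough to sketch: a small probe reads a frozen model's representations and emits per-token gates that mask the input, while a divergence term keeps the prediction unchanged and a sparsity term pushes gates toward zero. The code below is an illustrative approximation under assumptions not taken from the paper: a sigmoid relaxation instead of the hard-concrete distribution, a mean-gate penalty standing in for the expected-L0 objective, and a tiny bag-of-embeddings classifier in place of a real NLP model.
```python
# Simplified differentiable-masking sketch in the spirit of DiffMask.
# Assumptions (not from the paper): sigmoid gates instead of hard concrete,
# a mean-gate penalty instead of expected L0, and a toy frozen classifier.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
V, D, C, T = 50, 16, 2, 8          # vocab size, embedding dim, classes, tokens

# Frozen "black-box" model: embed -> mean-pool -> linear classifier.
embed = nn.Embedding(V, D)
clf = nn.Linear(D, C)
for p in list(embed.parameters()) + list(clf.parameters()):
    p.requires_grad_(False)

def forward(emb):
    return clf(emb.mean(dim=1))

# Probe: reads the token representations and emits per-token keep-logits.
probe = nn.Linear(D, 1)
baseline = nn.Parameter(torch.zeros(D))   # learned replacement vector
opt = torch.optim.Adam(list(probe.parameters()) + [baseline], lr=1e-2)

tokens = torch.randint(0, V, (32, T))     # toy batch of token ids
with torch.no_grad():
    orig_logits = forward(embed(tokens))  # predictions to preserve

for step in range(200):
    emb = embed(tokens)
    gates = torch.sigmoid(probe(emb))              # (batch, T, 1), soft keep/drop
    masked = gates * emb + (1 - gates) * baseline  # masked tokens -> baseline
    logits = forward(masked)
    # Keep the prediction unchanged while masking as many tokens as possible.
    kl = F.kl_div(F.log_softmax(logits, -1), F.softmax(orig_logits, -1),
                  reduction="batchmean")
    sparsity = gates.mean()
    loss = kl + 0.1 * sparsity
    opt.zero_grad()
    loss.backward()
    opt.step()

# Per-token attribution heatmap = learned gate values for a given input.
print(torch.sigmoid(probe(embed(tokens[:1]))).squeeze(-1))
```
The learned gate values for an input serve as its attribution heatmap; applying the same kind of probe to deeper hidden layers is what lets the original method analyze how decisions form across network layers.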
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.