Pair the Dots: Jointly Examining Training History and Test Stimuli for
Model Interpretability
- URL: http://arxiv.org/abs/2010.06943v2
- Date: Sat, 31 Oct 2020 01:58:47 GMT
- Title: Pair the Dots: Jointly Examining Training History and Test Stimuli for
Model Interpretability
- Authors: Yuxian Meng, Chun Fan, Zijun Sun, Eduard Hovy, Fei Wu and Jiwei Li
- Abstract summary: Any prediction from a model is made by a combination of learning history and test stimuli.
Existing methods to interpret a model's predictions are only able to capture a single aspect of either test stimuli or learning history.
We propose an efficient and differentiable approach to make it feasible to interpret a model's prediction by jointly examining training history and test stimuli.
- Score: 44.60486560836836
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Any prediction from a model is made by a combination of learning
history and test stimuli. This provides significant insights for improving
model interpretability: because of which part(s) of which training example(s),
the model attends to which part(s) of a test example. Unfortunately, existing
methods to interpret a model's predictions capture only a single aspect, either
test stimuli or learning history, and evidence from the two is never combined
or integrated. In this paper, we propose an efficient and differentiable
approach that makes it feasible to interpret a model's prediction by jointly
examining training history and test stimuli. The test stimuli are first
identified by gradient-based methods, signifying the part of a test example
that the model attends to. The gradient-based saliency scores are then
propagated to training examples using influence functions to identify which
part(s) of which training example(s) make the model attend to the test stimuli.
The system is differentiable and time-efficient: the adoption of saliency
scores from gradient-based methods allows us to efficiently trace a model's
prediction through the test stimuli and then back to the training examples
through influence functions. We demonstrate that the proposed methodology
offers clear explanations of neural model decisions and is also useful for
error analysis, crafting adversarial examples, and fixing erroneously
classified examples.
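As a hedged illustration of the pipeline the abstract outlines (not the authors' released code), the sketch below computes gradient-based saliency over a test input and then scores training examples with a first-order approximation of influence functions. The inverse-Hessian term of the full influence function is omitted for brevity, and the model, data, and function names are illustrative assumptions.

```python
# Hedged sketch, not the authors' code: (1) gradient-based saliency marks the
# test stimuli, (2) a simplified, first-order influence score traces them back
# to training examples. Model, data, and names are illustrative assumptions.
import torch
import torch.nn.functional as F

def input_saliency(model, x, y):
    """Gradient-based saliency: |d loss / d input|, i.e. the test stimuli."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x.unsqueeze(0)), y.unsqueeze(0))
    loss.backward()
    return x.grad.abs()  # large values = parts of the test example the model attends to

def param_grad(model, x, y):
    """Flattened gradient of the loss w.r.t. model parameters for one example."""
    loss = F.cross_entropy(model(x.unsqueeze(0)), y.unsqueeze(0))
    grads = torch.autograd.grad(loss, [p for p in model.parameters() if p.requires_grad])
    return torch.cat([g.reshape(-1) for g in grads])

def influence_scores(model, test_x, test_y, train_set):
    """First-order influence approximation: dot product of parameter gradients.
    The full influence function also applies an inverse-Hessian-vector product,
    which is dropped here to keep the sketch short."""
    g_test = param_grad(model, test_x, test_y)
    return [torch.dot(g_test, param_grad(model, x, y)).item() for x, y in train_set]

# Illustrative usage (a small feed-forward classifier on feature vectors):
# stimuli = input_saliency(model, test_x, test_y)                # which parts of the test example
# scores = influence_scores(model, test_x, test_y, train_set)    # which training examples
```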
Related papers
- Revealing Model Biases: Assessing Deep Neural Networks via Recovered
Sample Analysis [9.05607520128194]
This paper proposes a straightforward and cost-effective approach to assess whether a deep neural network (DNN) relies on the primary concepts of training samples.
The proposed method does not require any test or generalization samples, only the parameters of the trained model and the training data that lie on the margin.
arXiv Detail & Related papers (2023-06-10T11:20:04Z)
- Guide the Learner: Controlling Product of Experts Debiasing Method Based
on Token Attribution Similarities [17.082695183953486]
A popular workaround is to train a robust model by re-weighting training examples based on a secondary biased model.
Here, the underlying assumption is that the biased model resorts to shortcut features.
We introduce a fine-tuning strategy that incorporates the similarity between the main and biased model attribution scores in a Product of Experts loss function (a generic sketch of such a loss appears after this list).
arXiv Detail & Related papers (2023-02-06T15:21:41Z)
- Pathologies of Pre-trained Language Models in Few-shot Fine-tuning [50.3686606679048]
We show that pre-trained language models fine-tuned on only a few examples exhibit strong prediction bias across labels.
Although few-shot fine-tuning can mitigate the prediction bias, our analysis shows that models gain their performance improvement by capturing non-task-related features.
These observations suggest that pursuing model performance with fewer examples may induce pathological prediction behavior.
arXiv Detail & Related papers (2022-04-17T15:55:18Z)
- Explain, Edit, and Understand: Rethinking User Study Design for
Evaluating Model Explanations [97.91630330328815]
We conduct a crowdsourcing study, where participants interact with deception detection models that have been trained to distinguish between genuine and fake hotel reviews.
We observe that for a linear bag-of-words model, participants with access to the feature coefficients during training are able to cause a larger reduction in model confidence in the testing phase when compared to the no-explanation control.
arXiv Detail & Related papers (2021-12-17T18:29:56Z)
- Instance-Based Neural Dependency Parsing [56.63500180843504]
We develop neural models that possess an interpretable inference process for dependency parsing.
Our models adopt instance-based inference, where dependency edges are extracted and labeled by comparing them to edges in a training set.
arXiv Detail & Related papers (2021-09-28T05:30:52Z)
- Empowering Language Understanding with Counterfactual Reasoning [141.48592718583245]
We propose a Counterfactual Reasoning Model, which mimics the counterfactual thinking by learning from few counterfactual samples.
In particular, we devise a generation module to generate representative counterfactual samples for each factual sample, and a retrospective module that revisits the model prediction by comparing the counterfactual and factual samples.
arXiv Detail & Related papers (2021-06-06T06:36:52Z)
- Building Reliable Explanations of Unreliable Neural Networks: Locally
Smoothing Perspective of Model Interpretation [0.0]
We present a novel method for reliably explaining the predictions of neural networks.
Our method is built on the assumption of a smooth landscape in the loss function around the model prediction.
arXiv Detail & Related papers (2021-03-26T08:52:11Z)
- Learning What Makes a Difference from Counterfactual Examples and
Gradient Supervision [57.14468881854616]
We propose an auxiliary training objective that improves the generalization capabilities of neural networks.
We use pairs of minimally different examples with different labels, a.k.a. counterfactual or contrasting examples, which provide a signal indicative of the underlying causal structure of the task.
Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
arXiv Detail & Related papers (2020-04-20T02:47:49Z)
- An interpretable neural network model through piecewise linear
approximation [7.196650216279683]
We propose a hybrid interpretable model that combines a piecewise linear component and a nonlinear component.
The first component describes the explicit feature contributions by piecewise linear approximation to increase the expressiveness of the model.
The other component uses a multi-layer perceptron to capture feature interactions and implicit nonlinearity, increasing prediction performance.
arXiv Detail & Related papers (2020-01-20T14:32:11Z)
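The last entry above describes its architecture concretely enough to sketch. Below is a minimal, assumed PyTorch rendering, not code from that paper: a hinge-basis piecewise-linear component exposes per-feature contributions, and a small MLP adds nonlinear interactions. All names, layer sizes, and the particular parameterization are illustrative assumptions.

```python
# Minimal sketch of a hybrid interpretable model in the spirit of the last entry:
# a piecewise-linear part yields explicit per-feature contributions, while a small
# MLP captures nonlinear interactions. Sizes and names are illustrative assumptions.
import torch
import torch.nn as nn

class HybridInterpretableModel(nn.Module):
    def __init__(self, n_features, n_pieces=4, hidden=32):
        super().__init__()
        # One small piecewise-linear function per feature, built from hinge terms.
        self.knots = nn.Parameter(torch.linspace(-1.0, 1.0, n_pieces).repeat(n_features, 1))
        self.slopes = nn.Parameter(torch.zeros(n_features, n_pieces))
        self.bias = nn.Parameter(torch.zeros(1))
        # Nonlinear component for feature interactions.
        self.mlp = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def feature_contributions(self, x):
        # Contribution of each feature: sum over hinge terms max(0, x - knot) * slope.
        hinges = torch.relu(x.unsqueeze(-1) - self.knots)   # (batch, features, pieces)
        return (hinges * self.slopes).sum(-1)                # (batch, features)

    def forward(self, x):
        explicit = self.feature_contributions(x).sum(-1, keepdim=True) + self.bias
        return explicit + self.mlp(x)                        # interpretable part + nonlinear part

# Illustrative usage:
# contrib = HybridInterpretableModel(10).feature_contributions(torch.randn(4, 10))
```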
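For the "Guide the Learner" entry above, the following is a generic sketch of a product-of-experts debiasing loss in which the biased expert's contribution is scaled by the similarity of the two models' attribution scores. This is one plausible reading of the one-sentence summary, not that paper's exact formulation; every function and tensor name is an assumption.

```python
# Generic product-of-experts (PoE) debiasing loss with an attribution-similarity
# weight on the biased model's contribution -- a hedged reading of the summary,
# not necessarily that paper's exact formulation. All names are assumptions.
import torch
import torch.nn.functional as F

def poe_debias_loss(main_logits, biased_logits, main_attrib, biased_attrib, targets):
    """Combine the two experts in log space; weight the biased expert by how
    similar the two models' per-token attribution scores are."""
    # Cosine similarity between attribution vectors, clipped to [0, 1].
    sim = F.cosine_similarity(main_attrib, biased_attrib, dim=-1).clamp(min=0.0)
    # PoE: sum of log-probabilities; the biased expert is scaled per example
    # and detached so that only the main model receives gradients.
    combined = F.log_softmax(main_logits, dim=-1) \
             + sim.unsqueeze(-1) * F.log_softmax(biased_logits.detach(), dim=-1)
    return F.cross_entropy(combined, targets)

# Illustrative usage with random tensors:
# loss = poe_debias_loss(torch.randn(8, 3), torch.randn(8, 3),
#                        torch.randn(8, 128), torch.randn(8, 128),
#                        torch.randint(0, 3, (8,)))
```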
This list is automatically generated from the titles and abstracts of the papers in this site.