Pair the Dots: Jointly Examining Training History and Test Stimuli for
Model Interpretability
- URL: http://arxiv.org/abs/2010.06943v2
- Date: Sat, 31 Oct 2020 01:58:47 GMT
- Title: Pair the Dots: Jointly Examining Training History and Test Stimuli for
Model Interpretability
- Authors: Yuxian Meng, Chun Fan, Zijun Sun, Eduard Hovy, Fei Wu and Jiwei Li
- Abstract summary: Any prediction from a model is made by a combination of learning history and test stimuli.
Existing methods to interpret a model's predictions are only able to capture a single aspect of either test stimuli or learning history.
We propose an efficient and differentiable approach to make it feasible to interpret a model's prediction by jointly examining training history and test stimuli.
- Score: 44.60486560836836
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Any prediction from a model is made by a combination of learning
history and test stimuli. This provides significant insights for improving
model interpretability: because of which part(s) of which training example(s),
the model attends to which part(s) of a test example. Unfortunately, existing
methods to interpret a model's predictions capture only a single aspect, either
test stimuli or learning history, and evidence from the two is never combined
or integrated. In this paper, we propose an efficient and differentiable
approach that makes it feasible to interpret a model's prediction by jointly
examining training history and test stimuli. The test stimuli are first
identified by gradient-based methods, signifying the part of a test example
that the model attends to. The gradient-based saliency scores are then
propagated to training examples using influence functions to identify which
part(s) of which training example(s) make the model attend to the test stimuli.
The system is differentiable and time-efficient: the adoption of saliency
scores from gradient-based methods allows us to efficiently trace a model's
prediction through the test stimuli and then back to the training examples
through influence functions. We demonstrate that the proposed methodology
offers clear explanations of neural model decisions and is also useful for
error analysis, crafting adversarial examples, and fixing erroneously
classified examples.
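As a hedged illustration of the pipeline the abstract outlines (not the authors' released code), the sketch below computes gradient-based saliency over a test input and then scores training examples with a first-order approximation of influence functions. The inverse-Hessian term of the full influence function is omitted for brevity, and the model, data, and function names are illustrative assumptions.

```python
# Hedged sketch, not the authors' code: (1) gradient-based saliency marks the
# test stimuli, (2) a simplified, first-order influence score traces them back
# to training examples. Model, data, and names are illustrative assumptions.
import torch
import torch.nn.functional as F

def input_saliency(model, x, y):
    """Gradient-based saliency: |d loss / d input|, i.e. the test stimuli."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x.unsqueeze(0)), y.unsqueeze(0))
    loss.backward()
    return x.grad.abs()  # large values = parts of the test example the model attends to

def param_grad(model, x, y):
    """Flattened gradient of the loss w.r.t. model parameters for one example."""
    loss = F.cross_entropy(model(x.unsqueeze(0)), y.unsqueeze(0))
    grads = torch.autograd.grad(loss, [p for p in model.parameters() if p.requires_grad])
    return torch.cat([g.reshape(-1) for g in grads])

def influence_scores(model, test_x, test_y, train_set):
    """First-order influence approximation: dot product of parameter gradients.
    The full influence function also applies an inverse-Hessian-vector product,
    which is dropped here to keep the sketch short."""
    g_test = param_grad(model, test_x, test_y)
    return [torch.dot(g_test, param_grad(model, x, y)).item() for x, y in train_set]

# Illustrative usage (a small feed-forward classifier on feature vectors):
# stimuli = input_saliency(model, test_x, test_y)                # which parts of the test example
# scores = influence_scores(model, test_x, test_y, train_set)    # which training examples
```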
Related papers
- Revealing Model Biases: Assessing Deep Neural Networks via Recovered
Sample Analysis [9.05607520128194]
This paper proposes a straightforward and cost-effective approach to assess whether a deep neural network (DNN) relies on the primary concepts of training samples.
The proposed method does not require any test or generalization samples, only the parameters of the trained model and the training data that lie on the margin.
arXiv Detail & Related papers (2023-06-10T11:20:04Z)
- Guide the Learner: Controlling Product of Experts Debiasing Method Based
on Token Attribution Similarities [17.082695183953486]
A popular workaround is to train a robust model by re-weighting training examples based on a secondary biased model.
Here, the underlying assumption is that the biased model resorts to shortcut features.
We introduce a fine-tuning strategy that incorporates the similarity between the main and biased model attribution scores in a Product of Experts loss function (a generic sketch of such a loss appears after this list).
arXiv Detail & Related papers (2023-02-06T15:21:41Z)
- Pathologies of Pre-trained Language Models in Few-shot Fine-tuning [50.3686606679048]
We show that pre-trained language models fine-tuned on only a few examples exhibit strong prediction bias across labels.
Although few-shot fine-tuning can mitigate the prediction bias, our analysis shows that models gain their performance improvement by capturing non-task-related features.
These observations suggest that pursuing model performance with fewer examples may induce pathological prediction behavior.
arXiv Detail & Related papers (2022-04-17T15:55:18Z)
- Explain, Edit, and Understand: Rethinking User Study Design for
Evaluating Model Explanations [97.91630330328815]
We conduct a crowdsourcing study, where participants interact with deception detection models that have been trained to distinguish between genuine and fake hotel reviews.
We observe that for a linear bag-of-words model, participants with access to the feature coefficients during training are able to cause a larger reduction in model confidence in the testing phase when compared to the no-explanation control.
arXiv Detail & Related papers (2021-12-17T18:29:56Z)
- Instance-Based Neural Dependency Parsing [56.63500180843504]
We develop neural models that possess an interpretable inference process for dependency parsing.
Our models adopt instance-based inference, where dependency edges are extracted and labeled by comparing them to edges in a training set.
arXiv Detail & Related papers (2021-09-28T05:30:52Z)
- Empowering Language Understanding with Counterfactual Reasoning [141.48592718583245]
We propose a Counterfactual Reasoning Model, which mimics the counterfactual thinking by learning from few counterfactual samples.
In particular, we devise a generation module to generate representative counterfactual samples for each factual sample, and a retrospective module that revisits the model prediction by comparing the counterfactual and factual samples.
arXiv Detail & Related papers (2021-06-06T06:36:52Z)
- Building Reliable Explanations of Unreliable Neural Networks: Locally
Smoothing Perspective of Model Interpretation [0.0]
We present a novel method for reliably explaining the predictions of neural networks.
Our method is built on the assumption of a smooth landscape in the loss function around the model prediction.
arXiv Detail & Related papers (2021-03-26T08:52:11Z)
- Learning What Makes a Difference from Counterfactual Examples and
Gradient Supervision [57.14468881854616]
We propose an auxiliary training objective that improves the generalization capabilities of neural networks.
We use pairs of minimally different examples with different labels, a.k.a. counterfactual or contrasting examples, which provide a signal indicative of the underlying causal structure of the task.
Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
arXiv Detail & Related papers (2020-04-20T02:47:49Z)
- An interpretable neural network model through piecewise linear
approximation [7.196650216279683]
We propose a hybrid interpretable model that combines a piecewise linear component and a nonlinear component.
The first component describes the explicit feature contributions by piecewise linear approximation to increase the expressiveness of the model.
The other component uses a multi-layer perceptron to capture feature interactions and implicit nonlinearity, increasing prediction performance.
arXiv Detail & Related papers (2020-01-20T14:32:11Z)
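The last entry above describes its architecture concretely enough to sketch. Below is a minimal, assumed PyTorch rendering, not code from that paper: a hinge-basis piecewise-linear component exposes per-feature contributions, and a small MLP adds nonlinear interactions. All names, layer sizes, and the particular parameterization are illustrative assumptions.

```python
# Minimal sketch of a hybrid interpretable model in the spirit of the last entry:
# a piecewise-linear part yields explicit per-feature contributions, while a small
# MLP captures nonlinear interactions. Sizes and names are illustrative assumptions.
import torch
import torch.nn as nn

class HybridInterpretableModel(nn.Module):
    def __init__(self, n_features, n_pieces=4, hidden=32):
        super().__init__()
        # One small piecewise-linear function per feature, built from hinge terms.
        self.knots = nn.Parameter(torch.linspace(-1.0, 1.0, n_pieces).repeat(n_features, 1))
        self.slopes = nn.Parameter(torch.zeros(n_features, n_pieces))
        self.bias = nn.Parameter(torch.zeros(1))
        # Nonlinear component for feature interactions.
        self.mlp = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def feature_contributions(self, x):
        # Contribution of each feature: sum over hinge terms max(0, x - knot) * slope.
        hinges = torch.relu(x.unsqueeze(-1) - self.knots)   # (batch, features, pieces)
        return (hinges * self.slopes).sum(-1)                # (batch, features)

    def forward(self, x):
        explicit = self.feature_contributions(x).sum(-1, keepdim=True) + self.bias
        return explicit + self.mlp(x)                        # interpretable part + nonlinear part

# Illustrative usage:
# contrib = HybridInterpretableModel(10).feature_contributions(torch.randn(4, 10))
```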
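For the "Guide the Learner" entry above, the following is a generic sketch of a product-of-experts debiasing loss in which the biased expert's contribution is scaled by the similarity of the two models' attribution scores. This is one plausible reading of the one-sentence summary, not that paper's exact formulation; every function and tensor name is an assumption.

```python
# Generic product-of-experts (PoE) debiasing loss with an attribution-similarity
# weight on the biased model's contribution -- a hedged reading of the summary,
# not necessarily that paper's exact formulation. All names are assumptions.
import torch
import torch.nn.functional as F

def poe_debias_loss(main_logits, biased_logits, main_attrib, biased_attrib, targets):
    """Combine the two experts in log space; weight the biased expert by how
    similar the two models' per-token attribution scores are."""
    # Cosine similarity between attribution vectors, clipped to [0, 1].
    sim = F.cosine_similarity(main_attrib, biased_attrib, dim=-1).clamp(min=0.0)
    # PoE: sum of log-probabilities; the biased expert is scaled per example
    # and detached so that only the main model receives gradients.
    combined = F.log_softmax(main_logits, dim=-1) \
             + sim.unsqueeze(-1) * F.log_softmax(biased_logits.detach(), dim=-1)
    return F.cross_entropy(combined, targets)

# Illustrative usage with random tensors:
# loss = poe_debias_loss(torch.randn(8, 3), torch.randn(8, 3),
#                        torch.randn(8, 128), torch.randn(8, 128),
#                        torch.randint(0, 3, (8,)))
```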
This list is automatically generated from the titles and abstracts of the papers in this site.