Interpretation of NLP models through input marginalization
- URL: http://arxiv.org/abs/2010.13984v1
- Date: Tue, 27 Oct 2020 01:40:41 GMT
- Title: Interpretation of NLP models through input marginalization
- Authors: Siwon Kim, Jihun Yi, Eunji Kim, and Sungroh Yoon
- Abstract summary: Several methods have been proposed to interpret predictions by measuring the change in prediction probability after erasing each token of an input.
Since existing methods replace each token with a predefined value (i.e., zero), the resulting sentence lies out of the training data distribution, yielding misleading interpretations.
In this study, we raise the out-of-distribution problem induced by the existing interpretation methods and present a remedy.
We interpret various NLP models trained for sentiment analysis and natural language inference using the proposed method.
- Score: 28.031961925541466
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To demystify the "black box" property of deep neural networks for natural
language processing (NLP), several methods have been proposed to interpret
their predictions by measuring the change in prediction probability after
erasing each token of an input. Since existing methods replace each token with
a predefined value (i.e., zero), the resulting sentence lies out of the
training data distribution, yielding misleading interpretations. In this study,
we raise the out-of-distribution problem induced by the existing interpretation
methods and present a remedy; we propose to marginalize each token out. We
interpret various NLP models trained for sentiment analysis and natural
language inference using the proposed method.
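For readers who want to see the idea in code, below is a minimal sketch of input marginalization in the spirit of the abstract: instead of erasing a token or replacing it with zero, a masked language model proposes plausible replacements, the classifier's prediction is averaged over them, and the attribution is the resulting change in log-odds. The model names, the top-k truncation of candidates, and the log-odds score are illustrative assumptions rather than the authors' exact configuration.
```python
# Minimal sketch of input marginalization (model names, top-k truncation, and
# the log-odds attribution are illustrative assumptions, not the paper's exact setup).
import torch
from transformers import AutoModelForMaskedLM, AutoModelForSequenceClassification, AutoTokenizer

mlm_name = "bert-base-uncased"                   # masked LM modeling p(x_t | x_without_t)
clf_name = "textattack/bert-base-uncased-SST-2"  # assumed sentiment classifier to interpret

tok = AutoTokenizer.from_pretrained(mlm_name)
mlm = AutoModelForMaskedLM.from_pretrained(mlm_name).eval()
clf = AutoModelForSequenceClassification.from_pretrained(clf_name).eval()

def class_prob(input_ids, target_class):
    """Classifier probability p(y = target_class | input_ids)."""
    logits = clf(input_ids=input_ids).logits
    return torch.softmax(logits, dim=-1)[0, target_class]

@torch.no_grad()
def marginalized_attribution(sentence, target_class, top_k=20):
    ids = tok(sentence, return_tensors="pt")["input_ids"]
    p_orig = class_prob(ids, target_class)
    scores = []
    for pos in range(1, ids.shape[1] - 1):       # skip [CLS] and [SEP]
        masked = ids.clone()
        masked[0, pos] = tok.mask_token_id
        # Candidate replacements and their likelihoods from the masked LM.
        mlm_logits = mlm(input_ids=masked).logits[0, pos]
        probs = torch.softmax(mlm_logits, dim=-1)
        top_p, top_idx = probs.topk(top_k)
        top_p = top_p / top_p.sum()              # renormalize over kept candidates
        # Marginalize: expected prediction over plausible replacements.
        p_marg = 0.0
        for p_tok, cand in zip(top_p, top_idx):
            replaced = ids.clone()
            replaced[0, pos] = cand
            p_marg = p_marg + p_tok * class_prob(replaced, target_class)
        # Attribution as the change in log-odds after marginalizing the token out.
        woe = torch.logit(p_orig, eps=1e-6) - torch.logit(p_marg, eps=1e-6)
        scores.append((tok.convert_ids_to_tokens(int(ids[0, pos])), woe.item()))
    return scores

print(marginalized_attribution("a gripping and beautifully made film", target_class=1))
```
Truncating to the top-k masked-LM candidates keeps the cost at k classifier calls per token rather than one per vocabulary entry.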
Related papers
- LLM Generated Distribution-Based Prediction of US Electoral Results, Part I [0.0]
This paper introduces distribution-based prediction, a novel approach to using Large Language Models (LLMs) as predictive tools.
We demonstrate the use of distribution-based prediction in the context of the recent United States presidential election.
arXiv Detail & Related papers (2024-11-05T20:10:25Z)
- Pretraining Data Detection for Large Language Models: A Divergence-based Calibration Method [108.56493934296687]
We introduce a divergence-based calibration method, inspired by the divergence-from-randomness concept, to calibrate token probabilities for pretraining data detection.
We have developed a Chinese-language benchmark, PatentMIA, to assess the performance of detection approaches for LLMs on Chinese text.
arXiv Detail & Related papers (2024-09-23T07:55:35Z)
- Uncertainty Quantification via Stable Distribution Propagation [60.065272548502]
We propose a new approach for propagating stable probability distributions through neural networks.
Our method is based on local linearization, which we show to be an optimal approximation in terms of total variation distance for the ReLU non-linearity.
arXiv Detail & Related papers (2024-02-13T09:40:19Z)
- Modeling Uncertainty in Personalized Emotion Prediction with Normalizing Flows [6.32047610997385]
This work proposes a novel approach to capture the uncertainty of the forecast using conditional Normalizing Flows.
We validated our method on three challenging, subjective NLP tasks, including emotion recognition and hate speech detection.
The information brought by the developed methods makes it possible to build hybrid models whose effectiveness surpasses classic solutions.
arXiv Detail & Related papers (2023-12-10T23:21:41Z)
- Understanding and Mitigating Classification Errors Through Interpretable Token Patterns [58.91023283103762]
Characterizing errors in easily interpretable terms gives insight into whether a classifier is prone to making systematic errors.
We propose to discover those patterns of tokens that distinguish correct and erroneous predictions.
We show that our method, Premise, performs well in practice.
arXiv Detail & Related papers (2023-11-18T00:24:26Z)
- Decomposing Uncertainty for Large Language Models through Input Clarification Ensembling [69.83976050879318]
In large language models (LLMs), identifying sources of uncertainty is an important step toward improving reliability, trustworthiness, and interpretability.
In this paper, we introduce an uncertainty decomposition framework for LLMs, called input clarification ensembling.
Our approach generates a set of clarifications for the input, feeds them into an LLM, and ensembles the corresponding predictions.
arXiv Detail & Related papers (2023-11-15T05:58:35Z)
- NUQ: Nonparametric Uncertainty Quantification for Deterministic Neural Networks [151.03112356092575]
We show a principled way to measure the uncertainty of a classifier's predictions based on the Nadaraya-Watson nonparametric estimate of the conditional label distribution.
We demonstrate the strong performance of the method in uncertainty estimation tasks on a variety of real-world image datasets.
arXiv Detail & Related papers (2022-02-07T12:30:45Z)
- Interpreting Deep Learning Models in Natural Language Processing: A Review [33.80537635077772]
A long-standing criticism against neural network models is the lack of interpretability.
In this survey, we provide a comprehensive review of various interpretation methods for neural models in NLP.
arXiv Detail & Related papers (2021-10-20T10:17:04Z)
- Evaluating Saliency Methods for Neural Language Models [9.309351023703018]
Saliency methods are widely used to interpret neural network predictions.
Different variants of saliency methods disagree even on the interpretations of the same prediction made by the same model.
We conduct a comprehensive and quantitative evaluation of saliency methods on a fundamental category of NLP models: neural language models.
arXiv Detail & Related papers (2021-04-12T21:19:48Z)
- Interpreting Graph Neural Networks for NLP With Differentiable Edge Masking [63.49779304362376]
Graph neural networks (GNNs) have become a popular approach to integrating structural inductive biases into NLP models.
We introduce a post-hoc method for interpreting the predictions of GNNs which identifies unnecessary edges.
We show that we can drop a large proportion of edges without deteriorating the performance of the model.
arXiv Detail & Related papers (2020-10-01T17:51:19Z)
- Considering Likelihood in NLP Classification Explanations with Occlusion and Language Modeling [11.594541142399223]
Occlusion is a well established method that provides explanations on discrete language data.
We argue that current Occlusion-based methods often produce invalid or syntactically incorrect language data.
We propose OLM: a novel explanation method that combines Occlusion and language models to sample valid and syntactically correct replacements (a minimal sketch of this sampling idea appears after this list).
arXiv Detail & Related papers (2020-04-21T10:37:44Z)
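The OLM entry above takes a route closely related to the main paper: rather than marginalizing over the full masked-LM distribution, it samples replacements from a language model so that the occluded sentence stays in-distribution. The sketch below illustrates that sampling idea only; `tok`, `mlm`, and `clf` can be the tokenizer, masked LM, and classifier built in the earlier snippet, and the sample count is an assumed, illustrative choice rather than the paper's configuration.
```python
# Sketch of an OLM-style occlusion score: sample replacements for one position
# from a masked LM and average the classifier's predictions over the samples.
import torch

@torch.no_grad()
def olm_style_score(tok, mlm, clf, sentence, pos, target_class, n_samples=30):
    ids = tok(sentence, return_tensors="pt")["input_ids"]
    p_orig = torch.softmax(clf(input_ids=ids).logits, dim=-1)[0, target_class]

    masked = ids.clone()
    masked[0, pos] = tok.mask_token_id
    probs = torch.softmax(mlm(input_ids=masked).logits[0, pos], dim=-1)

    # Monte Carlo estimate instead of an explicit sum over the vocabulary.
    candidates = torch.multinomial(probs, n_samples, replacement=True)
    p_repl = []
    for cand in candidates:
        replaced = ids.clone()
        replaced[0, pos] = cand
        p_repl.append(torch.softmax(clf(input_ids=replaced).logits, dim=-1)[0, target_class])

    # Relevance of the original token: drop in prediction probability once the
    # token is replaced by in-distribution alternatives.
    return (p_orig - torch.stack(p_repl).mean()).item()
```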
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.