Self-Explaining Structures Improve NLP Models
- URL: http://arxiv.org/abs/2012.01786v2
- Date: Wed, 9 Dec 2020 02:08:38 GMT
- Title: Self-Explaining Structures Improve NLP Models
- Authors: Zijun Sun, Chun Fan, Qinghong Han, Xiaofei Sun, Yuxian Meng, Fei Wu
and Jiwei Li
- Abstract summary: We propose a simple yet general and effective self-explaining framework for deep learning models in NLP.
We show that interpretability does not come at the cost of performance: a neural model with self-explaining features obtains better performance than its counterpart without them.
- Score: 25.292847674586614
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Existing approaches to explaining deep learning models in NLP usually suffer
from two major drawbacks: (1) the main model and the explaining model are
decoupled: an additional probing or surrogate model is used to interpret an
existing model, and thus existing explaining tools are not self-explainable;
(2) the probing model can only explain a model's predictions by operating on
low-level features, computing saliency scores for individual words, and is
clumsy at high-level text units such as phrases, sentences, or paragraphs. To
address these two issues, in this paper we propose a simple
yet general and effective self-explaining framework for deep learning models in
NLP. The key idea of the proposed framework is to put an additional layer,
called the interpretation layer, on top of any existing NLP model. This layer
aggregates the information for each text span, associates each span with a
specific weight, and feeds their weighted combination to the softmax function
for the final prediction. The proposed model comes with the following
merits: (1) span weights make the model self-explainable and do not require an
additional probing model for interpretation; (2) the proposed model is general
and can be adapted to any existing deep learning structures in NLP; (3) the
weight associated with each text span provides direct importance scores for
higher-level text units such as phrases and sentences. We show, for the first
time, that interpretability does not come at the cost of performance: a neural
model with self-explaining features outperforms its counterpart without them,
achieving a new SOTA performance of 59.1 on SST-5 and a new SOTA performance of
92.3 on SNLI.
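To make the proposed architecture concrete, here is a minimal sketch of an interpretation layer wired on top of a generic token encoder, based only on the description above: every contiguous text span is aggregated into one representation, each span receives a scalar weight, and the weighted combination of span representations feeds the final softmax classifier. The encoder choice, the mean-pooling span aggregation, and all names and dimensions below are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch of a self-explaining interpretation layer (assumptions:
# mean-pooled span representations, a linear span scorer, PyTorch).
import torch
import torch.nn as nn

class InterpretationLayer(nn.Module):
    def __init__(self, hidden_size: int, num_labels: int):
        super().__init__()
        self.span_scorer = nn.Linear(hidden_size, 1)   # one scalar weight per span
        self.classifier = nn.Linear(hidden_size, num_labels)

    def forward(self, token_states: torch.Tensor):
        # token_states: (batch, seq_len, hidden) from any existing NLP encoder.
        batch, seq_len, hidden = token_states.shape

        # Aggregate the information for every contiguous span (i, j).
        # Mean pooling is an assumed aggregation function.
        spans = []
        for i in range(seq_len):
            for j in range(i, seq_len):
                spans.append(token_states[:, i:j + 1, :].mean(dim=1))
        span_reprs = torch.stack(spans, dim=1)          # (batch, n_spans, hidden)

        # Associate each span with a weight; a softmax over spans makes the
        # weights directly readable as importance scores for phrases/sentences.
        alphas = torch.softmax(self.span_scorer(span_reprs).squeeze(-1), dim=-1)

        # The weighted combination of span representations is fed to the
        # final softmax classifier (via these logits).
        pooled = torch.einsum("bs,bsh->bh", alphas, span_reprs)
        return self.classifier(pooled), alphas
```

At inference time, the largest entries of `alphas` indicate which spans the prediction relied on, which is what makes the model self-explainable without a separate probing model.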
Related papers
- Preserving Knowledge Invariance: Rethinking Robustness Evaluation of
Open Information Extraction [50.62245481416744]
We present the first benchmark that simulates the evaluation of open information extraction models in the real world.
We design and annotate a large-scale testbed in which each example is a knowledge-invariant clique.
Elaborating the robustness metric further, a model is judged to be robust if its performance is consistently accurate across whole cliques.
arXiv Detail & Related papers (2023-05-23T12:05:09Z)
- Explaining Language Models' Predictions with High-Impact Concepts [11.47612457613113]
We propose a complete framework for extending concept-based interpretability methods to NLP.
We optimize for features whose existence causes the output predictions to change substantially.
Our method achieves superior results on predictive impact, usability, and faithfulness compared to the baselines.
arXiv Detail & Related papers (2023-05-03T14:48:27Z)
- Training Trajectories of Language Models Across Scales [99.38721327771208]
Scaling up language models has led to unprecedented performance gains.
How do language models of different sizes learn during pre-training?
Why do larger language models demonstrate more desirable behaviors?
arXiv Detail & Related papers (2022-12-19T19:16:29Z)
- Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP).
What further sets these models apart is the massive amounts of world knowledge they internalize during pretraining.
How the model's world knowledge interacts with the factual information presented in the context remains underexplored.
arXiv Detail & Related papers (2022-11-09T18:58:29Z)
- Language Model Pre-Training with Sparse Latent Typing [66.75786739499604]
We propose a new pre-training objective, Sparse Latent Typing, which enables the model to sparsely extract sentence-level keywords with diverse latent types.
Experimental results show that our model is able to learn interpretable latent type categories in a self-supervised manner without using any external knowledge.
arXiv Detail & Related papers (2022-10-23T00:37:08Z)
- A Unified Understanding of Deep NLP Models for Text Classification [88.35418976241057]
We have developed a visual analysis tool, DeepNLPVis, to enable a unified understanding of NLP models for text classification.
The key idea is a mutual information-based measure, which provides quantitative explanations on how each layer of a model maintains the information of input words in a sample.
A multi-level visualization, which consists of a corpus-level, a sample-level, and a word-level visualization, supports the analysis from the overall training set to individual samples.
arXiv Detail & Related papers (2022-06-19T08:55:07Z)
- Unifying Language Learning Paradigms [96.35981503087567]
We present a unified framework for pre-training models that are universally effective across datasets and setups.
We show how different pre-training objectives can be cast as one another and how interpolating between different objectives can be effective.
Our model also achieves strong results at in-context learning, outperforming 175B GPT-3 on zero-shot SuperGLUE and tripling the performance of T5-XXL on one-shot summarization.
arXiv Detail & Related papers (2022-05-10T19:32:20Z)
- Hierarchical Interpretation of Neural Text Classification [31.95426448656938]
This paper proposes a novel Hierarchical INTerpretable neural text classifier, called Hint, which can automatically generate explanations of model predictions.
Experimental results on both review datasets and news datasets show that our proposed approach achieves text classification results on par with existing state-of-the-art text classifiers.
arXiv Detail & Related papers (2022-02-20T11:15:03Z)
- Unsupervised Pre-training with Structured Knowledge for Improving Natural Language Inference [22.648536283569747]
We propose models that leverage structured knowledge in different components of pre-trained models.
Our results show that the proposed models perform better than previous BERT-based state-of-the-art models.
arXiv Detail & Related papers (2021-09-08T21:28:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.