Saliency Map Verbalization: Comparing Feature Importance Representations
from Model-free and Instruction-based Methods
- URL: http://arxiv.org/abs/2210.07222v3
- Date: Wed, 7 Jun 2023 09:29:04 GMT
- Title: Saliency Map Verbalization: Comparing Feature Importance Representations
from Model-free and Instruction-based Methods
- Authors: Nils Feldhus, Leonhard Hennig, Maximilian Dustin Nasert, Christopher Ebert, Robert Schwarzenberg, Sebastian Möller
- Abstract summary: Saliency maps can explain a neural model's predictions by identifying important input features.
We formalize the underexplored task of translating saliency maps into natural language.
We compare two novel methods (search-based and instruction-based verbalizations) against conventional feature importance representations.
- Score: 6.018950511093273
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Saliency maps can explain a neural model's predictions by identifying
important input features. They are difficult to interpret for laypeople,
especially for instances with many features. In order to make them more
accessible, we formalize the underexplored task of translating saliency maps
into natural language and compare methods that address two key challenges of
this approach -- what and how to verbalize. In both automatic and human
evaluation setups, using token-level attributions from text classification
tasks, we compare two novel methods (search-based and instruction-based
verbalizations) against conventional feature importance representations
(heatmap visualizations and extractive rationales), measuring simulatability,
faithfulness, helpfulness and ease of understanding. Instructing GPT-3.5 to
generate saliency map verbalizations yields plausible explanations which
include associations, abstractive summarization and commonsense reasoning,
achieving by far the highest human ratings, but they do not faithfully
capture numeric information and are inconsistent in their interpretation of
the task. In comparison, our search-based, model-free verbalization approach
efficiently completes templated verbalizations, is faithful by design, but
falls short in helpfulness and simulatability. Our results suggest that
saliency map verbalization makes feature attribution explanations more
comprehensible and less cognitively challenging to humans than conventional
representations.
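As a rough sketch of the two verbalization routes the abstract contrasts, consider the following Python snippet. It is illustrative only, not the authors' released code: the tokens, attribution scores, template wording, and prompt phrasing are invented for this example.

```python
# Hypothetical token-level saliency scores for one text-classification
# instance (invented for illustration).
tokens = ["the", "service", "was", "painfully", "slow"]
scores = [0.02, 0.31, 0.05, 0.48, 0.14]
prediction = "negative"

def search_based_verbalization(tokens, scores, k=2):
    """Model-free route: select the top-k attributed tokens and fill a
    fixed template. Faithful by design, since it only restates the map."""
    top = sorted(zip(tokens, scores), key=lambda p: p[1], reverse=True)[:k]
    words = " and ".join(f"'{t}'" for t, _ in top)
    return f"The words {words} were most important for the prediction."

def instruction_prompt(tokens, scores, prediction):
    """Instruction-based route: serialize the saliency map into a prompt
    for an instruction-tuned LLM such as GPT-3.5, which may add
    associations and commonsense reasoning but can drift from the numbers."""
    pairs = ", ".join(f"{t} ({s:.2f})" for t, s in zip(tokens, scores))
    return (
        f"A classifier predicted '{prediction}'. "
        f"Token importance scores: {pairs}. "
        "Explain in plain language why the model made this prediction."
    )

print(search_based_verbalization(tokens, scores))
# -> The words 'painfully' and 'service' were most important for the prediction.
print(instruction_prompt(tokens, scores, prediction))
```

The contrast in this sketch mirrors the trade-off reported above: the templated output can only restate the scores (faithful by design, but less helpful), while the prompt delegates interpretation to the LLM (more plausible, but with no guarantee of faithfulness to the numeric attributions).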
Related papers
- ImpScore: A Learnable Metric For Quantifying The Implicitness Level of Language [40.4052848203136]
Understanding implicit language is essential for natural language processing systems to achieve precise text understanding and natural interactions with users.
This paper develops a scalar metric that quantifies the implicitness level of language without relying on external references.
ImpScore is trained using pairwise contrastive learning on a specially curated dataset comprising 112,580 (implicit sentence, explicit sentence) pairs.
arXiv Detail & Related papers (2024-11-07T20:23:29Z)
- TAGExplainer: Narrating Graph Explanations for Text-Attributed Graph Learning Models [14.367754016281934]
This paper presents TAGExplainer, the first method designed to generate natural language explanations for TAG learning.
To address the lack of annotated ground truth explanations in real-world scenarios, we propose first generating pseudo-labels that capture the model's decisions from saliency-based explanations.
These high-quality pseudo-labels are then used to train an end-to-end explanation generator model.
arXiv Detail & Related papers (2024-10-20T03:55:46Z)
- Pixel Sentence Representation Learning [67.4775296225521]
In this work, we conceptualize the learning of sentence-level textual semantics as a visual representation learning process.
We employ visually-grounded text perturbations such as typos and word-order shuffling, which resonate with human cognitive patterns and allow the perturbations to be perceived as continuous.
Our approach is further bolstered by large-scale unsupervised topical alignment training and natural language inference supervision.
arXiv Detail & Related papers (2024-02-13T02:46:45Z)
- Representing visual classification as a linear combination of words [0.0]
We present an explainability strategy that uses a vision-language model to identify language-based descriptors of a visual classification task.
By leveraging a pre-trained joint embedding space between images and text, our approach estimates a new classification task as a linear combination of words.
We find that the resulting descriptors largely align with clinical knowledge despite a lack of domain-specific language training.
arXiv Detail & Related papers (2023-11-18T02:00:20Z)
- Interpreting Language Models with Contrastive Explanations [99.7035899290924]
Language models must consider various features to predict a token, such as its part of speech, number, tense, or semantics.
Existing explanation methods conflate evidence for all these features into a single explanation, which is less interpretable for human understanding.
We show that contrastive explanations are quantifiably better than non-contrastive explanations in verifying major grammatical phenomena.
arXiv Detail & Related papers (2022-02-21T18:32:24Z)
- More Than Words: Towards Better Quality Interpretations of Text Classifiers [16.66535643383862]
We show that token-based interpretability, while being a convenient first choice given the input interfaces of the ML models, is not the most effective one in all situations.
We show that higher-level feature attributions offer several advantages: 1) they are more robust as measured by the randomization tests, 2) they lead to lower variability when using approximation-based methods like SHAP, and 3) they are more intelligible to humans in situations where the linguistic coherence resides at a higher level.
arXiv Detail & Related papers (2021-12-23T10:18:50Z)
- Sentiment analysis in tweets: an assessment study from classical to modern text representation models [59.107260266206445]
Short texts published on Twitter have earned significant attention as a rich source of information.
Their inherent characteristics, such as an informal and noisy linguistic style, remain challenging for many natural language processing (NLP) tasks.
This study assesses existing language models on distinguishing the sentiment expressed in tweets, using a rich collection of 22 datasets.
arXiv Detail & Related papers (2021-05-29T21:05:28Z)
- Understanding Synonymous Referring Expressions via Contrastive Features [105.36814858748285]
We develop an end-to-end trainable framework to learn contrastive features on the image and object instance levels.
We conduct extensive experiments to evaluate the proposed algorithm on several benchmark datasets.
arXiv Detail & Related papers (2021-04-20T17:56:24Z)
- Prototypical Representation Learning for Relation Extraction [56.501332067073065]
This paper aims to learn predictive, interpretable, and robust relation representations from distantly-labeled data.
We learn prototypes for each relation from contextual information to best explore the intrinsic semantics of relations.
Results on several relation learning tasks show that our model significantly outperforms the previous state-of-the-art relational models.
arXiv Detail & Related papers (2021-03-22T08:11:43Z)
- Generating Hierarchical Explanations on Text Classification via Feature Interaction Detection [21.02924712220406]
We build hierarchical explanations by detecting feature interactions.
Such explanations visualize how words and phrases are combined at different levels of the hierarchy.
Experiments show the effectiveness of the proposed method in providing explanations both faithful to models and interpretable to humans.
arXiv Detail & Related papers (2020-04-04T20:56:37Z)
- Temporal Embeddings and Transformer Models for Narrative Text Understanding [72.88083067388155]
We present two approaches to narrative text understanding for character relationship modelling.
The temporal evolution of these relations is described by dynamic word embeddings, that are designed to learn semantic changes over time.
A supervised learning approach based on the state-of-the-art transformer model BERT is used, in contrast, to detect static relations between characters.
arXiv Detail & Related papers (2020-03-19T14:23:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.