Saliency Map Verbalization: Comparing Feature Importance Representations
from Model-free and Instruction-based Methods
- URL: http://arxiv.org/abs/2210.07222v3
- Date: Wed, 7 Jun 2023 09:29:04 GMT
- Title: Saliency Map Verbalization: Comparing Feature Importance Representations
from Model-free and Instruction-based Methods
- Authors: Nils Feldhus, Leonhard Hennig, Maximilian Dustin Nasert, Christopher Ebert, Robert Schwarzenberg, Sebastian Möller
- Abstract summary: Saliency maps can explain a neural model's predictions by identifying important input features.
We formalize the underexplored task of translating saliency maps into natural language.
We compare two novel methods (search-based and instruction-based verbalizations) against conventional feature importance representations.
- Score: 6.018950511093273
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Saliency maps can explain a neural model's predictions by identifying
important input features. They are difficult to interpret for laypeople,
especially for instances with many features. In order to make them more
accessible, we formalize the underexplored task of translating saliency maps
into natural language and compare methods that address two key challenges of
this approach -- what and how to verbalize. In both automatic and human
evaluation setups, using token-level attributions from text classification
tasks, we compare two novel methods (search-based and instruction-based
verbalizations) against conventional feature importance representations
(heatmap visualizations and extractive rationales), measuring simulatability,
faithfulness, helpfulness and ease of understanding. Instructing GPT-3.5 to
generate saliency map verbalizations yields plausible explanations which
include associations, abstractive summarization and commonsense reasoning,
achieving by far the highest human ratings, but they do not faithfully
capture numeric information and are inconsistent in their interpretation of
the task. In comparison, our search-based, model-free verbalization approach
efficiently completes templated verbalizations, is faithful by design, but
falls short in helpfulness and simulatability. Our results suggest that
saliency map verbalization makes feature attribution explanations more
comprehensible and less cognitively challenging to humans than conventional
representations.
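As a rough sketch of the two verbalization routes the abstract contrasts, consider the following Python snippet. It is illustrative only, not the authors' released code: the tokens, attribution scores, template wording, and prompt phrasing are invented for this example.

```python
# Hypothetical token-level saliency scores for one text-classification
# instance (invented for illustration).
tokens = ["the", "service", "was", "painfully", "slow"]
scores = [0.02, 0.31, 0.05, 0.48, 0.14]
prediction = "negative"

def search_based_verbalization(tokens, scores, k=2):
    """Model-free route: select the top-k attributed tokens and fill a
    fixed template. Faithful by design, since it only restates the map."""
    top = sorted(zip(tokens, scores), key=lambda p: p[1], reverse=True)[:k]
    words = " and ".join(f"'{t}'" for t, _ in top)
    return f"The words {words} were most important for the prediction."

def instruction_prompt(tokens, scores, prediction):
    """Instruction-based route: serialize the saliency map into a prompt
    for an instruction-tuned LLM such as GPT-3.5, which may add
    associations and commonsense reasoning but can drift from the numbers."""
    pairs = ", ".join(f"{t} ({s:.2f})" for t, s in zip(tokens, scores))
    return (
        f"A classifier predicted '{prediction}'. "
        f"Token importance scores: {pairs}. "
        "Explain in plain language why the model made this prediction."
    )

print(search_based_verbalization(tokens, scores))
# -> The words 'painfully' and 'service' were most important for the prediction.
print(instruction_prompt(tokens, scores, prediction))
```

The contrast in this sketch mirrors the trade-off reported above: the templated output can only restate the scores (faithful by design, but less helpful), while the prompt delegates interpretation to the LLM (more plausible, but with no guarantee of faithfulness to the numeric attributions).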
Related papers
- ImpScore: A Learnable Metric For Quantifying The Implicitness Level of Language [40.4052848203136]
Understanding implicit language is essential for natural language processing systems to achieve precise text understanding and natural interactions with users.
This paper develops a scalar metric that quantifies the implicitness level of language without relying on external references.
ImpScore is trained using pairwise contrastive learning on a specially curated dataset comprising 112,580 (implicit sentence, explicit sentence) pairs.
arXiv Detail & Related papers (2024-11-07T20:23:29Z)
- TAGExplainer: Narrating Graph Explanations for Text-Attributed Graph Learning Models [14.367754016281934]
This paper presents TAGExplainer, the first method designed to generate natural language explanations for TAG learning.
To address the lack of annotated ground truth explanations in real-world scenarios, we propose first generating pseudo-labels that capture the model's decisions from saliency-based explanations.
These high-quality pseudo-labels are then used to train an end-to-end explanation generator model.
arXiv Detail & Related papers (2024-10-20T03:55:46Z)
- Pixel Sentence Representation Learning [67.4775296225521]
In this work, we conceptualize the learning of sentence-level textual semantics as a visual representation learning process.
We employ visually-grounded text perturbations such as typos and word-order shuffling, which resonate with human cognitive patterns and allow the perturbations to be perceived as continuous.
Our approach is further bolstered by large-scale unsupervised topical alignment training and natural language inference supervision.
arXiv Detail & Related papers (2024-02-13T02:46:45Z)
- Representing visual classification as a linear combination of words [0.0]
We present an explainability strategy that uses a vision-language model to identify language-based descriptors of a visual classification task.
By leveraging a pre-trained joint embedding space between images and text, our approach estimates a new classification task as a linear combination of words.
We find that the resulting descriptors largely align with clinical knowledge despite a lack of domain-specific language training.
arXiv Detail & Related papers (2023-11-18T02:00:20Z)
- Interpreting Language Models with Contrastive Explanations [99.7035899290924]
Language models must consider various features to predict a token, such as its part of speech, number, tense, or semantics.
Existing explanation methods conflate evidence for all these features into a single explanation, which is less interpretable for human understanding.
We show that contrastive explanations are quantifiably better than non-contrastive explanations in verifying major grammatical phenomena.
arXiv Detail & Related papers (2022-02-21T18:32:24Z)
- More Than Words: Towards Better Quality Interpretations of Text Classifiers [16.66535643383862]
We show that token-based interpretability, while being a convenient first choice given the input interfaces of the ML models, is not the most effective one in all situations.
We show that higher-level feature attributions offer several advantages: 1) they are more robust as measured by the randomization tests, 2) they lead to lower variability when using approximation-based methods like SHAP, and 3) they are more intelligible to humans in situations where the linguistic coherence resides at a higher level.
arXiv Detail & Related papers (2021-12-23T10:18:50Z)
- Sentiment analysis in tweets: an assessment study from classical to modern text representation models [59.107260266206445]
Short texts published on Twitter have earned significant attention as a rich source of information.
Their inherent characteristics, such as an informal and noisy linguistic style, remain challenging for many natural language processing (NLP) tasks.
This study assesses existing language models on distinguishing the sentiment expressed in tweets, using a rich collection of 22 datasets.
arXiv Detail & Related papers (2021-05-29T21:05:28Z)
- Understanding Synonymous Referring Expressions via Contrastive Features [105.36814858748285]
We develop an end-to-end trainable framework to learn contrastive features on the image and object instance levels.
We conduct extensive experiments to evaluate the proposed algorithm on several benchmark datasets.
arXiv Detail & Related papers (2021-04-20T17:56:24Z)
- Prototypical Representation Learning for Relation Extraction [56.501332067073065]
This paper aims to learn predictive, interpretable, and robust relation representations from distantly-labeled data.
We learn prototypes for each relation from contextual information to best explore the intrinsic semantics of relations.
Results on several relation learning tasks show that our model significantly outperforms the previous state-of-the-art relational models.
arXiv Detail & Related papers (2021-03-22T08:11:43Z)
- Generating Hierarchical Explanations on Text Classification via Feature Interaction Detection [21.02924712220406]
We build hierarchical explanations by detecting feature interactions.
Such explanations visualize how words and phrases are combined at different levels of the hierarchy.
Experiments show the effectiveness of the proposed method in providing explanations both faithful to models and interpretable to humans.
arXiv Detail & Related papers (2020-04-04T20:56:37Z)
- Temporal Embeddings and Transformer Models for Narrative Text Understanding [72.88083067388155]
We present two approaches to narrative text understanding for character relationship modelling.
The temporal evolution of these relations is described by dynamic word embeddings, that are designed to learn semantic changes over time.
A supervised learning approach based on the state-of-the-art transformer model BERT is used, in contrast, to detect static relations between characters.
arXiv Detail & Related papers (2020-03-19T14:23:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.