Attention vs non-attention for a Shapley-based explanation method
- URL: http://arxiv.org/abs/2104.12424v1
- Date: Mon, 26 Apr 2021 09:33:18 GMT
- Title: Attention vs non-attention for a Shapley-based explanation method
- Authors: Tom Kersten, Hugh Mee Wong, Jaap Jumelet, Dieuwke Hupkes
- Abstract summary: We consider Contextual Decomposition (CD) -- a Shapley-based input feature attribution method that has been shown to work well for recurrent NLP models.
We show that the English and Dutch models demonstrate similar processing behaviour, but that under the hood there are consistent differences between our attention and non-attention models.
- Score: 6.386917828177479
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The field of explainable AI has recently seen an explosion in the number of
explanation methods for highly non-linear deep neural networks. The extent to
which such methods -- that are often proposed and tested in the domain of
computer vision -- are appropriate to address the explainability challenges in
NLP remains relatively unexplored. In this work, we consider Contextual
Decomposition (CD) -- a Shapley-based input feature attribution method that has
been shown to work well for recurrent NLP models -- and we test the extent to
which it is useful for models that contain attention operations. To this end,
we extend CD to cover the operations necessary for attention-based models. We
then compare how long-distance subject-verb relationships are processed by
models with and without attention, considering a number of different syntactic
structures in two different languages: English and Dutch. Our experiments
confirm that CD can successfully be applied to attention-based models as well,
providing an alternative Shapley-based attribution method for modern neural
networks. In particular, using CD, we show that the English and Dutch models
demonstrate similar processing behaviour, but that under the hood there are
consistent differences between our attention and non-attention models.
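To make the CD bookkeeping concrete, here is a minimal NumPy sketch of the two ideas the abstract relies on: linear layers decompose exactly into a relevant stream (beta, the phrase in focus) and an irrelevant stream (gamma, everything else), while nonlinearities are split with the two-player Shapley average. The softmax handling at the end is an assumed illustration of the attention extension, not necessarily the exact scheme the paper derives; all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def shapley_split(f, beta, gamma):
    """Two-player Shapley split of f(beta + gamma): the relevant share of a
    nonlinearity, averaged over both orders in which beta can be added in."""
    zero = np.zeros_like(beta)
    rel = 0.5 * ((f(beta) - f(zero)) + (f(beta + gamma) - f(gamma)))
    return rel, f(beta + gamma) - rel

def cd_linear(W, b, beta, gamma):
    """Linear layers decompose exactly; the bias joins the irrelevant stream."""
    return W @ beta, W @ gamma + b

# Toy setup: three token embeddings, the first token is the phrase in focus.
d = 4
x = rng.normal(size=(3, d))
beta_in, gamma_in = x[0], x[1] + x[2]      # relevant vs. irrelevant input mass

# One feed-forward step, h = tanh(W x + b), decomposed into two streams.
W, b = rng.normal(size=(d, d)), rng.normal(size=d)
zb, zg = cd_linear(W, b, beta_in, gamma_in)
hb, hg = shapley_split(np.tanh, zb, zg)

# Invariant: the two streams always sum to the ordinary forward pass.
assert np.allclose(hb + hg, np.tanh(W @ (beta_in + gamma_in) + b))

# Attention needs the same split pushed through the softmax over scores.
# Reusing shapley_split here is an assumed illustration, not the paper's
# exact treatment of the bilinear query-key interaction.
softmax = lambda s: np.exp(s - s.max()) / np.exp(s - s.max()).sum()
sb, sg = rng.normal(size=3), rng.normal(size=3)  # decomposed attention scores
ab, ag = shapley_split(softmax, sb, sg)
assert np.allclose(ab + ag, softmax(sb + sg))
print("relevant share of attention mass:", ab.sum())
```

The invariant to keep in mind is that the two streams always sum to the ordinary forward pass, so beta can be read as the contribution of the phrase in focus to any downstream quantity, e.g. the verb's logit in a subject-verb agreement task.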
Related papers
- Sparks of Explainability: Recent Advancements in Explaining Large Vision Models [6.1642231492615345]
This thesis explores advanced approaches to improve explainability in computer vision by analyzing and modeling the features exploited by deep neural networks.
It evaluates attribution methods, notably saliency maps, by introducing a metric based on algorithmic stability and an approach utilizing Sobol indices.
Two hypotheses are examined: aligning models with human reasoning and adopting a conceptual explainability approach.
arXiv Detail & Related papers (2025-02-03T04:49:32Z)
- Attention Entropy is a Key Factor: An Analysis of Parallel Context Encoding with Full-attention-based Pre-trained Language Models [49.84163262868945]
Large language models have shown remarkable performance across a wide range of language tasks, owing to their exceptional capabilities in context modeling.
The most commonly used method of context modeling is full self-attention, as seen in standard decoder-only Transformers.
We propose parallel context encoding, which splits the context into sub-pieces and encodes them in parallel (a minimal attention-entropy sketch follows this list).
arXiv Detail & Related papers (2024-12-21T09:04:51Z)
- DISCO: DISCovering Overfittings as Causal Rules for Text Classification Models [6.369258625916601]
Post-hoc interpretability methods fail to capture the models' decision-making process fully.
Our paper introduces DISCO, a novel method for discovering global, rule-based explanations.
DISCO supports interactive explanations, enabling human inspectors to distinguish spurious causes in the rule-based output.
arXiv Detail & Related papers (2024-11-07T12:12:44Z)
- A Survey of Low-shot Vision-Language Model Adaptation via Representer Theorem [38.84662767814454]
The key challenge under limited training data is how to fine-tune pre-trained vision-language models in a parameter-efficient manner.
This paper proposes a unified computational framework that integrates existing methods, identifies their nature, and supports in-depth comparison.
As a demonstration, we extend existing methods by modeling inter-class correlation between representers in reproducing kernel Hilbert space (RKHS).
arXiv Detail & Related papers (2024-10-15T15:22:30Z)
- Unsupervised Model Diagnosis [49.36194740479798]
This paper proposes Unsupervised Model Diagnosis (UMO) to produce semantic counterfactual explanations without any user guidance.
Our approach identifies and visualizes changes in semantics, and then matches these changes to attributes from wide-ranging text sources.
arXiv Detail & Related papers (2024-10-08T17:59:03Z)
- Automatic Discovery of Visual Circuits [66.99553804855931]
We explore scalable methods for extracting the subgraph of a vision model's computational graph that underlies recognition of a specific visual concept.
We find that our approach extracts circuits that causally affect model output, and that editing these circuits can defend large pretrained models from adversarial attacks.
arXiv Detail & Related papers (2024-04-22T17:00:57Z)
- Manipulating Feature Visualizations with Gradient Slingshots [54.31109240020007]
We introduce a novel method for manipulating Feature Visualization (FV) without significantly impacting the model's decision-making process.
We evaluate the effectiveness of our method on several neural network models and demonstrate its capabilities to hide the functionality of arbitrarily chosen neurons.
arXiv Detail & Related papers (2024-01-11T18:57:17Z)
- Feature Interactions Reveal Linguistic Structure in Language Models [2.0178765779788495]
We study feature interactions in the context of feature attribution methods for post-hoc interpretability.
We work out a grey box methodology, in which we train models to perfection on a formal language classification task.
We show that under specific configurations, some methods are indeed able to uncover the grammatical rules acquired by a model.
arXiv Detail & Related papers (2023-06-21T11:24:41Z)
- Entity-Conditioned Question Generation for Robust Attention Distribution in Neural Information Retrieval [51.53892300802014]
We show that supervised neural information retrieval models are prone to learning sparse attention patterns over passage tokens.
Using a novel targeted synthetic data generation method, we teach neural IR to attend more uniformly and robustly to all entities in a given passage.
arXiv Detail & Related papers (2022-04-24T22:36:48Z)
- This looks more like that: Enhancing Self-Explaining Models by Prototypical Relevance Propagation [17.485732906337507]
We present a case study of the self-explaining network, ProtoPNet, in the presence of a spectrum of artifacts.
We introduce a novel method for generating more precise model-aware explanations.
In order to obtain a clean dataset, we propose to use multi-view clustering strategies for segregating the artifact images.
arXiv Detail & Related papers (2021-08-27T09:55:53Z)
- Bayesian Attention Belief Networks [59.183311769616466]
Attention-based neural networks have achieved state-of-the-art results on a wide range of tasks.
This paper introduces Bayesian attention belief networks, which construct a decoder network by modeling unnormalized attention weights.
We show that our method outperforms deterministic attention and state-of-the-art attention in accuracy, uncertainty estimation, generalization across domains, and robustness to adversarial attacks.
arXiv Detail & Related papers (2021-06-09T17:46:22Z)
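As noted in the Attention Entropy entry above, that paper's analysis revolves around a simple diagnostic. A minimal sketch, assuming the standard formulation (Shannon entropy of each softmax-normalized attention row; illustrative, not taken from that paper):

```python
import numpy as np

def attention_entropy(scores):
    """Shannon entropy of each softmax-normalized attention row:
    low entropy = sharply peaked attention, high entropy = diffuse."""
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerically stable
    probs = np.exp(scores)
    probs /= probs.sum(axis=-1, keepdims=True)
    return -(probs * np.log(probs + 1e-12)).sum(axis=-1)

# A peaked row vs. a uniform row over 8 positions (max entropy = ln 8 ~ 2.08).
rows = np.array([[8.0, 0, 0, 0, 0, 0, 0, 0],
                 [0.0, 0, 0, 0, 0, 0, 0, 0]])
print(attention_entropy(rows))
```

Peaked rows score near zero; a uniform row over n positions scores ln n, the maximum.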
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the generated content (including all information) and is not responsible for any consequences arising from its use.