Integrating attention into explanation frameworks for language and vision transformers
- URL: http://arxiv.org/abs/2508.08966v1
- Date: Tue, 12 Aug 2025 14:31:22 GMT
- Title: Integrating attention into explanation frameworks for language and vision transformers
- Authors: Marte Eggen, Jacob Lysnæs-Larsen, Inga Strümke
- Abstract summary: This work studies the potential of utilising the information encoded in attention weights to provide meaningful model explanations. We develop two novel explanation methods applicable to both natural language processing and computer vision tasks. Our empirical evaluations on standard benchmarks and in a comparison study with widely used explanation methods show that attention weights can be meaningfully incorporated into the studied XAI frameworks.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The attention mechanism lies at the core of the transformer architecture, providing an interpretable model-internal signal that has motivated a growing interest in attention-based model explanations. Although attention weights do not directly determine model outputs, they reflect patterns of token influence that can inform and complement established explainability techniques. This work studies the potential of utilising the information encoded in attention weights to provide meaningful model explanations by integrating them into explainable AI (XAI) frameworks that target fundamentally different aspects of model behaviour. To this end, we develop two novel explanation methods applicable to both natural language processing and computer vision tasks. The first integrates attention weights into the Shapley value decomposition by redefining the characteristic function in terms of pairwise token interactions via attention weights, thus adapting this widely used game-theoretic solution concept to provide attention-driven attributions for local explanations. The second incorporates attention weights into token-level directional derivatives defined through concept activation vectors to measure concept sensitivity for global explanations. Our empirical evaluations on standard benchmarks and in a comparison study with widely used explanation methods show that attention weights can be meaningfully incorporated into the studied XAI frameworks, highlighting their value in enriching transformer explainability.
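The abstract describes both methods only at a high level. A minimal, purely illustrative sketch of the first idea follows, assuming the characteristic function of a token subset is the total pairwise attention exchanged within that subset (the paper's actual definition may differ):

```python
# Hedged sketch: Shapley values over tokens with an attention-based
# characteristic function. Defining v(S) as the total pairwise attention
# within S is an illustrative assumption, not the authors' exact formulation.
from itertools import combinations
from math import factorial
import numpy as np

def v(attn: np.ndarray, subset: tuple) -> float:
    """Characteristic function: pairwise attention exchanged among tokens in the subset."""
    idx = list(subset)
    return float(attn[np.ix_(idx, idx)].sum())

def attention_shapley(attn: np.ndarray) -> np.ndarray:
    """Exact Shapley values for each token under v; O(2^n), so small inputs only."""
    n = attn.shape[0]
    phi = np.zeros(n)
    for i in range(n):
        others = [t for t in range(n) if t != i]
        for k in range(n):
            for S in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi[i] += weight * (v(attn, S + (i,)) - v(attn, S))
    return phi

rng = np.random.default_rng(0)
A = rng.random((4, 4))
A /= A.sum(axis=-1, keepdims=True)   # row-stochastic, like softmax attention
print(attention_shapley(A))          # one local attribution per token
```

For the second, global method, a correspondingly hedged sketch pools token-level directional derivatives along a concept activation vector (CAV) using weights derived from the attention map; the pooling rule below is likewise an assumption:

```python
# Hedged sketch: attention-weighted concept sensitivity. How the paper combines
# attention with the directional derivatives is not stated in the abstract;
# weighting each token by the attention it receives is an assumption.
import numpy as np

def attention_weighted_sensitivity(token_grads: np.ndarray,
                                   cav: np.ndarray,
                                   attn: np.ndarray) -> float:
    """token_grads: (n, d) gradients of the class score w.r.t. token activations;
    cav: (d,) unit concept activation vector; attn: (n, n) attention map."""
    directional = token_grads @ cav      # per-token directional derivative along the CAV
    received = attn.sum(axis=0)          # attention received by each token
    weights = received / received.sum()
    return float(directional @ weights)  # attention-weighted concept sensitivity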
Related papers
- Learning to Explain: Supervised Token Attribution from Transformer Attention Patterns [0.0]
We introduce Explanation Network (ExpNet), a lightweight neural network that learns an explicit mapping from transformer attention patterns to token-level importance scores. We evaluate ExpNet in a challenging cross-task setting and benchmark it against a broad spectrum of model-agnostic methods and attention-based techniques.
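The summary does not specify ExpNet's architecture or inputs; as a rough, assumption-level illustration, a learned explainer of this kind could be a small regressor over per-token attention statistics:

```python
# Hypothetical sketch only: a tiny per-token regressor from attention-derived
# features to importance scores. ExpNet's real architecture, features, and
# training targets are not described in the summary above.
import numpy as np

def attention_features(attn: np.ndarray) -> np.ndarray:
    """attn: (layers, heads, n, n) -> (n, layers*heads*2) per-token features:
    attention received from all tokens, and self-attention, per layer and head."""
    received = attn.sum(axis=-2)                            # (L, H, n)
    self_attn = np.diagonal(attn, axis1=-2, axis2=-1)       # (L, H, n)
    feats = np.concatenate([received, self_attn], axis=1)   # (L, 2H, n)
    return feats.reshape(-1, attn.shape[-1]).T              # (n, L*2H)

class TinyExplainer:
    """One linear layer regressing token importance, e.g. against targets from a
    slower attribution method; a stand-in for a learned explanation network."""
    def __init__(self, in_dim: int, seed: int = 0):
        self.w = np.random.default_rng(seed).normal(0, 0.01, size=in_dim)
    def __call__(self, attn: np.ndarray) -> np.ndarray:
        return attention_features(attn) @ self.w

attn = np.random.default_rng(0).random((2, 4, 6, 6))   # 2 layers, 4 heads, 6 tokens
attn /= attn.sum(axis=-1, keepdims=True)
explainer = TinyExplainer(in_dim=2 * 2 * 4)
print(explainer(attn))                                  # one score per token
```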
arXiv Detail & Related papers (2026-01-20T16:06:34Z)
- Concept-Based Mechanistic Interpretability Using Structured Knowledge Graphs [3.429783703166407]
Our framework enables a global dissection of model behavior by analyzing how high-level semantic attributes emerge, interact, and propagate through internal model components. A key innovation is our visualization platform that we named BAGEL, which presents these insights in a structured knowledge graph. Our framework is model-agnostic, scalable, and contributes to a deeper understanding of how deep learning models generalize (or fail to) in the presence of dataset biases.
arXiv Detail & Related papers (2025-07-08T09:30:20Z)
- Explaining Text Similarity in Transformer Models [52.571158418102584]
Recent advances in explainable AI make it possible to mitigate the opacity of Transformer-based similarity models by leveraging improved explanations.
We use BiLRP, an extension developed for computing second-order explanations in bilinear similarity models, to investigate which feature interactions drive similarity in NLP models.
Our findings contribute to a deeper understanding of different semantic similarity tasks and models, highlighting how novel explainable AI methods enable in-depth analyses and corpus-level insights.
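For intuition about second-order explanations of similarity, consider the special case of a linear embedding, where a bilinear similarity decomposes exactly into feature-pair contributions; BiLRP generalises this kind of decomposition to deep networks, which this sketch omits:

```python
# Minimal sketch of a second-order (BiLRP-style) decomposition for a bilinear
# similarity with a *linear* embedding; real BiLRP propagates such terms
# through deep networks, which is not shown here.
import numpy as np

def bilinear_relevance(W: np.ndarray, x: np.ndarray, x_prime: np.ndarray) -> np.ndarray:
    """Similarity s = (Wx) . (Wx'); returns R with R[i, j] = contribution of the
    input-feature pair (x_i, x'_j), so that R.sum() equals s."""
    M = W.T @ W                       # bilinear form induced by the embedding
    return np.outer(x, x_prime) * M   # elementwise feature-pair contributions

W = np.random.default_rng(1).normal(size=(8, 5))
x, x_prime = np.ones(5), np.ones(5)
R = bilinear_relevance(W, x, x_prime)
assert np.isclose(R.sum(), (W @ x) @ (W @ x_prime))
```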
arXiv Detail & Related papers (2024-05-10T17:11:31Z)
- Binding Dynamics in Rotating Features [72.80071820194273]
We propose an alternative "cosine binding" mechanism, which explicitly computes the alignment between features and adjusts weights accordingly.
This allows us to draw direct connections to self-attention and biological neural processes, and to shed light on the fundamental dynamics for object-centric representations to emerge in Rotating Features.
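A minimal sketch of such alignment-gated aggregation, assuming cosine similarity between each incoming feature vector and the current output feature is used to gate the connection weights (the paper's exact formulation may differ):

```python
# Illustrative "cosine binding" step: each incoming feature contributes in
# proportion to its cosine alignment with the current output feature. This is
# an assumption-level simplification of the mechanism described in the paper.
import numpy as np

def cosine_binding(inputs: np.ndarray, weights: np.ndarray, output: np.ndarray) -> np.ndarray:
    """inputs: (n, d) feature vectors, weights: (n,) connection weights,
    output: (d,) current output feature. Returns the alignment-gated aggregate."""
    norm_in = inputs / (np.linalg.norm(inputs, axis=1, keepdims=True) + 1e-8)
    norm_out = output / (np.linalg.norm(output) + 1e-8)
    alignment = np.clip(norm_in @ norm_out, 0.0, 1.0)   # cosine, clipped to [0, 1]
    return (weights * alignment) @ inputs                # misaligned inputs contribute less
```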
arXiv Detail & Related papers (2024-02-08T12:31:08Z)
- Naturalness of Attention: Revisiting Attention in Code Language Models [3.756550107432323]
Language models for code such as CodeBERT offer the capability to learn advanced source code representations, but their opacity poses barriers to understanding the properties they capture.
This study sheds light on previously overlooked factors of the attention mechanism beyond the attention weights.
arXiv Detail & Related papers (2023-11-22T16:34:12Z)
- Semantic Interpretation and Validation of Graph Attention-based Explanations for GNN Models [9.260186030255081]
We propose a methodology for investigating the use of semantic attention to enhance the explainability of Graph Neural Network (GNN)-based models.
Our work extends existing attention-based graph explainability methods by analysing the divergence in the attention distributions in relation to semantically sorted feature sets.
We apply our methodology to a lidar point cloud estimation model, successfully identifying key semantic classes that contribute to enhanced performance.
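One simple, illustrative way to relate attention to semantically sorted feature sets, not necessarily the divergence measure used in the paper, is to aggregate attention mass per semantic class and compare it with a reference distribution:

```python
# Rough sketch: compare how attention mass is distributed across semantic
# classes (e.g. lidar point labels) via a KL divergence against a uniform
# reference; the concrete divergence analysed in the paper may differ.
import numpy as np

def class_attention_divergence(attn: np.ndarray, labels: np.ndarray, n_classes: int) -> float:
    """attn: (n_points,) attention per point; labels: (n_points,) class ids."""
    per_class = np.array([attn[labels == c].sum() for c in range(n_classes)])
    p = per_class / per_class.sum()               # attention mass per semantic class
    q = np.full(n_classes, 1.0 / n_classes)       # uniform reference distribution
    eps = 1e-12
    return float(np.sum(p * np.log((p + eps) / (q + eps))))
```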
arXiv Detail & Related papers (2023-08-08T12:34:32Z)
- Explaining Explainability: Towards Deeper Actionable Insights into Deep Learning through Second-order Explainability [70.60433013657693]
Second-order explainable AI (SOXAI) was recently proposed to extend explainable AI (XAI) from the instance level to the dataset level.
We demonstrate for the first time, via example classification and segmentation cases, that eliminating irrelevant concepts from the training set based on actionable insights from SOXAI can enhance a model's performance.
arXiv Detail & Related papers (2023-06-14T23:24:01Z)
- Guiding Visual Question Answering with Attention Priors [76.21671164766073]
We propose to guide the attention mechanism using explicit linguistic-visual grounding.
This grounding is derived by connecting structured linguistic concepts in the query to their referents among the visual objects.
The resultant algorithm is capable of probing attention-based reasoning models, injecting relevant associative knowledge, and regulating the core reasoning process.
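An assumption-level sketch of injecting such a grounding prior: a precomputed word-to-object grounding matrix is added to the attention logits before normalisation (the paper's combination rule may differ):

```python
# Hedged sketch of guided attention: a linguistic-visual grounding prior is
# added to the raw attention logits before the softmax. The additive rule and
# the scaling factor alpha are assumptions made for illustration.
import numpy as np

def guided_attention(logits: np.ndarray, grounding: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """logits, grounding: (n_words, n_objects); returns row-wise softmax of
    the prior-adjusted scores."""
    scores = logits + alpha * grounding
    scores -= scores.max(axis=1, keepdims=True)      # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)
```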
arXiv Detail & Related papers (2022-05-25T09:53:47Z)
- SparseBERT: Rethinking the Importance Analysis in Self-attention [107.68072039537311]
Transformer-based models are popular for natural language processing (NLP) tasks due to their powerful capacity.
Attention map visualization of a pre-trained model is one direct method for understanding the self-attention mechanism.
We propose a Differentiable Attention Mask (DAM) algorithm, which can also be applied to guide the design of SparseBERT.
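A hedged sketch of a differentiable attention mask, assuming a sigmoid-relaxed mask applied to the attention scores together with an L1 sparsity penalty; the published DAM algorithm may differ in detail:

```python
# Illustrative differentiable attention mask: learnable logits are squashed to
# (0, 1), applied to the attention scores, and an L1 penalty drives the mask
# toward sparsity. This is a generic sketch, not the published DAM algorithm.
import numpy as np

def masked_attention(scores: np.ndarray, mask_logits: np.ndarray):
    """scores, mask_logits: (n, n). Returns masked attention weights and the
    sparsity penalty to be added to the training loss."""
    mask = 1.0 / (1.0 + np.exp(-mask_logits))       # differentiable soft mask in (0, 1)
    gated = scores * mask
    shifted = gated - gated.max(axis=-1, keepdims=True)
    expd = np.exp(shifted)
    attn = expd / expd.sum(axis=-1, keepdims=True)  # row-wise softmax
    sparsity_penalty = np.abs(mask).mean()          # encourages pruning attention entries
    return attn, sparsity_penalty
```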
arXiv Detail & Related papers (2021-02-25T14:13:44Z)
- Explain and Improve: LRP-Inference Fine-Tuning for Image Captioning Models [82.3793660091354]
This paper analyzes the predictions of image captioning models with attention mechanisms beyond visualizing the attention itself.
We develop variants of layer-wise relevance propagation (LRP) and gradient-based explanation methods, tailored to image captioning models with attention mechanisms.
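As a minimal building block, not the paper's full method, the LRP epsilon rule for a single linear layer redistributes output relevance onto the inputs in proportion to their contributions:

```python
# Generic LRP epsilon-rule for one linear layer, the kind of propagation step
# that such methods chain through a full captioning model (attention layers
# and the decoder require additional, method-specific rules not shown here).
import numpy as np

def lrp_epsilon(x: np.ndarray, W: np.ndarray, b: np.ndarray,
                relevance_out: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """x: (d_in,), W: (d_out, d_in), b: (d_out,), relevance_out: (d_out,).
    Redistributes output relevance onto inputs proportionally to each input's
    contribution z[k, j] = W[k, j] * x[j]."""
    z = W * x                                        # (d_out, d_in) contributions
    s = z.sum(axis=1) + b                            # pre-activation outputs
    denom = s + eps * np.where(s >= 0, 1.0, -1.0)    # stabilised denominator
    return (relevance_out / denom) @ z               # (d_in,) input relevance
```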
arXiv Detail & Related papers (2020-01-04T05:15:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.