XAI for Transformers: Better Explanations through Conservative
Propagation
- URL: http://arxiv.org/abs/2202.07304v1
- Date: Tue, 15 Feb 2022 10:47:11 GMT
- Title: XAI for Transformers: Better Explanations through Conservative
Propagation
- Authors: Ameen Ali, Thomas Schnake, Oliver Eberle, Grégoire Montavon, Klaus-Robert Müller, Lior Wolf
- Abstract summary: We show that the gradient in a Transformer reflects the function only locally, and thus fails to reliably identify the contribution of input features to the prediction.
Our proposal can be seen as a proper extension of the well-established LRP method to Transformers.
- Score: 60.67748036747221
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformers have become an important workhorse of machine learning, with
numerous applications. This necessitates the development of reliable methods
for increasing their transparency. Multiple interpretability methods, often
based on gradient information, have been proposed. We show that the gradient in
a Transformer reflects the function only locally, and thus fails to reliably
identify the contribution of input features to the prediction. We identify
Attention Heads and LayerNorm as main reasons for such unreliable explanations
and propose a more stable way for propagation through these layers. Our
proposal, which can be seen as a proper extension of the well-established LRP
method to Transformers, is shown both theoretically and empirically to overcome
the deficiency of a simple gradient-based approach, and achieves
state-of-the-art explanation performance on a broad range of Transformer models
and datasets.
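In practice, the conservative propagation described in the abstract can be illustrated with a small implementation trick. The sketch below is a minimal PyTorch illustration under the assumption that the rules amount to detaching the softmax attention weights and the LayerNorm normalization factor during backpropagation and then reading off gradient × input as relevance; the names `ConservativeLayerNorm`, `conservative_attention`, `relevance`, and the model interface are illustrative assumptions, not the authors' reference code.

```python
# Minimal sketch (assumptions noted above), not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConservativeLayerNorm(nn.LayerNorm):
    """LayerNorm whose normalization factor is detached from the graph,
    so gradients propagate only through the centered input."""

    def forward(self, x):
        mean = x.mean(dim=-1, keepdim=True)
        var = x.var(dim=-1, keepdim=True, unbiased=False)
        # Detach the denominator: it acts as a constant in the backward pass.
        x_norm = (x - mean) / torch.sqrt(var + self.eps).detach()
        return x_norm * self.weight + self.bias


def conservative_attention(q, k, v):
    """Scaled dot-product attention with detached softmax weights, so the
    attention matrix behaves as a constant gate during backpropagation."""
    d_head = q.shape[-1]
    scores = q @ k.transpose(-2, -1) / d_head ** 0.5
    attn = F.softmax(scores, dim=-1).detach()  # treat attention as constant
    return attn @ v


def relevance(model, embeddings, target_class):
    """Gradient x input attribution; combined with the detached layers above,
    this yields an LRP-style per-token relevance."""
    embeddings = embeddings.clone().detach().requires_grad_(True)
    logits = model(embeddings)  # hypothetical model mapping embeddings to logits
    logits[..., target_class].sum().backward()
    return (embeddings * embeddings.grad).sum(dim=-1)
```

The `detach()` calls are what distinguish this sketch from a plain gradient × input explanation: the attention weights and the normalization factor no longer contribute gradient terms of their own.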
Related papers
- Can Looped Transformers Learn to Implement Multi-step Gradient Descent for In-context Learning? [69.4145579827826]
We show a fast flow on the regression loss despite the gradient non-ity algorithms for our convergence landscape.
This is the first theoretical analysis for multi-layer Transformer in this setting.
arXiv Detail & Related papers (2024-10-10T18:29:05Z)
- How Transformers Utilize Multi-Head Attention in In-Context Learning? A Case Study on Sparse Linear Regression [19.64743851296488]
In this study, we consider a sparse linear regression problem and investigate how a trained multi-head transformer performs in-context learning.
We experimentally discover that the utilization of multi-heads exhibits different patterns across layers.
We demonstrate that such a preprocess-then-optimize algorithm can significantly outperform naive gradient descent and ridge regression algorithms.
arXiv Detail & Related papers (2024-08-08T15:33:02Z)
- Learning on Transformers is Provable Low-Rank and Sparse: A One-layer Analysis [63.66763657191476]
We show that efficient numerical training and inference algorithms, such as low-rank computation, achieve impressive performance for learning Transformer-based adaptation.
We analyze how magnitude-based pruning affects generalization while improving adaptation.
We conclude that proper magnitude-based pruning has only a slight effect on testing performance.
arXiv Detail & Related papers (2024-06-24T23:00:58Z)
- Skip-Layer Attention: Bridging Abstract and Detailed Dependencies in Transformers [56.264673865476986]
This paper introduces Skip-Layer Attention (SLA) to enhance Transformer models.
SLA improves the model's ability to capture dependencies between high-level abstract features and low-level details.
Our implementation extends the Transformer's functionality by enabling queries in a given layer to interact with keys and values from both the current layer and one preceding layer (a minimal sketch of this idea follows after this list).
arXiv Detail & Related papers (2024-06-17T07:24:38Z)
- AttnLRP: Attention-Aware Layer-Wise Relevance Propagation for Transformers [14.147646140595649]
Large Language Models are prone to biased predictions and hallucinations.
Achieving faithful attributions for the entirety of a black-box transformer model while maintaining computational efficiency is an unsolved challenge.
arXiv Detail & Related papers (2024-02-08T12:01:24Z)
- ExpPoint-MAE: Better interpretability and performance for self-supervised point cloud transformers [7.725095281624494]
We evaluate the effectiveness of Masked Autoencoding as a pretraining scheme, and explore Momentum Contrast as an alternative.
We observe that the transformer learns to attend to semantically meaningful regions, indicating that pretraining leads to a better understanding of the underlying geometry.
arXiv Detail & Related papers (2023-06-19T09:38:21Z)
- Latent Positional Information is in the Self-Attention Variance of Transformer Language Models Without Positional Embeddings [68.61185138897312]
We show that a frozen transformer language model encodes strong positional information through the shrinkage of self-attention variance.
Our findings serve to justify the decision to discard positional embeddings and thus facilitate more efficient pretraining of transformer language models.
arXiv Detail & Related papers (2023-05-23T01:03:40Z)
- Signal Propagation in Transformers: Theoretical Perspectives and the Role of Rank Collapse [11.486545294602697]
We shed new light on the causes and effects of rank collapse in Transformers.
We show that rank collapse of the tokens' representations hinders training by causing the gradients of the queries and keys to vanish.
arXiv Detail & Related papers (2022-06-07T09:07:24Z)
- Transformers from an Optimization Perspective [24.78739299952529]
We study the problem of finding an energy function underlying the Transformer model.
By finding such a function, we can reinterpret Transformers as the unfolding of an interpretable optimization process.
This work contributes to our intuition and understanding of Transformers, while potentially laying the ground-work for new model designs.
arXiv Detail & Related papers (2022-05-27T10:45:15Z)
- Augmented Shortcuts for Vision Transformers [49.70151144700589]
We study the relationship between shortcuts and feature diversity in vision transformer models.
We present an augmented shortcut scheme, which inserts additional paths with learnable parameters in parallel on the original shortcuts.
Experiments conducted on benchmark datasets demonstrate the effectiveness of the proposed method.
arXiv Detail & Related papers (2021-06-30T09:48:30Z)
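As a companion to the Skip-Layer Attention entry above, here is a minimal sketch of how queries of one layer could jointly attend to keys and values from the current layer and from one preceding layer. The concatenation-based formulation and the function name `skip_layer_attention` are assumptions made for illustration, not the paper's implementation.

```python
# Illustrative sketch only: assumes the cross-layer interaction is realized by
# letting current-layer queries attend to the concatenation of current-layer
# and previous-layer keys/values along the sequence dimension.
import torch
import torch.nn.functional as F


def skip_layer_attention(q, k_curr, v_curr, k_prev, v_prev):
    """Queries of the current layer attend jointly to keys/values of the
    current layer and of one preceding layer."""
    k = torch.cat([k_curr, k_prev], dim=-2)   # (..., 2 * seq_len, d_head)
    v = torch.cat([v_curr, v_prev], dim=-2)
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    attn = F.softmax(scores, dim=-1)
    return attn @ v                           # (..., seq_len, d_head)
```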
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.