When LRP Diverges from Leave-One-Out in Transformers
- URL: http://arxiv.org/abs/2510.18810v1
- Date: Tue, 21 Oct 2025 17:06:05 GMT
- Title: When LRP Diverges from Leave-One-Out in Transformers
- Authors: Weiqiu You, Siqi Zeng, Yao-Hung Hubert Tsai, Makoto Yamada, Han Zhao,
- Abstract summary: We show that the bilinear propagation rules used in recent advances of AttnLRP violate the implementation invariance axiom.<n>We also revisit CP-LRP as a diagnostic baseline and find that bypassing relevance propagation through the softmax layer significantly improves alignment with LOO.
- Score: 25.22101448084133
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Leave-One-Out (LOO) provides an intuitive measure of feature importance but is computationally prohibitive. While Layer-Wise Relevance Propagation (LRP) offers a potentially efficient alternative, its axiomatic soundness in modern Transformers remains largely under-examined. In this work, we first show that the bilinear propagation rules used in recent advances of AttnLRP violate the implementation invariance axiom. We prove this analytically and confirm it empirically in linear attention layers. Second, we also revisit CP-LRP as a diagnostic baseline and find that bypassing relevance propagation through the softmax layer -- backpropagating relevance only through the value matrices -- significantly improves alignment with LOO, particularly in middle-to-late Transformer layers. Overall, our results suggest that (i) bilinear factorization sensitivity and (ii) softmax propagation error potentially jointly undermine LRP's ability to approximate LOO in Transformers.
Related papers
- MirrorLA: Reflecting Feature Map for Vision Linear Attention [49.41670925034762]
Linear attention significantly reduces the computational complexity of Transformers from quadratic to linear, yet it consistently lags behind softmax-based attention in performance.<n>We propose MirrorLA, a geometric framework that substitutes passive truncation with active reorientation.<n>MirrorLA achieves state-of-the-art performance across standard benchmarks, demonstrating that strictly linear efficiency can be achieved without compromising representational fidelity.
arXiv Detail & Related papers (2026-02-04T09:14:09Z) - Optimality and NP-Hardness of Transformers in Learning Markovian Dynamical Functions [32.71332125930795]
Transformer architectures can solve unseen tasks based on input-output pairs in a given prompt due to in-context learning (ICL)<n>We investigate Markovian function learning through a structured ICL setup to reveal underlying optimization behaviors.
arXiv Detail & Related papers (2025-10-21T13:42:48Z) - Bounds of Chain-of-Thought Robustness: Reasoning Steps, Embed Norms, and Beyond [64.88201012057822]
Existing research indicates that the output of Chain-of-Thought (CoT) is significantly affected by input perturbations.<n>We theoretically analyze the effect of input perturbations on the fluctuation of CoT outputs.
arXiv Detail & Related papers (2025-09-25T15:04:31Z) - Asymptotic Study of In-context Learning with Random Transformers through Equivalent Models [8.135786025034626]
We study the in-context learning capabilities of pretrained Transformers in the setting of nonlinear regression.<n>We show that the random Transformer behaves equivalent to a finite-degree Hermite model in terms of ICL error.
arXiv Detail & Related papers (2025-09-18T16:57:27Z) - Revisiting LRP: Positional Attribution as the Missing Ingredient for Transformer Explainability [53.21677928601684]
Layer-wise relevance propagation is one of the most promising approaches to explainability in deep learning.<n>We propose specialized theoretically-grounded LRP rules designed to propagate attributions across various positional encoding methods.<n>Our method significantly outperforms the state-of-the-art in both vision and NLP explainability tasks.
arXiv Detail & Related papers (2025-06-02T18:07:55Z) - In-Context Learning of Linear Dynamical Systems with Transformers: Approximation Bounds and Depth-Separation [14.716318664357408]
This paper investigates approximation-theoretic aspects of the in-context learning capability of the transformers in representing a family of noisy linear dynamical systems.<n>Our first theoretical result establishes an upper bound on the approximation error of multi-layer transformers with respect to an $L2$-testing loss uniformly defined across tasks.<n>Our second result establishes a non-diminishing lower bound on the approximation error for a class of single-layer linear transformers.
arXiv Detail & Related papers (2025-02-12T05:40:11Z) - Interpreting Affine Recurrence Learning in GPT-style Transformers [54.01174470722201]
In-context learning allows GPT-style transformers to generalize during inference without modifying their weights.
This paper focuses specifically on their ability to learn and predict affine recurrences as an ICL task.
We analyze the model's internal operations using both empirical and theoretical approaches.
arXiv Detail & Related papers (2024-10-22T21:30:01Z) - Transformers Handle Endogeneity in In-Context Linear Regression [34.458004744956334]
We show that transformers inherently possess a mechanism to handle endogeneity effectively using instrumental variables (IV)<n>We propose an in-context pretraining scheme and provide theoretical guarantees showing that the global minimizer of the pre-training loss achieves a small excess loss.
arXiv Detail & Related papers (2024-10-02T06:21:04Z) - Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linearahead as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - Efficient and Flexible Neural Network Training through Layer-wise Feedback Propagation [49.44309457870649]
Layer-wise Feedback feedback (LFP) is a novel training principle for neural network-like predictors.<n>LFP decomposes a reward to individual neurons based on their respective contributions.<n>Our method then implements a greedy reinforcing approach helpful parts of the network and weakening harmful ones.
arXiv Detail & Related papers (2023-08-23T10:48:28Z) - XAI for Transformers: Better Explanations through Conservative
Propagation [60.67748036747221]
We show that the gradient in a Transformer reflects the function only locally, and thus fails to reliably identify the contribution of input features to the prediction.
Our proposal can be seen as a proper extension of the well-established LRP method to Transformers.
arXiv Detail & Related papers (2022-02-15T10:47:11Z) - Deriving Differential Target Propagation from Iterating Approximate
Inverses [91.3755431537592]
We show that a particular form of target propagation, relying on learned inverses of each layer, which is differential, gives rise to an update rule which corresponds to an approximate Gauss-Newton gradient-based optimization.
We consider several iterative calculations based on local auto-encoders at each layer in order to achieve more precise inversions for more accurate target propagation.
arXiv Detail & Related papers (2020-07-29T22:34:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.