Probing the Embedding Space of Transformers via Minimal Token Perturbations
- URL: http://arxiv.org/abs/2506.18011v1
- Date: Sun, 22 Jun 2025 12:22:56 GMT
- Title: Probing the Embedding Space of Transformers via Minimal Token Perturbations
- Authors: Eddie Conti, Alejandro Astruc, Alvaro Parafita, Axel Brando,
- Abstract summary: We study the effects of minimal token perturbations on the embedding space. We also study how perturbations propagate across layers, demonstrating that input information is increasingly intermixed in deeper layers. This work introduces the combination of token perturbations and shifts on the embedding space as a powerful tool for model interpretability.
- Score: 40.292373831893705
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Understanding how information propagates through Transformer models is a key challenge for interpretability. In this work, we study the effects of minimal token perturbations on the embedding space. In our experiments, we analyze how frequently different tokens yield minimal shifts, highlighting that rare tokens usually lead to larger shifts. Moreover, we study how perturbations propagate across layers, demonstrating that input information is increasingly intermixed in deeper layers. Our findings validate the common assumption that the first layers of a model can be used as proxies for model explanations. Overall, this work introduces the combination of token perturbations and shifts in the embedding space as a powerful tool for model interpretability.
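A minimal sketch of the kind of probe the abstract describes: swap a single input token and measure how much the hidden states shift at every layer. The GPT-2 checkpoint, the replacement token, and the cosine-distance shift measure are illustrative assumptions, not necessarily the paper's exact choices.

```python
# Single-token perturbation probe (illustrative): replace one token and
# compare per-layer hidden states of the clean vs. perturbed input.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2", output_hidden_states=True).eval()

text = "The quick brown fox jumps over the lazy dog"
ids = tokenizer(text, return_tensors="pt").input_ids

# Perturb a single input token (here: position 3, replaced by an arbitrary token).
perturbed = ids.clone()
perturbed[0, 3] = tokenizer.encode(" cat")[0]

with torch.no_grad():
    h_clean = model(ids).hidden_states       # tuple: (num_layers + 1) x (1, seq, dim)
    h_pert = model(perturbed).hidden_states

# Per-layer shift: 1 - cosine similarity, averaged over token positions.
for layer, (a, b) in enumerate(zip(h_clean, h_pert)):
    shift = 1 - torch.nn.functional.cosine_similarity(a, b, dim=-1).mean()
    print(f"layer {layer:2d}: mean shift = {shift.item():.4f}")
```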
Related papers
- Attention-Only Transformers via Unrolled Subspace Denoising [19.832264029213515]
We derive a fully interpretable transformer architecture with only the necessary components. By unrolling such iterative denoising operations into a deep network, we arrive at a highly compact architecture. Despite its simplicity, experiments on vision and language tasks demonstrate that such a transformer achieves performance close to that of standard transformer architectures.
arXiv Detail & Related papers (2025-06-04T09:53:14Z)
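A toy sketch of the idea summarized above: each layer is one unrolled step consisting only of attention plus a residual connection, with no MLP block. Standard multi-head self-attention stands in here for the paper's subspace-denoising operator; this is an illustration, not the paper's architecture.

```python
# Attention-only layer: one unrolled "denoising" step approximated by
# multi-head self-attention with a residual connection (no MLP).
import torch
import torch.nn as nn

class AttentionOnlyLayer(nn.Module):
    def __init__(self, dim, heads):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):
        z = self.norm(x)
        out, _ = self.attn(z, z, z)   # self-attention stands in for the denoising operator
        return x + out                # residual step of the unrolled iteration

layers = nn.Sequential(*[AttentionOnlyLayer(64, 4) for _ in range(6)])
tokens = torch.randn(2, 16, 64)       # (batch, sequence, dim)
print(layers(tokens).shape)           # torch.Size([2, 16, 64])
```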
- You Do Not Fully Utilize Transformer's Representation Capacity [4.753535328327317]
Layer-Integrated Memory (LIMe) is a lightweight extension that learns per-head, per-layer routing weights to integrate representations from all previous layers with negligible overhead. LIMe consistently achieves faster convergence, lower perplexity per FLOP, and substantial accuracy improvements on synthetic tasks.
arXiv Detail & Related papers (2025-02-13T12:00:50Z)
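A rough sketch of the routing idea described above: learnable per-head, per-layer weights mix hidden states from all earlier layers before they reach the current layer. Names and details are illustrative assumptions, not the official LIMe implementation.

```python
# Illustrative per-head, per-layer routing: each head reads a learned
# softmax-weighted combination of all previous layers' hidden states.
import torch
import torch.nn as nn

class LayerRouter(nn.Module):
    def __init__(self, num_prev_layers, num_heads):
        super().__init__()
        # One routing logit per (head, previous layer).
        self.logits = nn.Parameter(torch.zeros(num_heads, num_prev_layers))

    def forward(self, prev_states):
        # prev_states: list of (batch, seq, dim) tensors, one per earlier layer.
        stacked = torch.stack(prev_states, dim=0)       # (L, B, S, D)
        weights = self.logits.softmax(dim=-1)           # (H, L)
        # Mix layers per head: (H, L) x (L, B, S, D) -> (B, H, S, D).
        return torch.einsum("hl,lbsd->bhsd", weights, stacked)

router = LayerRouter(num_prev_layers=4, num_heads=8)
states = [torch.randn(2, 16, 64) for _ in range(4)]
print(router(states).shape)   # torch.Size([2, 8, 16, 64])
```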
- Demystifying Singular Defects in Large Language Models [61.98878352956125]
In large language models (LLMs), the underlying causes of high-norm tokens remain largely unexplored. We provide both theoretical insights and empirical validation across a range of recent models. We showcase two practical applications of these findings: the improvement of quantization schemes and the design of LLM signatures.
arXiv Detail & Related papers (2025-02-10T20:09:16Z)
- A Theory for Compressibility of Graph Transformers for Transductive Learning [6.298115235439078]
Transductive tasks on graphs differ fundamentally from typical supervised machine learning tasks.
All train/test/validation samples are present during training, making them more akin to semi-supervised tasks.
We establish some theoretical bounds on how and under what conditions the hidden dimension of these networks can be compressed.
arXiv Detail & Related papers (2024-11-20T04:20:17Z)
- A Theoretical Understanding of Shallow Vision Transformers: Learning, Generalization, and Sample Complexity [71.11795737362459]
ViTs with self-attention modules have recently achieved great empirical success in many tasks.
However, theoretical analysis of their learning and generalization remains noisy and elusive.
This paper provides the first theoretical analysis of a shallow ViT for a classification task.
arXiv Detail & Related papers (2023-02-12T22:12:35Z)
- Revisiting Over-smoothing in BERT from the Perspective of Graph [111.24636158179908]
Recently, the over-smoothing phenomenon of Transformer-based models has been observed in both the vision and language fields.
We find that layer normalization plays a key role in the over-smoothing issue of Transformer-based models.
We consider hierarchical fusion strategies, which combine the representations from different layers adaptively to make the output more diverse.
arXiv Detail & Related papers (2022-02-17T12:20:52Z)
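The hierarchical-fusion idea above can be illustrated with a small sketch that adaptively weights the representations from every layer before producing the output. The softmax-gated weighted sum is a generic assumption for illustration, not the paper's exact fusion scheme.

```python
# Illustrative adaptive fusion: a learned softmax-gated weighted sum over
# per-layer representations, meant to keep the output from over-smoothing.
import torch
import torch.nn as nn

class LayerFusion(nn.Module):
    def __init__(self, num_layers):
        super().__init__()
        self.gate = nn.Parameter(torch.zeros(num_layers))  # one weight per layer

    def forward(self, hidden_states):
        stacked = torch.stack(hidden_states, dim=0)        # (L, B, S, D)
        w = self.gate.softmax(dim=0).view(-1, 1, 1, 1)     # (L, 1, 1, 1)
        return (w * stacked).sum(dim=0)                    # (B, S, D)

fusion = LayerFusion(num_layers=12)
states = [torch.randn(2, 16, 768) for _ in range(12)]
print(fusion(states).shape)   # torch.Size([2, 16, 768])
```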
- XAI for Transformers: Better Explanations through Conservative Propagation [60.67748036747221]
We show that the gradient in a Transformer reflects the function only locally, and thus fails to reliably identify the contribution of input features to the prediction.
Our proposal can be seen as a proper extension of the well-established LRP method to Transformers.
arXiv Detail & Related papers (2022-02-15T10:47:11Z)
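A toy sketch of the "treat parts of the forward pass as constants" trick that conservative propagation relies on: detaching the softmax attention matrix makes gradient x input behave like an LRP-style relevance propagation through the attention head. Single-head toy attention only; the full method also handles LayerNorm, which is omitted here.

```python
# Gradient x input with detached attention weights (toy illustration).
import torch

def toy_attention(x, wq, wk, wv, detach_attn=True):
    q, k, v = x @ wq, x @ wk, x @ wv
    attn = torch.softmax(q @ k.transpose(-1, -2) / q.shape[-1] ** 0.5, dim=-1)
    if detach_attn:
        attn = attn.detach()          # propagate relevance only through the values
    return attn @ v

torch.manual_seed(0)
x = torch.randn(5, 16, requires_grad=True)            # (tokens, dim)
wq, wk, wv = (torch.randn(16, 16) * 0.1 for _ in range(3))

out = toy_attention(x, wq, wk, wv).sum()              # scalar stand-in for a prediction
out.backward()
relevance = (x * x.grad).sum(dim=-1)                  # gradient x input per token
print(relevance)
```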
- Incorporating Residual and Normalization Layers into Analysis of Masked Language Models [29.828669678974983]
We extend the scope of the analysis of Transformers from solely the attention patterns to the whole attention block.
Our analysis of Transformer-based masked language models shows that the token-to-token interaction performed via attention has less impact on the intermediate representations than previously assumed.
arXiv Detail & Related papers (2021-09-15T08:32:20Z)
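A rough norm-based decomposition in the spirit described above: split an attention block's output into the residual (token-preserving) part and the attention-mixed (token-to-token) part and compare their norms. Toy single-head block with random weights; the paper's analysis additionally unrolls LayerNorm.

```python
# Compare the norm of the residual stream vs. the attention-mixed contribution.
import torch

torch.manual_seed(0)
x = torch.randn(6, 32)                                 # (tokens, dim)
wq, wk, wv, wo = (torch.randn(32, 32) * 0.1 for _ in range(4))

attn = torch.softmax((x @ wq) @ (x @ wk).T / 32 ** 0.5, dim=-1)
mixed = attn @ (x @ wv) @ wo                           # token-to-token interaction term
output = x + mixed                                     # residual + attention output

print("residual norm per token:", x.norm(dim=-1))
print("mixed    norm per token:", mixed.norm(dim=-1))
```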
- Deducing neighborhoods of classes from a fitted model [68.8204255655161]
In this article, a new kind of interpretable machine learning method is presented. It can help to understand the partitioning of the feature space into predicted classes in a classification model using quantile shifts. Basically, real data points (or specific points of interest) are used, and the changes in the prediction after slightly raising or decreasing specific features are observed.
arXiv Detail & Related papers (2020-09-11T16:35:53Z)
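A sketch of the perturbation idea above: nudge one feature of a real data point by a small quantile shift and observe whether the predicted class changes. The dataset, classifier, and chosen quantile step are placeholders for illustration, not the paper's setup.

```python
# Quantile-shift perturbation of a single feature of a real data point.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
clf = RandomForestClassifier(random_state=0).fit(X, y)

point = X[60].copy()                        # a real data point of interest
feature = 2                                 # feature to perturb (petal length)
step = np.quantile(X[:, feature], 0.55) - np.quantile(X[:, feature], 0.45)

for direction in (-1, 0, +1):
    perturbed = point.copy()
    perturbed[feature] += direction * step
    pred = clf.predict(perturbed.reshape(1, -1))[0]
    print(f"shift {direction:+d}: predicted class {pred}")
```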
This list is automatically generated from the titles and abstracts of the papers on this site. The quality of the generated content (including all information) is not guaranteed, and the site is not responsible for any consequences of its use.