Interpreting vision transformers via residual replacement model
- URL: http://arxiv.org/abs/2509.17401v1
- Date: Mon, 22 Sep 2025 07:00:57 GMT
- Title: Interpreting vision transformers via residual replacement model
- Authors: Jinyeong Kim, Junhyeok Kim, Yumin Shim, Joohyeok Kim, Sunyoung Jung, Seong Jae Hwang
- Abstract summary: How do vision transformers (ViTs) represent and process the world? This paper addresses this long-standing question through the first systematic analysis of 6.6K features across all layers. We introduce the residual replacement model, which replaces ViT computations with interpretable features in the residual stream.
- Score: 8.97847158738423
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: How do vision transformers (ViTs) represent and process the world? This paper addresses this long-standing question through the first systematic analysis of 6.6K features across all layers, extracted via sparse autoencoders, and by introducing the residual replacement model, which replaces ViT computations with interpretable features in the residual stream. Our analysis reveals not only a feature evolution from low-level patterns to high-level semantics, but also how ViTs encode curves and spatial positions through specialized feature types. The residual replacement model scalably produces a faithful yet parsimonious circuit for human-scale interpretability by significantly simplifying the original computations. As a result, this framework enables intuitive understanding of ViT mechanisms. Finally, we demonstrate the utility of our framework in debiasing spurious correlations.
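As a rough illustration of the two ingredients the abstract names, the sketch below shows a sparse autoencoder over ViT residual-stream activations and a replacement step that rebuilds each residual vector from a handful of its most active features. The dimensions, the L1 penalty, and the top-k cutoff are illustrative assumptions, not the paper's actual settings.

```python
# Minimal sketch (not the paper's code): a sparse autoencoder (SAE) on ViT
# residual-stream activations, plus a "replacement" step that swaps a raw
# residual vector for a reconstruction built from its top-k active features.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int = 768, d_features: int = 4096):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, resid: torch.Tensor):
        # resid: (batch, tokens, d_model) residual-stream activations
        feats = F.relu(self.encoder(resid))   # sparse, non-negative features
        recon = self.decoder(feats)           # reconstruction of the residual
        return feats, recon

def sae_loss(resid, feats, recon, l1_coeff: float = 1e-3):
    # Reconstruction error plus an L1 penalty encouraging few active features.
    return F.mse_loss(recon, resid) + l1_coeff * feats.abs().mean()

def replace_residual(resid, sae: SparseAutoencoder, top_k: int = 32):
    # Rebuild each residual vector from only its top-k most active features,
    # giving a parsimonious, interpretable stand-in for the original computation.
    feats, _ = sae(resid)
    kept = torch.zeros_like(feats)
    idx = feats.topk(top_k, dim=-1).indices
    kept.scatter_(-1, idx, feats.gather(-1, idx))
    return sae.decoder(kept)

# Usage sketch: collect activations from a forward hook, train the SAE with
# sae_loss, then patch replace_residual(...) into the forward pass to check
# how faithfully the simplified circuit reproduces the model's behavior.
resid = torch.randn(2, 197, 768)              # dummy activations (CLS + 196 patches)
sae = SparseAutoencoder()
feats, recon = sae(resid)
loss = sae_loss(resid, feats, recon)
approx = replace_residual(resid, sae)
```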
Related papers
- Vision Transformers Need More Than Registers [70.42157905484765]
Vision Transformers (ViTs) provide general-purpose representations for diverse downstream tasks. Artifacts in ViTs are widely observed across different supervision paradigms and downstream tasks. We conclude that these artifacts originate from a lazy aggregation behavior.
arXiv Detail & Related papers (2026-02-25T20:42:35Z) - Equi-ViT: Rotational Equivariant Vision Transformer for Robust Histopathology Analysis [4.388994056961038]
We propose Equi-ViT, which integrates an equivariant convolution kernel into the patch embedding stage of a ViT architecture. We show that Equi-ViT achieves superior rotation-consistent patch embeddings and stable classification performance across image orientations.
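A rough sketch of one way to obtain rotation-consistent patch embeddings (a simple group-averaged variant, not Equi-ViT's actual equivariant kernel): the patch-embedding weights are applied at all four 90-degree rotations and the responses are pooled.

```python
# Illustrative sketch, assuming a plain ViT patch embedding; the same kernel is
# applied at 0/90/180/270 degrees and the responses are averaged, so patch tokens
# become insensitive to 90-degree rotations of their content.
import torch
import torch.nn as nn
import torch.nn.functional as F

class C4PatchEmbed(nn.Module):
    def __init__(self, in_ch=3, dim=768, patch=16):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(dim, in_ch, patch, patch) * 0.02)
        self.patch = patch

    def forward(self, x):
        # Convolve with the kernel rotated by each multiple of 90 degrees, then pool.
        outs = [F.conv2d(x, torch.rot90(self.weight, r, dims=(2, 3)), stride=self.patch)
                for r in range(4)]
        tokens = torch.stack(outs, 0).mean(0)      # (B, dim, H/patch, W/patch)
        return tokens.flatten(2).transpose(1, 2)   # (B, num_patches, dim)

embed = C4PatchEmbed()
tokens = embed(torch.randn(1, 3, 224, 224))        # (1, 196, 768)
```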
arXiv Detail & Related papers (2026-01-14T04:03:20Z) - Transformer Meets Twicing: Harnessing Unattended Residual Information [2.1605931466490795]
Transformer-based deep learning models have achieved state-of-the-art performance across numerous language and vision tasks. While the self-attention mechanism has proven capable of handling complex data patterns, it has been observed that the representational capacity of the attention matrix degrades significantly across transformer layers. We propose Twicing Attention, a novel attention mechanism that uses the kernel twicing procedure from nonparametric regression to alleviate the low-pass behavior of the associated NLM smoothing.
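A minimal sketch of the twicing idea applied to attention (illustrative, not the paper's implementation): the attention matrix A acts as a smoother and is applied a second time to the residual it left behind, so the output becomes (2A - A^2)V rather than AV, recovering some of the high-frequency information a single smoothing pass attenuates.

```python
# Kernel twicing applied to a single attention head (sketch under assumed shapes).
import torch
import torch.nn.functional as F

def twicing_attention(q, k, v):
    # q, k, v: (batch, heads, tokens, head_dim)
    d = q.size(-1)
    attn = F.softmax(q @ k.transpose(-2, -1) / d ** 0.5, dim=-1)  # smoother A
    smoothed = attn @ v                    # standard attention output A V
    residual = v - smoothed                # information the first pass missed
    return smoothed + attn @ residual      # A V + A (V - A V) = (2A - A^2) V

out = twicing_attention(torch.randn(1, 8, 16, 64),
                        torch.randn(1, 8, 16, 64),
                        torch.randn(1, 8, 16, 64))
```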
arXiv Detail & Related papers (2025-03-02T01:56:35Z) - Entropy-Lens: The Information Signature of Transformer Computations [14.613982627206884]
We study the evolution of token-level distributions directly in vocabulary space. We compute the Shannon entropy of each intermediate predicted distribution, yielding one interpretable scalar per layer. We introduce Entropy-Lens, a model-agnostic framework that extracts entropy profiles from frozen, off-the-shelf transformers.
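A minimal sketch of such an entropy profile (the shared unembedding projection and all dimensions are assumptions): each layer's hidden states are projected into vocabulary space, softmaxed, and the Shannon entropy of the resulting distributions is averaged into one scalar per layer.

```python
# Per-layer entropy profile over intermediate predicted distributions (sketch).
import torch
import torch.nn.functional as F

def entropy_profile(hidden_states, unembed):
    # hidden_states: list of (batch, tokens, d_model) tensors, one per layer
    # unembed: (d_model, vocab_size) projection shared across layers
    profile = []
    for h in hidden_states:
        probs = F.softmax(h @ unembed, dim=-1)                  # per-token distribution
        ent = -(probs * probs.clamp_min(1e-12).log()).sum(-1)   # Shannon entropy
        profile.append(ent.mean().item())                       # one scalar per layer
    return profile

layers = [torch.randn(1, 10, 512) for _ in range(12)]   # dummy frozen-model activations
unembed = torch.randn(512, 32000)
print(entropy_profile(layers, unembed))
```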
arXiv Detail & Related papers (2025-02-23T13:33:27Z) - ULTra: Unveiling Latent Token Interpretability in Transformer-Based Understanding and Segmentation [14.84547724351634]
We introduce ULTra, a framework for interpreting Transformer embeddings and uncovering meaningful semantic patterns within them. We propose a self-supervised training approach that refines segmentation performance by learning an external transformation matrix without modifying the underlying model. We validate ULTra for model interpretation on both synthetic and real-world scenarios, including Object Selection and interpretable text summarization.
arXiv Detail & Related papers (2024-11-15T19:36:50Z) - Learning on Transformers is Provable Low-Rank and Sparse: A One-layer Analysis [63.66763657191476]
We show that efficient numerical training and inference algorithms, such as low-rank computation, perform well for learning Transformer-based adaptation.
We analyze how magnitude-based pruning affects generalization while improving adaptation efficiency.
We conclude that proper magnitude-based pruning has only a slight effect on testing performance.
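A minimal sketch of the two mechanisms this analysis concerns, low-rank adaptation of a frozen weight and magnitude-based pruning (illustrative code, not the paper's):

```python
# Low-rank update of a frozen weight plus magnitude-based pruning (sketch).
import torch
import torch.nn as nn

class LowRankAdapter(nn.Module):
    def __init__(self, d_in: int, d_out: int, rank: int = 8):
        super().__init__()
        self.A = nn.Parameter(torch.randn(d_in, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(rank, d_out))

    def forward(self, x, frozen_weight):
        # frozen_weight: (d_in, d_out), kept fixed; only A and B are trained
        return x @ (frozen_weight + self.A @ self.B)

def magnitude_prune(weight: torch.Tensor, sparsity: float = 0.5):
    # Zero out the smallest-magnitude entries, keeping a (1 - sparsity) fraction.
    k = int(weight.numel() * sparsity)
    threshold = weight.abs().flatten().kthvalue(k).values
    return weight * (weight.abs() > threshold)

frozen = torch.randn(64, 64)
adapter = LowRankAdapter(64, 64)
y = adapter(torch.randn(4, 64), frozen)
pruned = magnitude_prune(adapter.A @ adapter.B)
```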
arXiv Detail & Related papers (2024-06-24T23:00:58Z) - Denoising Vision Transformers [43.03068202384091]
We propose a two-stage denoising approach, termed Denoising Vision Transformers (DVT).
In the first stage, we separate the clean features from those contaminated by positional artifacts by enforcing cross-view feature consistency with neural fields on a per-image basis.
In the second stage, we train a lightweight transformer block to predict clean features from raw ViT outputs, leveraging the derived estimates of the clean features as supervision.
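A minimal sketch of that second stage (illustrative, not DVT's released code): a lightweight transformer block is fit with an MSE loss to map raw ViT features to the stage-one clean-feature estimates.

```python
# Lightweight denoiser trained to predict clean features from raw ViT outputs (sketch).
import torch
import torch.nn as nn

denoiser = nn.TransformerEncoderLayer(d_model=768, nhead=8, batch_first=True)
optimizer = torch.optim.Adam(denoiser.parameters(), lr=1e-4)

raw_feats = torch.randn(4, 196, 768)      # raw ViT outputs (with positional artifacts)
clean_targets = torch.randn(4, 196, 768)  # stage-one clean-feature estimates (supervision)

optimizer.zero_grad()
pred = denoiser(raw_feats)
loss = nn.functional.mse_loss(pred, clean_targets)
loss.backward()
optimizer.step()
```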
arXiv Detail & Related papers (2024-01-05T18:59:52Z) - Generalized Face Forgery Detection via Adaptive Learning for Pre-trained Vision Transformer [54.32283739486781]
We present a Forgery-aware Adaptive Vision Transformer (FA-ViT) under the adaptive learning paradigm.
FA-ViT achieves 93.83% and 78.32% AUC scores on Celeb-DF and DFDC datasets in the cross-dataset evaluation.
arXiv Detail & Related papers (2023-09-20T06:51:11Z) - Global Vision Transformer Pruning with Hessian-Aware Saliency [93.33895899995224]
This work challenges the common design philosophy of the Vision Transformer (ViT) model with uniform dimension across all the stacked blocks in a model stage.
We derive a novel Hessian-based structural pruning criterion comparable across all layers and structures, with latency-aware regularization for direct latency reduction.
Performing iterative pruning on the DeiT-Base model leads to a new architecture family called NViT (Novel ViT), with a novel parameter redistribution that utilizes parameters more efficiently.
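A minimal sketch of a Hessian-aware saliency score using the common diagonal Fisher approximation (g * w)^2 in place of w^T H w; because every score is in loss units, scores are comparable across layers. This is an illustration, not necessarily the paper's exact criterion.

```python
# Hessian-aware (Fisher-approximated) pruning saliency per parameter tensor (sketch).
import torch
import torch.nn as nn

def hessian_aware_saliency(model: nn.Module, loss: torch.Tensor):
    named = [(n, p) for n, p in model.named_parameters() if p.requires_grad]
    grads = torch.autograd.grad(loss, [p for _, p in named])
    # (g * w)^2 approximates the loss increase from removing the weights.
    return {name: (g * p).pow(2).sum().item() for (name, p), g in zip(named, grads)}

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
loss = model(torch.randn(8, 16)).pow(2).mean()
print(hessian_aware_saliency(model, loss))
```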
arXiv Detail & Related papers (2021-10-10T18:04:59Z) - Intriguing Properties of Vision Transformers [114.28522466830374]
Vision transformers (ViT) have demonstrated impressive performance across various machine vision problems.
We systematically study this question via an extensive set of experiments and comparisons with a high-performing convolutional neural network (CNN).
We show that the effective features of ViTs are due to flexible and dynamic receptive fields made possible by the self-attention mechanism.
arXiv Detail & Related papers (2021-05-21T17:59:18Z) - Vision Transformers are Robust Learners [65.91359312429147]
We study the robustness of the Vision Transformer (ViT) against common corruptions and perturbations, distribution shifts, and natural adversarial examples.
We present analyses that provide both quantitative and qualitative indications to explain why ViTs are indeed more robust learners.
arXiv Detail & Related papers (2021-05-17T02:39:22Z) - Variational Transformers for Diverse Response Generation [71.53159402053392]
The Variational Transformer (VT) is a variational self-attentive feed-forward sequence model.
VT combines the parallelizability and global receptive field computation of the Transformer with the variational nature of the CVAE.
We explore two types of VT: 1) modeling the discourse-level diversity with a global latent variable; and 2) augmenting the Transformer decoder with a sequence of fine-grained latent variables.
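A minimal sketch of the first variant, a single global latent variable conditioning the decoder (illustrative, not the paper's code): a posterior network produces (mu, logvar) from the encoded context, z is sampled with the reparameterization trick, and a KL term against a standard normal prior is added to the loss.

```python
# Global latent variable with reparameterization for a CVAE-style Transformer (sketch).
import torch
import torch.nn as nn

class GlobalLatent(nn.Module):
    def __init__(self, d_model=512, d_latent=64):
        super().__init__()
        self.to_stats = nn.Linear(d_model, 2 * d_latent)
        self.to_cond = nn.Linear(d_latent, d_model)

    def forward(self, enc):                        # enc: (batch, tokens, d_model)
        mu, logvar = self.to_stats(enc.mean(1)).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()      # reparameterization
        kl = 0.5 * (logvar.exp() + mu.pow(2) - 1 - logvar).sum(-1).mean()
        return self.to_cond(z), kl                 # condition the decoder; add kl to the loss

latent = GlobalLatent()
cond, kl = latent(torch.randn(2, 20, 512))
```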
arXiv Detail & Related papers (2020-03-28T07:48:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.