Related papers: Mechanistic Interpretability for Transformer-based Time Series Classification

Mechanistic Interpretability for Transformer-based Time Series Classification

URL: http://arxiv.org/abs/2511.21514v1
Date: Wed, 26 Nov 2025 15:46:29 GMT
Title: Mechanistic Interpretability for Transformer-based Time Series Classification
Authors: Matīss Kalnāre, Sofoklis Kitharidis, Thomas Bäck, Niki van Stein,
Abstract summary: Transformer-based models have become state-of-the-art tools in various machine learning tasks, including time series classification.<n>Existing explainability methods often focus on input-output attributions, leaving the internal mechanisms largely opaque.<n>This paper addresses this gap by adapting various Mechanistic Interpretability techniques; activation patching, attention saliency, and sparse autoencoders.
Score: 1.8682641481190012
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Transformer-based models have become state-of-the-art tools in various machine learning tasks, including time series classification, yet their complexity makes understanding their internal decision-making challenging. Existing explainability methods often focus on input-output attributions, leaving the internal mechanisms largely opaque. This paper addresses this gap by adapting various Mechanistic Interpretability techniques; activation patching, attention saliency, and sparse autoencoders, from NLP to transformer architectures designed explicitly for time series classification. We systematically probe the internal causal roles of individual attention heads and timesteps, revealing causal structures within these models. Through experimentation on a benchmark time series dataset, we construct causal graphs illustrating how information propagates internally, highlighting key attention heads and temporal positions driving correct classifications. Additionally, we demonstrate the potential of sparse autoencoders for uncovering interpretable latent features. Our findings provide both methodological contributions to transformer interpretability and novel insights into the functional mechanics underlying transformer performance in time series classification tasks.

Related papers

Provable In-Context Learning of Nonlinear Regression with Transformers [66.99048542127768]
In-context learning (ICL) is the ability to perform unseen tasks using task specific prompts without updating parameters.<n>Recent research has actively explored the training dynamics behind ICL, with much of the focus on relatively simple tasks.<n>This paper investigates more complex nonlinear regression tasks, aiming to uncover how transformers acquire in-context learning capabilities.
arXiv Detail & Related papers (2025-07-28T00:09:28Z)
The Birth of Knowledge: Emergent Features across Time, Space, and Scale in Large Language Models [3.541570601342306]
This paper studies the emergence of interpretable categorical features within large language models (LLMs)<n>Using sparse autoencoders for mechanistic interpretability, we identify when and where specific semantic concepts emerge within neural activations.
arXiv Detail & Related papers (2025-05-26T02:59:54Z)
Enhancing Time Series Forecasting with Fuzzy Attention-Integrated Transformers [4.580983642743026]
FANTF (Fuzzy Attention Network-Based Transformers) is a novel approach that integrates fuzzy logic with existing transformer architectures.<n>The framework combines fuzzy-enhanced attention with a set of benchmark existing transformer-based architectures to provide efficient predictions, classification and anomaly detection.<n> Experimental evaluatios on some real-world datasets reveal that FANTF significantly enhances the performance of forecasting, classification, and anomaly detection tasks.
arXiv Detail & Related papers (2025-03-31T17:33:50Z)
VSFormer: Value and Shape-Aware Transformer with Prior-Enhanced Self-Attention for Multivariate Time Series Classification [47.92529531621406]
We propose a novel method, VSFormer, that incorporates both discriminative patterns (shape) and numerical information (value)<n>In addition, we extract class-specific prior information derived from supervised information to enrich the positional encoding.<n>Extensive experiments on all 30 UEA archived datasets demonstrate the superior performance of our method compared to SOTA models.
arXiv Detail & Related papers (2024-12-21T07:31:22Z)
Interpreting Affine Recurrence Learning in GPT-style Transformers [54.01174470722201]
In-context learning allows GPT-style transformers to generalize during inference without modifying their weights. This paper focuses specifically on their ability to learn and predict affine recurrences as an ICL task. We analyze the model's internal operations using both empirical and theoretical approaches.
arXiv Detail & Related papers (2024-10-22T21:30:01Z)
Localized Gaussians as Self-Attention Weights for Point Clouds Correspondence [92.07601770031236]
We investigate semantically meaningful patterns in the attention heads of an encoder-only Transformer architecture.<n>We find that fixing the attention weights not only accelerates the training process but also enhances the stability of the optimization.
arXiv Detail & Related papers (2024-09-20T07:41:47Z)
PRformer: Pyramidal Recurrent Transformer for Multivariate Time Series Forecasting [82.03373838627606]
Self-attention mechanism in Transformer architecture requires positional embeddings to encode temporal order in time series prediction. We argue that this reliance on positional embeddings restricts the Transformer's ability to effectively represent temporal sequences. We present a model integrating PRE with a standard Transformer encoder, demonstrating state-of-the-art performance on various real-world datasets.
arXiv Detail & Related papers (2024-08-20T01:56:07Z)
Correlated Attention in Transformers for Multivariate Time Series [22.542109523780333]
We propose a novel correlated attention mechanism, which efficiently captures feature-wise dependencies, and can be seamlessly integrated within the encoder blocks of existing Transformers. In particular, correlated attention operates across feature channels to compute cross-covariance matrices between queries and keys with different lag values, and selectively aggregate representations at the sub-series level. This architecture facilitates automated discovery and representation learning of not only instantaneous but also lagged cross-correlations, while inherently capturing time series auto-correlation.
arXiv Detail & Related papers (2023-11-20T17:35:44Z)
Diagnostic Spatio-temporal Transformer with Faithful Encoding [54.02712048973161]
This paper addresses the task of anomaly diagnosis when the underlying data generation process has a complex-temporal (ST) dependency. We formalize the problem as supervised dependency discovery, where the ST dependency is learned as a side product of time-series classification. We show that temporal positional encoding used in existing ST transformer works has a serious limitation capturing frequencies in higher frequencies (short time scales) We also propose a new ST dependency discovery framework, which can provide readily consumable diagnostic information in both spatial and temporal directions.
arXiv Detail & Related papers (2023-05-26T05:31:23Z)
Generating Sparse Counterfactual Explanations For Multivariate Time Series [0.5161531917413706]
We propose a generative adversarial network (GAN) architecture that generates SPARse Counterfactual Explanations for multivariate time series. Our approach provides a custom sparsity layer and regularizes the counterfactual loss function in terms of similarity, sparsity, and smoothness of trajectories. We evaluate our approach on real-world human motion datasets as well as a synthetic time series interpretability benchmark.
arXiv Detail & Related papers (2022-06-02T08:47:06Z)

This list is automatically generated from the titles and abstracts of the papers in this site.