Related papers: Fighter: Unveiling the Graph Convolutional Nature of Transformers in Time Series Modeling

Fighter: Unveiling the Graph Convolutional Nature of Transformers in Time Series Modeling

URL: http://arxiv.org/abs/2510.17106v1
Date: Mon, 20 Oct 2025 02:42:14 GMT
Title: Fighter: Unveiling the Graph Convolutional Nature of Transformers in Time Series Modeling
Authors: Chen Zhang, Weixin Bu, Wendong Xu, Runsheng Yu, Yik-Chung Wu, Ngai Wong,
Abstract summary: This work demystifies the Transformer encoder by establishing its fundamental equivalence to a Graph Convolutional Network (GCN)<n>We propose textbfFighter (Flexible Graph Convolutional Transformer), a streamlined architecture that removes redundant linear projections and incorporates multi-hop graph aggregation.
Score: 33.595964789473065
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Transformers have achieved remarkable success in time series modeling, yet their internal mechanisms remain opaque. This work demystifies the Transformer encoder by establishing its fundamental equivalence to a Graph Convolutional Network (GCN). We show that in the forward pass, the attention distribution matrix serves as a dynamic adjacency matrix, and its composition with subsequent transformations performs computations analogous to graph convolution. Moreover, we demonstrate that in the backward pass, the update dynamics of value and feed-forward projections mirror those of GCN parameters. Building on this unified theoretical reinterpretation, we propose \textbf{Fighter} (Flexible Graph Convolutional Transformer), a streamlined architecture that removes redundant linear projections and incorporates multi-hop graph aggregation. This perspective yields an explicit and interpretable representation of temporal dependencies across different scales, naturally expressed as graph edges. Experiments on standard forecasting benchmarks confirm that Fighter achieves competitive performance while providing clearer mechanistic interpretability of its predictions.

Related papers

DiTS: Multimodal Diffusion Transformers Are Time Series Forecasters [50.43534351968113]
Existing generative time series models do not address the multi-dimensional properties of time series data well.<n>Inspired by Multimodal Diffusion Transformers that integrate textual guidance into video generation, we propose Diffusion Transformers for Time Series (DiTS)
arXiv Detail & Related papers (2026-02-06T10:48:13Z)
Global-Lens Transformers: Adaptive Token Mixing for Dynamic Link Prediction [9.234363752442915]
We propose GLFormer, a novel attention-free Transformer-style framework for dynamic graphs.<n>Experiments on six widely-used dynamic graph benchmarks show that GLFormer achieves SOTA performance, which reveals that attention-free architectures can match or surpass Transformer baselines.
arXiv Detail & Related papers (2025-11-16T04:05:56Z)
Platonic Transformers: A Solid Choice For Equivariance [25.29042615187704]
We introduce the Platonic Transformer to resolve this trade-off.<n>By defining attention relative to reference frames from the Platonic solid symmetry groups, our method induces a principled weight-sharing scheme.<n>We show that this attention is formally equivalent to a dynamic group convolution, which reveals that the model learns adaptive geometric filters.
arXiv Detail & Related papers (2025-10-03T20:51:25Z)
Flowing Through Layers: A Continuous Dynamical Systems Perspective on Transformers [0.0]
We show that the standard discrete update rule of transformer layers can be naturally interpreted as a forward Euler discretization of a continuous dynamical system.<n>Our Transformer Flow Approximation Theorem demonstrates that, under standard Lipschitz continuity assumptions, token representations converge uniformly to the unique solution of an ODE as the number of layers grows.
arXiv Detail & Related papers (2025-02-08T18:11:40Z)
PRformer: Pyramidal Recurrent Transformer for Multivariate Time Series Forecasting [82.03373838627606]
Self-attention mechanism in Transformer architecture requires positional embeddings to encode temporal order in time series prediction. We argue that this reliance on positional embeddings restricts the Transformer's ability to effectively represent temporal sequences. We present a model integrating PRE with a standard Transformer encoder, demonstrating state-of-the-art performance on various real-world datasets.
arXiv Detail & Related papers (2024-08-20T01:56:07Z)
Todyformer: Towards Holistic Dynamic Graph Transformers with Structure-Aware Tokenization [6.799413002613627]
Todyformer is a novel Transformer-based neural network tailored for dynamic graphs. It unifies the local encoding capacity of Message-Passing Neural Networks (MPNNs) with the global encoding of Transformers. We show that Todyformer consistently outperforms the state-of-the-art methods for downstream tasks.
arXiv Detail & Related papers (2024-02-02T23:05:30Z)
iTransformer: Inverted Transformers Are Effective for Time Series Forecasting [62.40166958002558]
We propose iTransformer, which simply applies the attention and feed-forward network on the inverted dimensions. The iTransformer model achieves state-of-the-art on challenging real-world datasets.
arXiv Detail & Related papers (2023-10-10T13:44:09Z)
Curve Your Attention: Mixed-Curvature Transformers for Graph Representation Learning [77.1421343649344]
We propose a generalization of Transformers towards operating entirely on the product of constant curvature spaces. We also provide a kernelized approach to non-Euclidean attention, which enables our model to run in time and memory cost linear to the number of nodes and edges.
arXiv Detail & Related papers (2023-09-08T02:44:37Z)
B-cos Alignment for Inherently Interpretable CNNs and Vision Transformers [97.75725574963197]
We present a new direction for increasing the interpretability of deep neural networks (DNNs) by promoting weight-input alignment during training. We show that a sequence of such transformations induces a single linear transformation that faithfully summarises the full model computations. We show that the resulting explanations are of high visual quality and perform well under quantitative interpretability metrics.
arXiv Detail & Related papers (2023-06-19T12:54:28Z)
Relphormer: Relational Graph Transformer for Knowledge Graph Representations [25.40961076988176]
We propose a new variant of Transformer for knowledge graph representations dubbed Relphormer. We propose a novel structure-enhanced self-attention mechanism to encode the relational information and keep the semantic information within entities and relations. Experimental results on six datasets show that Relphormer can obtain better performance compared with baselines.
arXiv Detail & Related papers (2022-05-22T15:30:18Z)
Topographic VAEs learn Equivariant Capsules [84.33745072274942]
We introduce the Topographic VAE: a novel method for efficiently training deep generative models with topographically organized latent variables. We show that such a model indeed learns to organize its activations according to salient characteristics such as digit class, width, and style on MNIST. We demonstrate approximate equivariance to complex transformations, expanding upon the capabilities of existing group equivariant neural networks.
arXiv Detail & Related papers (2021-09-03T09:25:57Z)

This list is automatically generated from the titles and abstracts of the papers in this site.