A Mechanistic Analysis of Transformers for Dynamical Systems
- URL: http://arxiv.org/abs/2512.21113v1
- Date: Wed, 24 Dec 2025 11:21:07 GMT
- Title: A Mechanistic Analysis of Transformers for Dynamical Systems
- Authors: Gregory Duthé, Nikolaos Evangelou, Wei Liu, Ioannis G. Kevrekidis, Eleni Chatzi
- Abstract summary: We study the representational capabilities and limitations of single-layer Transformers when applied to dynamical data. For linear systems, we show that the convexity constraint imposed by softmax attention fundamentally restricts the class of dynamics that can be represented. For nonlinear systems under partial observability, attention instead acts as an adaptive delay-embedding mechanism.
- Score: 4.590170084532207
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transformers are increasingly adopted for modeling and forecasting time-series, yet their internal mechanisms remain poorly understood from a dynamical systems perspective. In contrast to classical autoregressive and state-space models, which benefit from well-established theoretical foundations, Transformer architectures are typically treated as black boxes. This gap becomes particularly relevant as attention-based models are considered for general-purpose or zero-shot forecasting across diverse dynamical regimes. In this work, we do not propose a new forecasting model, but instead investigate the representational capabilities and limitations of single-layer Transformers when applied to dynamical data. Building on a dynamical systems perspective we interpret causal self-attention as a linear, history-dependent recurrence and analyze how it processes temporal information. Through a series of linear and nonlinear case studies, we identify distinct operational regimes. For linear systems, we show that the convexity constraint imposed by softmax attention fundamentally restricts the class of dynamics that can be represented, leading to oversmoothing in oscillatory settings. For nonlinear systems under partial observability, attention instead acts as an adaptive delay-embedding mechanism, enabling effective state reconstruction when sufficient temporal context and latent dimensionality are available. These results help bridge empirical observations with classical dynamical systems theory, providing insight into when and why Transformers succeed or fail as models of dynamical systems.
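The two mechanisms the abstract identifies can be illustrated with a toy numerical sketch (all function names and numbers below are illustrative, not taken from the paper): a softmax-attention readout is a convex combination of past values, so its output can never leave their convex hull, which is why oscillations are smoothed rather than amplified; a delay embedding, by contrast, stacks lagged copies of a scalar observable to reconstruct a latent state.

```python
import math

def softmax(scores):
    # subtract the max for numerical stability
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def attention_readout(scores, values):
    # softmax weights are non-negative and sum to 1, so the output
    # is a convex combination of the past (scalar) values
    return sum(w * v for w, v in zip(softmax(scores), values))

def delay_embed(series, dim, tau=1):
    # Takens-style delay embedding: stack lagged copies of a scalar
    # observable to reconstruct a latent state
    return [series[i:i + dim * tau:tau]
            for i in range(len(series) - (dim - 1) * tau)]

# oscillatory scalar history: the readout is trapped in [min, max]
history = [1.0, -1.0, 1.0, -1.0]
out = attention_readout([0.3, 1.2, -0.5, 0.8], history)
assert min(history) <= out <= max(history)

# delay embedding of the same kind of partial observation
print(delay_embed([0.0, 0.84, 0.91, 0.14, -0.76], dim=3))
```

However the scores are chosen, the readout cannot exceed the largest past value, which is the convexity restriction the paper analyzes for oscillatory linear systems.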
Related papers
- KoopGen: Koopman Generator Networks for Representing and Predicting Dynamical Systems with Continuous Spectra [65.11254608352982]
We introduce a generator-based neural Koopman framework that models dynamics through a structured, state-dependent representation of Koopman generators. By exploiting the intrinsic Cartesian decomposition into skew-adjoint and self-adjoint components, KoopGen separates conservative transport from irreversible dissipation.
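The Cartesian decomposition this summary refers to is the standard split of a real square matrix into a self-adjoint (symmetric) part and a skew-adjoint (antisymmetric) part; a minimal numerical sketch (the matrix here is chosen for illustration only):

```python
def transpose(m):
    return [list(row) for row in zip(*m)]

def cartesian_decompose(a):
    """Split a real square matrix into A = S + K, with
    S = (A + A^T)/2 self-adjoint and K = (A - A^T)/2 skew-adjoint."""
    at = transpose(a)
    n = len(a)
    s = [[(a[i][j] + at[i][j]) / 2 for j in range(n)] for i in range(n)]
    k = [[(a[i][j] - at[i][j]) / 2 for j in range(n)] for i in range(n)]
    return s, k

a = [[0.0, 1.0],
     [-2.0, -0.5]]
s, k = cartesian_decompose(a)

# S is symmetric (dissipative part), K is antisymmetric (conservative part)
assert s == transpose(s)
assert k == [[-x for x in row] for row in transpose(k)]
# the two parts reassemble the original operator
assert all(abs(s[i][j] + k[i][j] - a[i][j]) < 1e-12
           for i in range(2) for j in range(2))
```

In the Koopman-generator setting, the skew-adjoint part generates norm-preserving (conservative) evolution while the self-adjoint part contributes decay, which is the separation the KoopGen summary describes.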
arXiv Detail & Related papers (2026-02-15T06:32:23Z) - Transformer Learning of Chaotic Collective Dynamics in Many-Body Systems [0.0]
We show that a self-attention-based transformer framework provides an effective approach for modeling chaotic collective dynamics. We study the one-dimensional semiclassical Holstein model, where interaction quenches induce strongly nonlinear and chaotic dynamics. Our results establish self-attention as a powerful mechanism for learning effective reduced dynamics in chaotic many-body systems.
arXiv Detail & Related papers (2026-01-27T01:33:33Z) - Attention Mechanisms in Dynamical Systems: A Case Study with Predator-Prey Models [0.0]
We train a simple linear attention model on time-series data to reconstruct system trajectories. Remarkably, the learned attention weights align with the geometric structure of the Lyapunov function. Results suggest a novel use of AI-derived attention for interpretable, data-driven analysis and control of nonlinear systems.
arXiv Detail & Related papers (2025-05-10T04:14:28Z) - Learning System Dynamics without Forgetting [60.08612207170659]
We investigate the problem of Continual Dynamics Learning (CDL), examining task configurations and evaluating the applicability of existing techniques. We propose the Mode-switching Graph ODE (MS-GODE) model, which integrates the strengths of LG-ODE and sub-network learning with a mode-switching module. We construct a novel benchmark of biological dynamic systems for CDL, Bio-CDL, featuring diverse systems with disparate dynamics.
arXiv Detail & Related papers (2024-06-30T14:55:18Z) - eXponential FAmily Dynamical Systems (XFADS): Large-scale nonlinear Gaussian state-space modeling [9.52474299688276]
We introduce a low-rank structured variational autoencoder framework for nonlinear state-space graphical models.
We show that our approach consistently demonstrates the ability to learn a more predictive generative model.
arXiv Detail & Related papers (2024-03-03T02:19:49Z) - Attractor Memory for Long-Term Time Series Forecasting: A Chaos Perspective [63.60312929416228]
Attraos incorporates chaos theory into long-term time series forecasting.
We show that Attraos outperforms various LTSF methods on mainstream datasets and chaotic datasets with only one-twelfth of the parameters compared to PatchTST.
arXiv Detail & Related papers (2024-02-18T05:35:01Z) - Physics-Inspired Temporal Learning of Quadrotor Dynamics for Accurate Model Predictive Trajectory Tracking [76.27433308688592]
Accurately modeling a quadrotor's system dynamics is critical for guaranteeing agile, safe, and stable navigation.
We present a novel Physics-Inspired Temporal Convolutional Network (PI-TCN) approach to learning a quadrotor's system dynamics purely from robot experience.
Our approach combines the expressive power of sparse temporal convolutions and dense feed-forward connections to make accurate system predictions.
arXiv Detail & Related papers (2022-06-07T13:51:35Z) - Leveraging the structure of dynamical systems for data-driven modeling [111.45324708884813]
We consider the impact of the training set and its structure on the quality of the long-term prediction.
We show how an informed design of the training set, based on invariants of the system and the structure of the underlying attractor, significantly improves the resulting models.
arXiv Detail & Related papers (2021-12-15T20:09:20Z) - Physics-guided Deep Markov Models for Learning Nonlinear Dynamical Systems with Uncertainty [6.151348127802708]
We propose a physics-guided framework, termed Physics-guided Deep Markov Model (PgDMM).
The proposed framework takes advantage of the expressive power of deep learning, while retaining the driving physics of the dynamical system.
arXiv Detail & Related papers (2021-10-16T16:35:12Z) - Supervised DKRC with Images for Offline System Identification [77.34726150561087]
Modern dynamical systems are becoming increasingly non-linear and complex.
There is a need for a framework to model these systems in a compact and comprehensive representation for prediction and control.
Our approach learns these basis functions using a supervised learning approach.
arXiv Detail & Related papers (2021-09-06T04:39:06Z) - Stochastically forced ensemble dynamic mode decomposition for forecasting and analysis of near-periodic systems [65.44033635330604]
We introduce a novel load forecasting method in which observed dynamics are modeled as a forced linear system.
We show that its use of intrinsic linear dynamics offers a number of desirable properties in terms of interpretability and parsimony.
Results are presented for a test case using load data from an electrical grid.
arXiv Detail & Related papers (2020-10-08T20:25:52Z)
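The "forced linear system" idea in the dynamic-mode-decomposition entry above can be sketched in miniature (a scalar toy problem; the function name and data are illustrative, not from that paper): fit x[t+1] = a*x[t] + b*u[t] to observed states x and inputs u by least squares, then read the identified coefficients directly as interpretable dynamics.

```python
def fit_forced_linear(xs, us):
    """Least-squares fit of x[t+1] = a*x[t] + b*u[t] for scalar data,
    solved via the 2x2 normal equations (Cramer's rule)."""
    sxx = sum(x * x for x in xs[:-1])
    suu = sum(u * u for u in us)
    sxu = sum(x * u for x, u in zip(xs[:-1], us))
    sxy = sum(x * y for x, y in zip(xs[:-1], xs[1:]))
    suy = sum(u * y for u, y in zip(us, xs[1:]))
    det = sxx * suu - sxu * sxu
    a = (sxy * suu - suy * sxu) / det
    b = (suy * sxx - sxy * sxu) / det
    return a, b

# synthetic trajectory generated by x[t+1] = 0.9*x[t] + 0.5*u[t]
us = [1.0, -1.0, 2.0, 0.5, -0.5]
xs = [1.0]
for u in us:
    xs.append(0.9 * xs[-1] + 0.5 * u)

a, b = fit_forced_linear(xs, us)
# noise-free data, so the fit recovers the true coefficients
assert abs(a - 0.9) < 1e-9 and abs(b - 0.5) < 1e-9
print(a, b)
```

Because the identified model is linear, its coefficient a directly gives the decay rate of the intrinsic dynamics, which is the kind of interpretability and parsimony that entry highlights.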
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.