Related papers: PDE-Transformer: A Continuous Dynamical Systems Approach to Sequence Modeling

URL: http://arxiv.org/abs/2510.03272v2
Date: Sun, 12 Oct 2025 14:32:47 GMT
Title: PDE-Transformer: A Continuous Dynamical Systems Approach to Sequence Modeling
Authors: Yukun Zhang, Xueqing Zhou
Abstract summary: We propose PDE-Transformer, a sequence modeling paradigm that casts the forward pass of a Transformer as the numerical discretization of a continuous reaction-diffusion system. In our framework, token embeddings evolve under a partial differential equation whose nonlocal integral term models self-attention. We design an Adaptive PDE Diffusion Layer that enforces local smoothness in feature space with linear time complexity.
Score: 4.1812935375151925
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We propose PDE-Transformer, a novel sequence modeling paradigm that casts the forward pass of a Transformer as the numerical discretization of a continuous reaction-diffusion system derived from a variational energy functional. In our framework, token embeddings evolve under a partial differential equation in which a nonlocal integral term models self-attention, a local reaction term models feed-forward layers, a diffusion term encodes positional smoothing, and a stability control term corresponds to layer normalization. From this unifying perspective, we design an Adaptive PDE Diffusion Layer, an efficient, learnable finite-difference stencil that enforces local smoothness in feature space with linear time complexity and complements self-attention's global routing. Through a systematic theoretical analysis based on four pillars (stability, diffusion geometry, multi-scale dynamics, and component coupling), we derive principled guidelines for integrating the PDE layer at seven candidate points in the Transformer. Empirically, on the Long Range Arena benchmark, placing the layer immediately after the embedding yields a 4.1 percentage-point average accuracy gain over a strong baseline, and an adaptive multi-scale variant delivers further improvements. Our work thus offers a principled, lightweight mechanism to bolster long-range dependency modeling by harmonizing continuous PDE smoothing with discrete self-attention.
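To give a feel for what a "learnable finite-difference stencil" with linear cost can look like, here is a minimal sketch of one explicit (forward Euler) step of the diffusion equation du/dt = D d²u/ds² along the token axis s, using the three-point stencil [1, -2, 1] and a per-channel coefficient D. The function name pde_diffusion_step, the softplus parameterization of D, the reflecting (edge-padded) boundary, and the default step size are all assumptions for illustration; the paper's actual layer and its adaptive multi-scale variant may differ. The cost is O(seq_len * d_model), i.e. linear in sequence length.

```python
import numpy as np

def softplus(z):
    # Numerically stable softplus keeps the diffusion coefficient positive.
    return np.logaddexp(0.0, z)

def pde_diffusion_step(x, raw_coeff, dt=0.1):
    """One forward-Euler step of du/dt = D * d^2u/ds^2 along the token axis.

    x:         (seq_len, d_model) token embeddings
    raw_coeff: (d_model,) unconstrained per-channel parameters
               (hypothetical parameterization: D = softplus(raw_coeff))
    """
    D = softplus(raw_coeff)  # shape (d_model,), broadcast over tokens
    # The explicit scheme with this stencil is stable only if dt * D <= 1/2.
    if np.any(dt * D > 0.5):
        raise ValueError("step too large for explicit diffusion")
    # Reflecting (Neumann) boundary: repeat the first and last tokens.
    padded = np.pad(x, ((1, 1), (0, 0)), mode="edge")
    # Three-point Laplacian stencil [1, -2, 1] applied along the sequence.
    lap = padded[:-2] - 2.0 * padded[1:-1] + padded[2:]
    # Residual-style smoothing update.
    return x + dt * D * lap
```

With the reflecting boundary the stencil conserves the per-channel sum over tokens while reducing the squared differences between neighboring tokens, which is the "local smoothness" the abstract refers to; a learnable version would simply treat raw_coeff (and possibly dt) as trainable parameters.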

Related papers

Parallel Diffusion Solver via Residual Dirichlet Policy Optimization [88.7827307535107]
Diffusion models (DMs) have achieved state-of-the-art generative performance but suffer from high sampling latency due to their sequential denoising nature. Existing solver-based acceleration methods often face significant image quality degradation under a low-latency budget. We propose the Ensemble Parallel Direction solver (dubbed EPD-EPr), a novel ODE solver that mitigates these errors by incorporating multiple parallel gradient evaluations in each step.
arXiv Detail & Related papers (2025-12-28T05:48:55Z)
Flow marching for a generative PDE foundation model [0.0]
We propose Flow Marching, an algorithm that bridges neural operator learning with flow matching motivated by an analysis of error accumulation in physical dynamical systems. We also introduce a Physics-Pretrained Variational Autoencoder (P2E) to embed physical trajectories into a compact latent space. We curate a corpus of 2.5M trajectories across 12 distinct PDE families and train suites of P2Es and FMTs at multiple scales.
arXiv Detail & Related papers (2025-09-23T04:00:41Z)
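Flow Marching builds on flow matching. As a generic reference point (standard straight-line conditional flow matching, not the paper's own algorithm), the sketch below constructs the regression target a velocity network is trained to predict: along the path x_t = (1 - t) x0 + t x1 the target velocity is x1 - x0. The helper names flow_matching_pair and fm_loss are hypothetical.

```python
import numpy as np

def flow_matching_pair(x0, x1, t):
    """Return the interpolated point x_t and its target velocity for
    straight-line conditional flow matching."""
    t = np.asarray(t)[:, None]        # (batch, 1) for broadcasting
    x_t = (1.0 - t) * x0 + t * x1     # point on the straight path
    v_target = x1 - x0                # d x_t / d t, constant in t
    return x_t, v_target

def fm_loss(velocity_fn, x0, x1, t):
    """Mean squared error between a model's velocity and the target."""
    x_t, v_target = flow_matching_pair(x0, x1, t)
    pred = velocity_fn(x_t, t)
    return np.mean((pred - v_target) ** 2)
```

In practice x0 is drawn from a simple source distribution (for example a Gaussian), x1 from the data, t uniformly from [0, 1], and velocity_fn is a neural network; at sampling time the learned velocity field is integrated from t = 0 to t = 1.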
Information-Theoretic Bounds and Task-Centric Learning Complexity for Real-World Dynamic Nonlinear Systems [0.6875312133832079]
Dynamic nonlinear systems exhibit distortions arising from coupled static and dynamic effects. This paper presents a theoretical framework grounded in structured decomposition, variance analysis, and task-centric complexity bounds.
arXiv Detail & Related papers (2025-09-08T12:08:02Z)
PowerGrow: Feasible Co-Growth of Structures and Dynamics for Power Grid Synthesis [75.14189839277928]
We present PowerGrow, a co-generative framework that significantly reduces computational overhead while maintaining operational validity. Experiments across benchmark settings show that PowerGrow outperforms prior diffusion models in fidelity and diversity. This demonstrates its ability to generate operationally valid and realistic power grid scenarios.
arXiv Detail & Related papers (2025-08-29T01:47:27Z)
Continuous-Time Attention: PDE-Guided Mechanisms for Long-Sequence Transformers [3.2266392324513267]
We propose a novel framework, Continuous-Time Attention, which infuses partial differential equations (PDEs) into the Transformer's attention mechanism. We show that PDE-based attention leads to better optimization landscapes and enhances gradient flow. Our findings highlight the potential of PDE-based formulations to enrich attention mechanisms with continuous-time dynamics and global coherence.
arXiv Detail & Related papers (2025-05-27T03:30:10Z)
Generative System Dynamics in Recurrent Neural Networks [56.958984970518564]
We investigate the continuous-time dynamics of Recurrent Neural Networks (RNNs). We show that skew-symmetric weight matrices are fundamental to enable stable limit cycles in both linear and nonlinear configurations. Numerical simulations showcase how nonlinear activation functions not only maintain limit cycles, but also enhance the numerical stability of the system integration process.
arXiv Detail & Related papers (2025-04-16T10:39:43Z)
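Why skew-symmetry matters can be seen in the linear case dx/dt = W x: if W = A - Aᵀ, its eigenvalues are purely imaginary and the exact flow conserves ‖x‖, so trajectories neither decay nor blow up. The sketch below illustrates this linear fact only (it is not the paper's model). It integrates the system with the implicit midpoint rule, whose update matrix is the Cayley transform of W and is therefore orthogonal, so the norm stays constant up to round-off. The function names, matrix size, and step size are arbitrary choices.

```python
import numpy as np

def make_skew(a):
    """Build a skew-symmetric matrix W = A - A^T from any square A."""
    return a - a.T

def cayley_step_matrix(w, h):
    """Implicit-midpoint update for dx/dt = W x:
    x_{n+1} = (I - h/2 W)^{-1} (I + h/2 W) x_n.
    (I - h/2 W) is always invertible when W is skew-symmetric."""
    eye = np.eye(w.shape[0])
    return np.linalg.solve(eye - 0.5 * h * w, eye + 0.5 * h * w)

def rollout(w, x0, h, steps):
    """Return the trajectory [x_0, x_1, ..., x_steps] as a 2D array."""
    m = cayley_step_matrix(w, h)
    xs = [x0]
    for _ in range(steps):
        xs.append(m @ xs[-1])
    return np.stack(xs)
```

By contrast, a forward Euler step x + h W x strictly increases the norm, because ‖(I + hW)x‖² = ‖x‖² + h²‖Wx‖² when xᵀWx = 0; this is one reason the choice of integrator matters for oscillatory RNN dynamics.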
Flowing Through Layers: A Continuous Dynamical Systems Perspective on Transformers [0.0]
We show that the standard discrete update rule of transformer layers can be naturally interpreted as a forward Euler discretization of a continuous dynamical system. Our Transformer Flow Approximation Theorem demonstrates that, under standard Lipschitz continuity assumptions, token representations converge uniformly to the unique solution of an ODE as the number of layers grows.
arXiv Detail & Related papers (2025-02-08T18:11:40Z)
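The Euler reading of residual layers can be made concrete: with L layers and step size h = 1/L, the update x_{l+1} = x_l + h f(x_l) is forward Euler for dx/dt = f(x) on t ∈ [0, 1]. The toy below assumes a hypothetical Lipschitz block f(x) = tanh(W x) shared across depth (real Transformer blocks have per-layer weights and attention) and compares the residual stack with an accurate RK4 reference; as a first-order method, the gap should shrink roughly in proportion to 1/L.

```python
import numpy as np

def vector_field(x, weight):
    # Hypothetical Lipschitz "block": one tanh layer shared across depth.
    return np.tanh(weight @ x)

def residual_stack(x0, weight, num_layers):
    """Apply x <- x + h * f(x) num_layers times with h = 1 / num_layers."""
    h = 1.0 / num_layers
    x = x0.copy()
    for _ in range(num_layers):
        x = x + h * vector_field(x, weight)
    return x

def rk4_reference(x0, weight, steps=2000):
    """Accurate solution of dx/dt = f(x) at t = 1 via classical RK4."""
    h = 1.0 / steps
    x = x0.copy()
    for _ in range(steps):
        s1 = vector_field(x, weight)
        s2 = vector_field(x + 0.5 * h * s1, weight)
        s3 = vector_field(x + 0.5 * h * s2, weight)
        s4 = vector_field(x + h * s3, weight)
        x = x + (h / 6.0) * (s1 + 2.0 * s2 + 2.0 * s3 + s4)
    return x
```

For example, evaluating np.linalg.norm(residual_stack(x0, weight, L) - rk4_reference(x0, weight)) for L = 8, 32, 128 should give errors that decrease as the stack gets deeper, which is the discrete counterpart of the convergence statement above.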
Advancing Generalization in PINNs through Latent-Space Representations [71.86401914779019]
Physics-informed neural networks (PINNs) have made significant strides in modeling dynamical systems governed by partial differential equations (PDEs). We propose PIDO, a novel physics-informed neural PDE solver designed to generalize effectively across diverse PDE configurations. We validate PIDO on a range of benchmarks, including 1D combined equations and 2D Navier-Stokes equations.
arXiv Detail & Related papers (2024-11-28T13:16:20Z)
Tight Stability, Convergence, and Robustness Bounds for Predictive Coding Networks [60.3634789164648]
Energy-based learning algorithms, such as predictive coding (PC), have garnered significant attention in the machine learning community. We rigorously analyze the stability, robustness, and convergence of PC through the lens of dynamical systems theory.
arXiv Detail & Related papers (2024-10-07T02:57:26Z)
Understanding Transformer Architecture through Continuous Dynamics: A Partial Differential Equation Perspective [4.1812935375151925]
This paper introduces a novel analytical framework that reconceptualizes the Transformer's discrete, layered structure as a continuous dynamical system governed by a master Partial Differential Equation (PDE). By comparing a standard Transformer with a PDE simulator that lacks explicit stabilizers, our experiments provide compelling empirical evidence for our central thesis. Our findings reveal that these explicit stabilizers are, in fact, fundamental mathematical components required to tame an otherwise powerful but inherently unstable continuous system.
arXiv Detail & Related papers (2024-08-18T16:16:57Z)
AROMA: Preserving Spatial Structure for Latent PDE Modeling with Local Neural Fields [14.219495227765671]
We present AROMA, a framework designed to enhance the modeling of partial differential equations (PDEs) using local neural fields. Our flexible encoder-decoder architecture can obtain smooth latent representations of spatial physical fields from a variety of data types. By employing a diffusion-based formulation, we achieve greater stability and enable longer rollouts compared to conventional MSE training.
arXiv Detail & Related papers (2024-06-04T10:12:09Z)
On the Trajectory Regularity of ODE-based Diffusion Sampling [79.17334230868693]
Diffusion-based generative models use differential equations to establish a smooth connection between a complex data distribution and a tractable prior distribution. In this paper, we identify several intriguing trajectory properties in the ODE-based sampling process of diffusion models.
arXiv Detail & Related papers (2024-05-18T15:59:41Z)
Convergence of mean-field Langevin dynamics: Time and space discretization, stochastic gradient, and variance reduction [49.66486092259376]
The mean-field Langevin dynamics (MFLD) is a nonlinear generalization of the Langevin dynamics that incorporates a distribution-dependent drift. Recent works have shown that MFLD globally minimizes an entropy-regularized convex functional in the space of measures. We provide a framework to prove a uniform-in-time propagation of chaos for MFLD that takes into account the errors due to finite-particle approximation, time-discretization, and gradient approximation.
arXiv Detail & Related papers (2023-06-12T16:28:11Z)
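The finite-particle and time-discretization errors mentioned above are easiest to picture with a particle simulation. The sketch below runs N interacting particles with an Euler-Maruyama step, assuming a toy objective with confining potential V(x) = x²/2 and quadratic interaction W(z) = z²/2, so the distribution-dependent drift for particle i is -x_i - (x_i - mean(x)). Full gradients are used, so no gradient-approximation error is present. The potentials, step size h, and temperature λ are illustrative assumptions, not the paper's setting. In the mean-field limit the stationary law is Gaussian with mean 0 and variance λ/2.

```python
import numpy as np

def mfld_step(x, h, lam, rng):
    """One Euler-Maruyama step of N-particle mean-field Langevin dynamics.

    Drift for particle i: -V'(x_i) - (1/N) * sum_j W'(x_i - x_j)
    with V(x) = x^2 / 2 and W(z) = z^2 / 2, which simplifies to
    -x_i - (x_i - mean(x)).
    """
    drift = -x - (x - x.mean())
    noise = rng.standard_normal(x.shape)
    return x + h * drift + np.sqrt(2.0 * lam * h) * noise

def simulate(n_particles=5000, steps=2000, h=0.01, lam=1.0, seed=0):
    """Run the particle system from an off-center start and return it."""
    rng = np.random.default_rng(seed)
    x = rng.normal(loc=3.0, scale=1.0, size=n_particles)
    for _ in range(steps):
        x = mfld_step(x, h, lam, rng)
    return x
```

Shrinking h reduces the time-discretization bias (for this toy the discrete chain's variance is about λ / (2(1 - h)) rather than λ/2), and increasing n_particles reduces the finite-particle fluctuation of the empirical mean and variance.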
Machine learning in and out of equilibrium [58.88325379746631]
Our study uses a Fokker-Planck approach, adapted from statistical physics, to explore these parallels. We focus in particular on the stationary state of the system in the long-time limit, which in conventional SGD is out of equilibrium. We propose a new variation of stochastic gradient Langevin dynamics (SGLD) that harnesses without-replacement minibatching.
arXiv Detail & Related papers (2023-06-06T09:12:49Z)
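As a baseline for the variant described above, the sketch below runs standard SGLD on a toy model y_i ~ N(θ, 1) with a flat prior, drawing minibatches without replacement by reshuffling the data once per epoch and slicing it into consecutive batches. The model, step size, batch size, and epoch count are illustrative assumptions; the paper's modified dynamics are not reproduced here. The exact posterior for θ is N(mean(y), 1/n), so the chain's average after burn-in should sit close to the data mean.

```python
import numpy as np

def sgld_gaussian_mean(y, step=1e-3, batch_size=50, epochs=200, seed=0):
    """SGLD for theta in y_i ~ N(theta, 1) with a flat prior.

    Minibatches are drawn without replacement: the data are reshuffled
    at the start of each epoch and split into consecutive batches.
    """
    rng = np.random.default_rng(seed)
    n = y.shape[0]
    theta = 0.0
    samples = []
    for _ in range(epochs):
        perm = rng.permutation(n)
        for start in range(0, n, batch_size):
            batch = y[perm[start:start + batch_size]]
            # Unbiased minibatch estimate of the full-data gradient of log p(y | theta).
            grad = (n / batch.shape[0]) * np.sum(batch - theta)
            # Langevin update: half-step along the gradient plus N(0, step) noise.
            theta = theta + 0.5 * step * grad + np.sqrt(step) * rng.standard_normal()
            samples.append(theta)
    return np.array(samples)
```

Reshuffling per epoch guarantees that every data point is used exactly once per pass, so the minibatch gradient errors sum to zero over an epoch; with-replacement sampling would instead draw each batch independently.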
Learning to Accelerate Partial Differential Equations via Latent Global Evolution [64.72624347511498]
Latent Evolution of PDEs (LE-PDE) is a simple, fast and scalable method to accelerate the simulation and inverse optimization of PDEs. We introduce new learning objectives to effectively learn such latent dynamics to ensure long-term stability. We demonstrate up to 128x reduction in the dimensions to update, and up to 15x improvement in speed, while achieving competitive accuracy.
arXiv Detail & Related papers (2022-06-15T17:31:24Z)
Model Reduction of Swing Equations with Physics Informed PDE [3.3263205689999444]
This manuscript is the first step towards building a robust and efficient model reduction methodology to capture transient dynamics in a transmission level electric power system. We show that, when properly coarse-grained, i.e. with the PDE coefficients and source terms extracted from a spatial convolution procedure of the respective discrete coefficients in the swing equations, the resulting PDE reproduces faithfully and efficiently the original swing dynamics.
arXiv Detail & Related papers (2021-10-26T22:46:20Z)
Discovering Latent Causal Variables via Mechanism Sparsity: A New Principle for Nonlinear ICA [81.4991350761909]
Independent component analysis (ICA) refers to an ensemble of methods which formalize this goal and provide estimation procedures for practical application. We show that the latent variables can be recovered up to a permutation if one regularizes the latent mechanisms to be sparse.
arXiv Detail & Related papers (2021-07-21T14:22:14Z)
Euclideanizing Flows: Diffeomorphic Reduction for Learning Stable Dynamical Systems [74.80320120264459]
We present an approach to learn such motions from a limited number of human demonstrations. The complex motions are encoded as rollouts of a stable dynamical system. The efficacy of this approach is demonstrated through validation on an established benchmark as well as demonstrations collected on a real-world robotic system.
arXiv Detail & Related papers (2020-05-27T03:51:57Z)
On dissipative symplectic integration with applications to gradient-based optimization [77.34726150561087]
We propose a geometric framework in which discretizations can be realized systematically. We show that a generalization of symplectic integrators to nonconservative and, in particular, dissipative Hamiltonian systems is able to preserve rates of convergence up to a controlled error.
arXiv Detail & Related papers (2020-04-15T00:36:49Z)
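A well-known member of this class is the conformal symplectic Euler method for the damped system dq/dt = p, dp/dt = -∇f(q) - γp: friction is integrated exactly by scaling the momentum with e^{-γh}, and the conservative part uses a symplectic Euler kick-then-drift. Applied to minimizing f, this yields a momentum-style optimizer. The sketch below is that generic scheme under illustrative choices of h, γ, and objective; it is not necessarily the exact discretization analyzed in the paper.

```python
import numpy as np

def conformal_symplectic_euler(grad, x0, h=0.1, gamma=1.0, steps=200):
    """Minimize f via dissipative Hamiltonian dynamics
    dq/dt = p, dp/dt = -grad f(q) - gamma * p.

    Each step damps the momentum exactly, kicks it with the gradient,
    then drifts the position with the updated momentum.
    """
    q = np.array(x0, dtype=float)
    p = np.zeros_like(q)
    decay = np.exp(-gamma * h)        # exact solution of dp/dt = -gamma * p
    history = [q.copy()]
    for _ in range(steps):
        p = decay * p - h * grad(q)   # friction + gradient kick
        q = q + h * p                 # drift
        history.append(q.copy())
    return np.array(history)
```

On a quadratic f(q) = qᵀAq/2 each step's linear map has determinant exactly e^{-γh}, which is the conformal (uniformly contracting) analogue of the volume preservation of symplectic maps; with γ = 0 the update reduces to plain symplectic Euler.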

This list is automatically generated from the titles and abstracts of the papers on this site.