PDE-Transformer: Efficient and Versatile Transformers for Physics Simulations
- URL: http://arxiv.org/abs/2505.24717v1
- Date: Fri, 30 May 2025 15:39:54 GMT
- Title: PDE-Transformer: Efficient and Versatile Transformers for Physics Simulations
- Authors: Benjamin Holzschuh, Qiang Liu, Georg Kohl, Nils Thuerey
- Abstract summary: We introduce PDE-Transformer, an improved transformer-based architecture for surrogate modeling of physics simulations on regular grids. We demonstrate that our proposed architecture outperforms state-of-the-art transformer architectures for computer vision on a large dataset of 16 different types of PDEs.
- Score: 23.196500975208302
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce PDE-Transformer, an improved transformer-based architecture for surrogate modeling of physics simulations on regular grids. We combine recent architectural improvements of diffusion transformers with adjustments specific for large-scale simulations to yield a more scalable and versatile general-purpose transformer architecture, which can be used as the backbone for building large-scale foundation models in physical sciences. We demonstrate that our proposed architecture outperforms state-of-the-art transformer architectures for computer vision on a large dataset of 16 different types of PDEs. We propose to embed different physical channels individually as spatio-temporal tokens, which interact via channel-wise self-attention. This helps to maintain a consistent information density of tokens when learning multiple types of PDEs simultaneously. We demonstrate that our pre-trained models achieve improved performance on several challenging downstream tasks compared to training from scratch and also beat other foundation model architectures for physics simulations.
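The channel-wise token design can be made concrete with a minimal sketch. The code below is an illustrative assumption, not the authors' implementation: class names, layer sizes, and the use of purely spatial patches (the paper embeds spatio-temporal tokens) are all simplifications. Each physical channel is embedded into its own token sequence, and tokens at the same position then interact through self-attention applied over the channel axis.

```python
# Minimal sketch (not the authors' code): each physical channel is embedded
# into its own patch tokens; tokens of different channels interact via
# self-attention applied over the channel axis.
import torch
import torch.nn as nn


class ChannelWiseTokens(nn.Module):
    """Embed each physical channel separately into patch tokens (spatial patches only, for brevity)."""

    def __init__(self, patch: int = 8, dim: int = 192):
        super().__init__()
        # One shared patch embedding applied per channel (an assumption of this sketch).
        self.embed = nn.Conv2d(1, dim, kernel_size=patch, stride=patch)

    def forward(self, u: torch.Tensor) -> torch.Tensor:
        # u: (batch, channels, H, W), e.g. velocity components and pressure
        b, c, h, w = u.shape
        tok = self.embed(u.reshape(b * c, 1, h, w))   # (b*c, dim, H/p, W/p)
        tok = tok.flatten(2).transpose(1, 2)          # (b*c, n_tokens, dim)
        n, d = tok.shape[1], tok.shape[2]
        return tok.reshape(b, c, n, d)                # (batch, channels, tokens, dim)


class ChannelSelfAttention(nn.Module):
    """Let tokens of different physical channels interact at each token position."""

    def __init__(self, dim: int = 192, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, n, d = x.shape
        # Treat the channel axis as the attention sequence, one sequence per token position.
        x = x.permute(0, 2, 1, 3).reshape(b * n, c, d)
        out, _ = self.attn(x, x, x)
        return out.reshape(b, n, c, d).permute(0, 2, 1, 3)


# Usage: two physical channels (e.g. vorticity and pressure) on a 64x64 grid.
field = torch.randn(2, 2, 64, 64)
tokens = ChannelWiseTokens()(field)      # (2, 2, 64, 192)
mixed = ChannelSelfAttention()(tokens)   # same shape; channels have exchanged information
```

Keeping one token sequence per physical channel means the information density per token stays roughly constant when PDEs with different numbers of fields are trained jointly, which is the motivation given in the abstract.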
Related papers
- The Power of Architecture: Deep Dive into Transformer Architectures for Long-Term Time Series Forecasting [26.76928230531243]
Transformer-based models have recently become dominant in Long-term Time Series Forecasting (LTSF). Variations in their architecture, such as encoder-only, encoder-decoder, and decoder-only designs, raise a crucial question: What Transformer architecture works best for LTSF tasks? Existing models are often tightly coupled with various time-series-specific designs, making it difficult to isolate the impact of the architecture itself. We propose a novel taxonomy that disentangles these designs, enabling clearer and more unified comparisons of Transformer architectures.
arXiv Detail & Related papers (2025-07-17T12:16:04Z)
- Neural ODE Transformers: Analyzing Internal Dynamics and Adaptive Fine-tuning [30.781578037476347]
We introduce a novel approach to modeling transformer architectures using highly flexible non-autonomous neural ordinary differential equations (ODEs). Our proposed model parameterizes all weights of attention and feed-forward blocks through neural networks, expressing these weights as functions of a continuous layer index. Our neural ODE transformer demonstrates performance comparable to or better than vanilla transformers across various configurations and datasets.
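As a rough illustration of the continuous-depth idea summarized above, a small hypernetwork can generate block weights as a function of a scalar layer index t, so the forward pass becomes a numerical integration over depth. All names, sizes, and the single-projection block below are invented for this sketch and do not reflect the paper's actual parameterization.

```python
# Hedged sketch: weights are produced by a hypernetwork as a function of a
# continuous layer index t, rather than stored per discrete layer.
import torch
import torch.nn as nn


class WeightGenerator(nn.Module):
    """Map a scalar layer index t in [0, 1] to a dim x dim weight matrix."""

    def __init__(self, dim: int, hidden: int = 64):
        super().__init__()
        self.dim = dim
        self.net = nn.Sequential(
            nn.Linear(1, hidden), nn.SiLU(), nn.Linear(hidden, dim * dim)
        )

    def forward(self, t: torch.Tensor) -> torch.Tensor:
        return self.net(t.view(1, 1)).view(self.dim, self.dim)


class ContinuousDepthBlock(nn.Module):
    """A block whose projection weights depend on the continuous layer index."""

    def __init__(self, dim: int = 32):
        super().__init__()
        self.w_gen = WeightGenerator(dim)

    def forward(self, x: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        w = self.w_gen(t)
        return torch.tanh(x @ w.T)   # dx/dt = f(x, t); the weights vary with depth t


# Integrate over depth with forward Euler (a proper ODE solver would be used in practice).
block, x = ContinuousDepthBlock(), torch.randn(4, 16, 32)
n_steps = 8
for k in range(n_steps):
    t = torch.tensor(k / n_steps)
    x = x + (1.0 / n_steps) * block(x, t)
```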
arXiv Detail & Related papers (2025-03-03T09:12:14Z)
- Knowledge-enhanced Transformer for Multivariate Long Sequence Time-series Forecasting [4.645182684813973]
We introduce a novel approach that encapsulates conceptual relationships among variables within a well-defined knowledge graph.
We investigate the influence of this integration into seminal architectures such as PatchTST, Autoformer, Informer, and Vanilla Transformer.
This enhancement empowers transformer-based architectures to address the inherent structural relation between variables.
arXiv Detail & Related papers (2024-11-17T11:53:54Z)
- Exploring the design space of deep-learning-based weather forecasting systems [56.129148006412855]
This paper systematically analyzes the impact of different design choices on deep-learning-based weather forecasting systems.
We study fixed-grid architectures such as UNet, fully convolutional architectures, and transformer-based models.
We propose a hybrid system that combines the strong performance of fixed-grid models with the flexibility of grid-invariant architectures.
arXiv Detail & Related papers (2024-10-09T22:25:50Z)
- Comprehensive Performance Modeling and System Design Insights for Foundation Models [1.4455936781559149]
Generative AI, in particular large transformer models, are increasingly driving HPC system design in science and industry.
We analyze performance characteristics of such transformer models and discuss their sensitivity to the transformer type, parallelization strategy, and HPC system features.
Our analysis emphasizes the need for closer performance modeling of different transformer types keeping system features in mind.
arXiv Detail & Related papers (2024-09-30T22:56:42Z)
- Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Models [92.36510016591782]
We present a method that is able to distill a pretrained Transformer architecture into alternative architectures such as state space models (SSMs). Our method, called MOHAWK, is able to distill a Mamba-2 variant based on the Phi-1.5 architecture using only 3B tokens and a hybrid version (Hybrid Phi-Mamba) using 5B tokens. Despite using less than 1% of the training data typically used to train models from scratch, Phi-Mamba boasts substantially stronger performance compared to all past open-source non-Transformer models.
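The following is a heavily simplified sketch of cross-architecture distillation in the spirit of the summary above; it is not MOHAWK's actual multi-stage procedure, and the `hidden_states`/`logits` interface is an assumed, HuggingFace-style convention. The subquadratic student is trained to match the teacher's per-layer hidden states and its output distribution.

```python
# Simplified illustration of cross-architecture distillation (not the exact
# MOHAWK procedure): the student matches the teacher layer by layer and on
# the final output distribution.
import torch
import torch.nn.functional as F


def distill_step(teacher, student, tokens, alpha=1.0, beta=1.0):
    """One distillation loss; both models are assumed to return per-layer
    `hidden_states` and final `logits` (a HuggingFace-style convention)."""
    with torch.no_grad():
        t_out = teacher(tokens, output_hidden_states=True)
    s_out = student(tokens, output_hidden_states=True)

    # Align hidden states of corresponding layers (squared error).
    hidden_loss = sum(
        F.mse_loss(s, t) for s, t in zip(s_out.hidden_states, t_out.hidden_states)
    )
    # Match the teacher's output distribution (KL divergence on the logits).
    kd_loss = F.kl_div(
        F.log_softmax(s_out.logits, dim=-1),
        F.softmax(t_out.logits, dim=-1),
        reduction="batchmean",
    )
    return alpha * hidden_loss + beta * kd_loss
```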
arXiv Detail & Related papers (2024-08-19T17:48:11Z)
- A Unified Framework for Interpretable Transformers Using PDEs and Information Theory [3.4039202831583903]
This paper presents a novel unified theoretical framework for understanding Transformer architectures by integrating Partial Differential Equations (PDEs), Neural Information Flow Theory, and Information Bottleneck Theory.
We model Transformer information dynamics as a continuous PDE process, encompassing diffusion, self-attention, and nonlinear residual components.
Our comprehensive experiments across image and text modalities demonstrate that the PDE model effectively captures key aspects of Transformer behavior, achieving high similarity (cosine similarity > 0.98) with Transformer attention distributions across all layers.
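Schematically, the continuous view described above can be written as a depth-evolution equation for the token field. The notation below is ours and only indicates the three components named in the summary (diffusion, self-attention, nonlinear residual), not the paper's exact formulation.

```latex
% u(x, t): token representation at spatial position x and continuous depth t.
\frac{\partial u}{\partial t}
  = \underbrace{D\,\nabla^{2} u}_{\text{diffusion}}
  + \underbrace{\mathcal{A}[u]}_{\text{self-attention}}
  + \underbrace{f(u)}_{\text{nonlinear residual}}
```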
arXiv Detail & Related papers (2024-08-18T16:16:57Z)
- Full Stack Optimization of Transformer Inference: a Survey [58.55475772110702]
Transformer models achieve superior accuracy across a wide range of applications.
The amount of compute and bandwidth required for inference of recent Transformer models is growing at a significant rate.
There has been an increased focus on making Transformer models more efficient.
arXiv Detail & Related papers (2023-02-27T18:18:13Z)
- Energy Transformer [64.22957136952725]
Our work combines aspects of three promising paradigms in machine learning, namely, attention mechanism, energy-based models, and associative memory.
We propose a novel architecture, called the Energy Transformer (or ET for short), that uses a sequence of attention layers that are purposely designed to minimize a specifically engineered energy function.
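The mechanism can be read as gradient descent on an engineered energy. The update below is an illustrative placeholder: the split of the energy into attention and associative-memory terms follows the summary above, but the exact functional form and step size are not reproduced here.

```latex
% Illustrative update; E, its components, and the step size \alpha are placeholders.
x^{(k+1)} = x^{(k)} - \alpha\,\nabla_{x} E\!\left(x^{(k)}\right),
\qquad E = E_{\text{attention}} + E_{\text{memory}}
```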
arXiv Detail & Related papers (2023-02-14T18:51:22Z)
- Foundation Transformers [105.06915886136524]
We call for the development of a Foundation Transformer for true general-purpose modeling.
In this work, we introduce a Transformer variant, named Magneto, to fulfill the goal.
arXiv Detail & Related papers (2022-10-12T17:16:27Z)
- Analogous to Evolutionary Algorithm: Designing a Unified Sequence Model [58.17021225930069]
We explain the rationale of the Vision Transformer by analogy with the proven and practical Evolutionary Algorithm (EA).
We propose a more efficient EAT model, and design task-related heads to deal with different tasks more flexibly.
Our approach achieves state-of-the-art results on the ImageNet classification task compared with recent vision transformer works.
arXiv Detail & Related papers (2021-05-31T16:20:03Z)
- Machine learning for rapid discovery of laminar flow channel wall modifications that enhance heat transfer [56.34005280792013]
We present a combination of accurate numerical simulations of arbitrary, flat, and non-flat channels and machine learning models predicting drag coefficient and Stanton number.
We show that convolutional neural networks (CNN) can accurately predict the target properties at a fraction of the time of numerical simulations.
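As a generic illustration of such a surrogate (the architecture below is a toy assumption, not the one used in the paper), a small CNN maps a wall-geometry field to the two scalar targets:

```python
# Toy sketch of a CNN surrogate: wall-geometry field in, two scalars out
# (drag coefficient and Stanton number). Architecture details are illustrative.
import torch
import torch.nn as nn

surrogate = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 2),               # outputs: [drag coefficient, Stanton number]
)

wall_geometry = torch.randn(8, 1, 64, 64)   # batch of wall height maps (toy data)
predictions = surrogate(wall_geometry)      # shape (8, 2)
```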
arXiv Detail & Related papers (2021-01-19T16:14:02Z)