Training Transformers for Mesh-Based Simulations
- URL: http://arxiv.org/abs/2508.18051v1
- Date: Mon, 25 Aug 2025 14:10:13 GMT
- Title: Training Transformers for Mesh-Based Simulations
- Authors: Paul Garnier, Vincent Lannelongue, Jonathan Viquerat, Elie Hachem
- Abstract summary: We propose a novel Graph Transformer architecture that leverages the adjacency matrix as an attention mask. We train over 60 models to find a scaling law between training FLOPs and parameters. The introduced models demonstrate remarkable scalability, performing on meshes with up to 300k nodes and 3 million edges.
- Score: 3.4998703934432682
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Simulating physics using Graph Neural Networks (GNNs) is predominantly driven by message-passing architectures, which face challenges in scaling and efficiency, particularly in handling large, complex meshes. These architectures have inspired numerous enhancements, including multigrid approaches and $K$-hop aggregation (using neighbours of distance $K$), yet they often introduce significant complexity and suffer from limited in-depth investigations. In response to these challenges, we propose a novel Graph Transformer architecture that leverages the adjacency matrix as an attention mask. The proposed approach incorporates innovative augmentations, including Dilated Sliding Windows and Global Attention, to extend receptive fields without sacrificing computational efficiency. Through extensive experimentation, we evaluate model size, adjacency matrix augmentations, positional encoding and $K$-hop configurations using challenging 3D computational fluid dynamics (CFD) datasets. We also train over 60 models to find a scaling law between training FLOPs and parameters. The introduced models demonstrate remarkable scalability, performing on meshes with up to 300k nodes and 3 million edges. Notably, the smallest model achieves parity with MeshGraphNet while being $7\times$ faster and $6\times$ smaller. The largest model surpasses the previous state-of-the-art by $38.8$\% on average and outperforms MeshGraphNet by $52$\% on the all-rollout RMSE, while having a similar training speed. Code and datasets are available at https://github.com/DonsetPG/graph-physics.
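The central idea in the abstract, using the mesh adjacency matrix as an attention mask, can be illustrated with a small NumPy sketch. This is not the authors' implementation (see their repository for that). The function names `build_adjacency_mask`, `k_hop_mask`, and `masked_attention` are hypothetical, the attention is single-head, and the mask is dense, which would not scale to the 300k-node meshes mentioned above, where a sparse formulation would be needed.

```python
import numpy as np

def build_adjacency_mask(num_nodes, edges):
    """Boolean mask: node i may attend to node j if they share an edge or i == j."""
    mask = np.eye(num_nodes, dtype=bool)      # self-loops keep every row non-empty
    src, dst = edges[:, 0], edges[:, 1]
    mask[src, dst] = True
    mask[dst, src] = True                     # assumption: mesh edges are undirected
    return mask

def k_hop_mask(mask, k):
    """Extend the mask to all nodes reachable within k hops (dense, illustration only)."""
    reach = np.linalg.matrix_power(mask.astype(np.int64), k)
    return reach > 0

def masked_attention(x, mask, w_q, w_k, w_v):
    """Single-head scaled dot-product attention restricted to entries where mask is True."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = (q @ k.T) / np.sqrt(k.shape[1])
    scores = np.where(mask, scores, -np.inf)  # masked pairs get zero weight after softmax
    scores -= scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v, weights

if __name__ == "__main__":
    # Toy "mesh": a path graph 0 - 1 - 2 - 3 with random node features.
    rng = np.random.default_rng(0)
    edges = np.array([[0, 1], [1, 2], [2, 3]])
    x = rng.normal(size=(4, 8))
    w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))

    mask = build_adjacency_mask(4, edges)
    out, weights = masked_attention(x, mask, w_q, w_k, w_v)
    print(np.round(weights, 3))               # zeros wherever two nodes are not adjacent
    print(k_hop_mask(mask, 2).astype(int))    # 2-hop receptive field
```

Passing `k_hop_mask(mask, K)` instead of `mask` widens each node's receptive field to its $K$-hop neighbourhood, which is one simple way to picture the $K$-hop configurations the abstract refers to.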
Related papers
- Scalable Graph Generative Modeling via Substructure Sequences [50.32639806800683]
We introduce Generative Graph Pattern Machine (G$^2$PM), a generative Transformer pre-training framework for graphs. G$^2$PM represents graph instances (nodes, edges, or entire graphs) as sequences of substructures. It employs generative pre-training over the sequences to learn generalizable and transferable representations.
arXiv Detail & Related papers (2025-05-22T02:16:34Z) - Fused3S: Fast Sparse Attention on Tensor Cores [3.6068301267188]
This paper introduces Fused3S, the first fused 3S algorithm that jointly maximizes tensor core utilization and minimizes data movement.<n>Across real-world graph datasets, Fused3S $1.6- 16.3times$ and $1.5-14times$ speedup over state-of-the-art on H100 and A30 GPU.
arXiv Detail & Related papers (2025-05-12T22:09:05Z) - MeshMask: Physics-Based Simulations with Masked Graph Neural Networks [0.0]
We introduce a novel masked pre-training technique for graph neural networks (GNNs) applied to computational fluid dynamics (CFD) problems.<n>By randomly masking up to 40% of input mesh nodes during pre-training, we force the model to learn robust representations of complex fluid dynamics.<n>The proposed method achieves state-of-the-art results on seven CFD datasets, including a new challenging dataset of 3D intracranial aneurysm simulations with over 250,000 nodes per mesh.
arXiv Detail & Related papers (2025-01-15T11:34:56Z) - MeshXL: Neural Coordinate Field for Generative 3D Foundation Models [51.1972329762843]
We present a family of generative pre-trained auto-regressive models, which addresses the process of 3D mesh generation with modern large language model approaches.
MeshXL is able to generate high-quality 3D meshes, and can also serve as foundation models for various down-stream applications.
arXiv Detail & Related papers (2024-05-31T14:35:35Z) - E(3)-Equivariant Mesh Neural Networks [16.158762988735322]
Triangular meshes are widely used to represent three-dimensional objects.
Many recent works have addressed the need for geometric deep learning on 3D meshes.
We extend the equations of E(n)-Equivariant Graph Neural Networks (EGNNs) to incorporate mesh face information.
The resulting architecture, Equivariant Mesh Neural Network (EMNN), outperforms other, more complicated equivariant methods on mesh tasks.
arXiv Detail & Related papers (2024-02-07T13:21:41Z) - Graph Transformers for Large Graphs [57.19338459218758]
This work advances representation learning on single large-scale graphs with a focus on identifying model characteristics and critical design constraints.
A key innovation of this work lies in the creation of a fast neighborhood sampling technique coupled with a local attention mechanism.
We report a 3x speedup and 16.8% performance gain on ogbn-products and snap-patents, while we also scale LargeGT on ogbn-100M with a 5.9% performance improvement.
arXiv Detail & Related papers (2023-12-18T11:19:23Z) - Scientific Computing Algorithms to Learn Enhanced Scalable Surrogates
for Mesh Physics [6.360914973656273]
MeshGraphNets (MGN) is a subclass of GNNs for mesh-based physics modeling.
We train MGN on meshes with millions of nodes to generate computational fluid dynamics simulations.
This work presents a practical path to scaling MGN for real-world applications.
arXiv Detail & Related papers (2023-04-01T15:42:18Z) - Dynamic Graph Message Passing Networks for Visual Recognition [112.49513303433606]
Modelling long-range dependencies is critical for scene understanding tasks in computer vision.
A fully-connected graph is beneficial for such modelling, but its computational overhead is prohibitive.
We propose a dynamic graph message passing network that significantly reduces the computational complexity.
arXiv Detail & Related papers (2022-09-20T14:41:37Z) - DS-Net++: Dynamic Weight Slicing for Efficient Inference in CNNs and
Transformers [105.74546828182834]
We show a hardware-efficient dynamic inference regime, named dynamic weight slicing, which adaptively slice a part of network parameters for inputs with diverse difficulty levels.
We present dynamic slimmable network (DS-Net) and dynamic slice-able network (DS-Net++) by input-dependently adjusting filter numbers of CNNs and multiple dimensions in both CNNs and transformers.
arXiv Detail & Related papers (2021-09-21T09:57:21Z) - Dynamic Graph: Learning Instance-aware Connectivity for Neural Networks [78.65792427542672]
Dynamic Graph Network (DG-Net) is a complete directed acyclic graph, where the nodes represent convolutional blocks and the edges represent connection paths.
Instead of using the same path of the network, DG-Net aggregates features dynamically in each node, which allows the network to have more representation ability.
arXiv Detail & Related papers (2020-10-02T16:50:26Z)