Transolver is a Linear Transformer: Revisiting Physics-Attention through the Lens of Linear Attention
- URL: http://arxiv.org/abs/2511.06294v2
- Date: Wed, 12 Nov 2025 01:38:11 GMT
- Title: Transolver is a Linear Transformer: Revisiting Physics-Attention through the Lens of Linear Attention
- Authors: Wenjie Hu, Sidun Liu, Peng Qiao, Zhenglun Sun, Yong Dou
- Abstract summary: We propose a two-step transformation to redesign Physics-Attention into a canonical linear attention. Our method achieves state-of-the-art performance on six standard PDE benchmarks. It reduces the number of parameters by an average of 40.0% and computational cost by 36.2%.
- Score: 17.072389584390425
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in Transformer-based Neural Operators have enabled significant progress in data-driven solvers for Partial Differential Equations (PDEs). Most current research has focused on reducing the quadratic complexity of attention to address the resulting low training and inference efficiency. Among these works, Transolver stands out as a representative method that introduces Physics-Attention to reduce computational costs. Physics-Attention projects grid points into slices for slice attention, then maps them back through deslicing. However, we observe that Physics-Attention can be reformulated as a special case of linear attention, and that the slice attention may even hurt the model performance. Based on these observations, we argue that its effectiveness primarily arises from the slice and deslice operations rather than interactions between slices. Building on this insight, we propose a two-step transformation to redesign Physics-Attention into a canonical linear attention, which we call Linear Attention Neural Operator (LinearNO). Our method achieves state-of-the-art performance on six standard PDE benchmarks, while reducing the number of parameters by an average of 40.0% and computational cost by 36.2%. Additionally, it delivers superior performance on two challenging, industrial-level datasets: AirfRANS and Shape-Net Car.
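To make the reformulation concrete, the sketch below rewrites the slice/deslice pipeline directly as a normalized linear attention. It is a minimal, single-head reconstruction from the abstract, not the authors' released code; the module and projection names, the slice count `m`, and the omission of batching and multi-head handling are assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PhysicsAttentionAsLinear(nn.Module):
    """Single-head sketch; d = channel width, m = number of slices (both assumed)."""
    def __init__(self, d: int, m: int):
        super().__init__()
        self.slice_proj = nn.Linear(d, m)   # per-point slice logits
        self.value_proj = nn.Linear(d, d)
        self.out_proj = nn.Linear(d, d)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (n, d) features of n mesh/grid points
        w = F.softmax(self.slice_proj(x), dim=-1)               # (n, m) soft slice weights
        v = self.value_proj(x)                                  # (n, d) values
        # "Slice": weighted average of point features into m slice tokens.
        z = (w.t() @ v) / (w.sum(dim=0).unsqueeze(-1) + 1e-6)   # (m, d)
        # Transolver would run softmax attention over the m slice tokens here;
        # the paper argues this inter-slice step contributes little and may even
        # hurt accuracy, so it is dropped in this sketch.
        # "Deslice": scatter the slice tokens back to the points.
        out = w @ z                                             # (n, d)
        return self.out_proj(out)

# Example: 1024 points with 64 channels and 32 slices.
layer = PhysicsAttentionAsLinear(d=64, m=32)
y = layer(torch.randn(1024, 64))   # -> (1024, 64)
```

Reading `w` as a non-negative, normalized feature map applied to both queries and keys, the slice and deslice steps are exactly the two matrix products of linear attention, computed in O(n·m·d) rather than O(n²·d); this is the observation that the paper's two-step transformation builds on.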
Related papers
- From Cheap Geometry to Expensive Physics: Elevating Neural Operators via Latent Shape Pretraining [4.838293051443775]
Industrial design evaluation often relies on simulations of governing partial differential equations (PDEs). Operator learning has emerged as a promising approach to accelerate PDE solution prediction. We propose a two-stage framework to better exploit this abundant, physics-agnostic resource of geometry data.
arXiv Detail & Related papers (2025-09-30T05:01:31Z) - Enabling Automatic Differentiation with Mollified Graph Neural Operators [73.52999622724101]
We propose the mollified graph neural operator ($m$GNO), the first method to leverage automatic differentiation and compute exact gradients on arbitrary geometries. For a PDE example on regular grids, $m$GNO paired with autograd reduced the L2 relative data error by 20x compared to finite differences. It can also solve PDEs on unstructured point clouds seamlessly, using physics losses only, at resolutions vastly lower than those needed for finite differences to be accurate enough.
arXiv Detail & Related papers (2025-04-11T06:16:30Z) - BEExformer: A Fast Inferencing Binarized Transformer with Early Exits [2.7651063843287718]
We introduce the Binarized Early Exit Transformer (BEExformer), the first selective-learning-based transformer integrating Binarization-Aware Training (BAT) with Early Exit (EE). BAT employs a differentiable second-order approximation to the sign function, yielding gradients that capture both the sign and magnitude of the weights. The EE mechanism hinges on the fractional reduction in entropy across intermediate transformer blocks, with soft-routing loss estimation. This accelerates inference by reducing FLOPs by 52.08% and even improves accuracy by 2.89% by resolving the "overthinking" problem inherent in deep networks.
arXiv Detail & Related papers (2024-12-06T17:58:14Z) - CARE Transformer: Mobile-Friendly Linear Visual Transformer via Decoupled Dual Interaction [77.8576094863446]
We propose a new deCoupled duAl-interactive lineaR attEntion (CARE) mechanism.
We first propose an asymmetrical feature decoupling strategy that separates the learning of local inductive bias from that of long-range dependencies.
By adopting this decoupled learning scheme and fully exploiting the complementarity across features, our method achieves both high efficiency and accuracy.
arXiv Detail & Related papers (2024-11-25T07:56:13Z) - Breaking the Low-Rank Dilemma of Linear Attention [61.55583836370135]
Linear attention provides a far more efficient solution by reducing the complexity to linear levels. Our experiments indicate that its performance drop relative to Softmax attention is due to the low-rank nature of linear attention's feature map. We introduce Rank-Augmented Linear Attention (RALA), which rivals the performance of Softmax attention while maintaining linear complexity and high efficiency.
arXiv Detail & Related papers (2024-11-12T08:30:59Z) - Non-asymptotic Convergence of Training Transformers for Next-token Prediction [48.9399496805422]
Transformers have achieved extraordinary success in modern machine learning due to their excellent ability to handle sequential data.
This paper provides a fine-grained non-asymptotic analysis of the training dynamics of a one-layer transformer.
We show that the trained transformer presents non-trivial prediction ability under dataset shift.
arXiv Detail & Related papers (2024-09-25T20:22:06Z) - Latent Neural Operator for Solving Forward and Inverse PDE Problems [5.8039987932401225]
We present the Latent Neural Operator (LNO), which solves PDEs in a latent space. Experiments show that LNO reduces GPU memory usage by 50%, speeds up training 1.8 times, and reaches state-of-the-art accuracy on four out of six benchmarks.
arXiv Detail & Related papers (2024-06-06T10:04:53Z) - Physics-Informed Deep Learning of Rate-and-State Fault Friction [0.0]
We develop a multi-network PINN both for the forward problem and for direct inversion of nonlinear fault friction parameters.
We present the computational PINN framework for strike-slip faults in 1D and 2D subject to rate-and-state friction.
We find that the network for the parameter inversion at the fault performs much better than the network for material displacements to which it is coupled.
arXiv Detail & Related papers (2023-12-14T23:53:25Z) - FLatten Transformer: Vision Transformer using Focused Linear Attention [80.61335173752146]
Linear attention offers a much more efficient alternative with its linear complexity.
Current linear attention approaches either suffer from significant performance degradation or introduce additional computation overhead.
We propose a novel Focused Linear Attention module to achieve both high efficiency and expressiveness.
arXiv Detail & Related papers (2023-08-01T10:37:12Z) - Learning Physics-Informed Neural Networks without Stacked Back-propagation [82.26566759276105]
We develop a novel approach that can significantly accelerate the training of Physics-Informed Neural Networks.
In particular, we parameterize the PDE solution with a Gaussian-smoothed model and show that, via Stein's identity, the second-order derivatives can be calculated efficiently without back-propagation.
Experimental results show that our proposed method can achieve competitive error compared to standard PINN training but is two orders of magnitude faster.
arXiv Detail & Related papers (2022-02-18T18:07:54Z) - Choose a Transformer: Fourier or Galerkin [0.0]
We apply the self-attention from the state-of-the-art Transformer in "Attention Is All You Need" to a data-driven operator learning problem.
We show that softmax normalization in the scaled dot-product attention is sufficient but not necessary, and prove the approximation capacity of a linear variant as a Petrov-Galerkin projection (a reference form of this softmax-free variant is sketched after this list).
We present three operator learning experiments, including the viscid Burgers' equation, an interface Darcy flow, and an inverse interface coefficient identification problem.
arXiv Detail & Related papers (2021-05-31T14:30:53Z)
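Since the main paper and several entries above (the Fourier/Galerkin transformer and the other linear-attention works) revolve around softmax-free attention, a reference form of that linear variant may help. The following is the Galerkin-type attention as it is commonly written, stated here under that assumption rather than as a quotation of any single paper:

$$
z \;=\; \frac{1}{n}\, Q\,\bigl(\tilde{K}^{\top}\tilde{V}\bigr),
\qquad \tilde{K}=\mathrm{LN}(K),\quad \tilde{V}=\mathrm{LN}(V),
$$

where layer normalization of the keys and values replaces the softmax, and evaluating $\tilde{K}^{\top}\tilde{V}$ first costs $O(nd^2)$ instead of the $O(n^2 d)$ of $\mathrm{softmax}(QK^{\top}/\sqrt{d})\,V$.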
This list is automatically generated from the titles and abstracts of the papers on this site.