Integrating Locality-Aware Attention with Transformers for General Geometry PDEs
- URL: http://arxiv.org/abs/2504.13480v1
- Date: Fri, 18 Apr 2025 05:43:49 GMT
- Title: Integrating Locality-Aware Attention with Transformers for General Geometry PDEs
- Authors: Minsu Koh, Beom-Chul Park, Heejo Kong, Seong-Whan Lee
- Abstract summary: We propose the Locality-Aware Attention Transformer (LA2Former) for learning mappings governed by partial differential equations (PDEs). By combining linear attention for efficient global context encoding with pairwise attention for capturing intricate local interactions, LA2Former achieves an optimal balance between computational efficiency and predictive accuracy. This work underscores the critical importance of localized feature learning in advancing Transformer-based neural operators for solving PDEs on complex and irregular domains.
- Score: 24.336598771550157
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural operators have emerged as promising frameworks for learning mappings governed by partial differential equations (PDEs), serving as data-driven alternatives to traditional numerical methods. While methods such as the Fourier neural operator (FNO) have demonstrated notable performance, their reliance on uniform grids restricts their applicability to complex geometries and irregular meshes. Recently, Transformer-based neural operators with linear attention mechanisms have shown potential in overcoming these limitations for large-scale PDE simulations. However, these approaches predominantly emphasize global feature aggregation, often overlooking fine-scale dynamics and localized PDE behaviors essential for accurate solutions. To address these challenges, we propose the Locality-Aware Attention Transformer (LA2Former), which leverages K-nearest neighbors for dynamic patchifying and integrates global-local attention for enhanced PDE modeling. By combining linear attention for efficient global context encoding with pairwise attention for capturing intricate local interactions, LA2Former achieves an optimal balance between computational efficiency and predictive accuracy. Extensive evaluations across six benchmark datasets demonstrate that LA2Former improves predictive accuracy by over 50% relative to existing linear attention methods, while also outperforming full pairwise attention under optimal conditions. This work underscores the critical importance of localized feature learning in advancing Transformer-based neural operators for solving PDEs on complex and irregular domains.
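To make the mechanism described in the abstract concrete, here is a minimal, self-contained sketch of how a single layer might combine O(N) linear attention over all mesh points with full pairwise attention inside K-nearest-neighbor patches. This is an illustrative assumption, not the authors' implementation: the layer sizes, the ELU feature map used for the linear-attention branch, and the brute-force KNN construction are all choices made for readability.

```python
# Illustrative global-local attention layer for irregular meshes / point clouds.
# Not the LA2Former reference code: sizes, feature map, and KNN are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GlobalLocalAttention(nn.Module):
    def __init__(self, dim: int, k_neighbors: int = 16):
        super().__init__()
        self.k = k_neighbors
        self.qkv = nn.Linear(dim, 3 * dim)
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, x: torch.Tensor, coords: torch.Tensor) -> torch.Tensor:
        # x: (N, dim) features at N mesh points; coords: (N, d) point locations.
        q, k, v = self.qkv(x).chunk(3, dim=-1)

        # Global branch: linear attention, O(N) via the associativity trick
        # phi(q) (phi(k)^T v) with a positive feature map instead of softmax.
        q_g, k_g = F.elu(q) + 1.0, F.elu(k) + 1.0
        kv = k_g.T @ v                                        # (dim, dim)
        normalizer = q_g @ k_g.sum(dim=0, keepdim=True).T     # (N, 1)
        global_out = (q_g @ kv) / (normalizer + 1e-6)

        # Local branch: full pairwise softmax attention inside each KNN patch.
        dists = torch.cdist(coords, coords)                   # (N, N)
        knn_idx = dists.topk(self.k, largest=False).indices   # (N, k)
        k_local, v_local = k[knn_idx], v[knn_idx]              # (N, k, dim)
        scores = (q.unsqueeze(1) * k_local).sum(-1) / q.shape[-1] ** 0.5
        local_out = (scores.softmax(dim=-1).unsqueeze(-1) * v_local).sum(dim=1)

        return self.proj(torch.cat([global_out, local_out], dim=-1))


if __name__ == "__main__":
    pts = torch.rand(1024, 2)                  # irregular 2-D point cloud
    feats = torch.randn(1024, 64)
    layer = GlobalLocalAttention(dim=64, k_neighbors=16)
    print(layer(feats, pts).shape)             # torch.Size([1024, 64])
```

In this sketch the global branch costs O(N·dim^2) and the local branch O(N·K·dim), so quadratic pairwise attention is confined to size-K neighborhoods, which reflects the efficiency/accuracy trade-off the abstract describes.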
Related papers
- Efficient Transformed Gaussian Process State-Space Models for Non-Stationary High-Dimensional Dynamical Systems [49.819436680336786]
We propose an efficient transformed Gaussian process state-space model (ETGPSSM) for scalable and flexible modeling of high-dimensional, non-stationary dynamical systems. Specifically, our ETGPSSM integrates a single shared GP with input-dependent normalizing flows, yielding an expressive implicit process prior that captures complex, non-stationary transition dynamics. Our ETGPSSM outperforms existing GPSSMs and neural network-based SSMs in terms of computational efficiency and accuracy.
arXiv Detail & Related papers (2025-03-24T03:19:45Z) - Tensor-Var: Variational Data Assimilation in Tensor Product Feature Space [30.63086465547801]
Variational data assimilation estimates the states of a dynamical system by minimizing a cost function that fits numerical models to observational data. The widely used four-dimensional variational assimilation (4D-Var) method faces two primary challenges: (1) it is computationally demanding for complex nonlinear systems, and (2) it relies on state-observation mappings that are often not perfectly known. Deep learning (DL) has been used as a more expressive class of efficient model approximators to address these challenges. In this paper, we propose Tensor-Var, which addresses these challenges using kernel conditional mean embeddings (CME).
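For reference, the cost function mentioned in this summary is, in its standard strong-constraint 4D-Var form (conventional notation, not taken from the paper: x_b is the background state, B and R_k are error covariances, H_k the observation operators, and M_{0->k} the forward model):

```latex
J(x_0) = \tfrac{1}{2}\,(x_0 - x_b)^{\top} B^{-1} (x_0 - x_b)
       + \tfrac{1}{2}\sum_{k=0}^{K} \bigl(y_k - \mathcal{H}_k(x_k)\bigr)^{\top} R_k^{-1} \bigl(y_k - \mathcal{H}_k(x_k)\bigr),
\qquad x_k = \mathcal{M}_{0 \to k}(x_0).
```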
arXiv Detail & Related papers (2025-01-23T01:43:31Z) - HSLiNets: Hyperspectral Image and LiDAR Data Fusion Using Efficient Dual Non-Linear Feature Learning Networks [7.06787067270941]
The integration of hyperspectral imaging (HSI) and LiDAR data within new linear feature spaces offers a promising solution to the challenges posed by the high-dimensionality and redundancy inherent in HSIs.
This study introduces a dual linear fused space framework that capitalizes on bidirectional reversed convolutional neural network (CNN) pathways, coupled with a specialized spatial analysis block.
The proposed method not only enhances data processing and classification accuracy, but also mitigates the computational burden typically associated with advanced models such as Transformers.
arXiv Detail & Related papers (2024-11-30T01:08:08Z) - Optimal Transport-Based Displacement Interpolation with Data Augmentation for Reduced Order Modeling of Nonlinear Dynamical Systems [0.0]
We present a novel reduced-order model (ROM) that exploits optimal transport theory and displacement interpolation to enhance the representation of nonlinear dynamics in complex systems.
We show improved accuracy and efficiency in predicting complex system behaviors, indicating the potential of this approach for a wide range of applications in computational physics and engineering.
arXiv Detail & Related papers (2024-11-13T16:29:33Z) - A Stochastic Approach to Bi-Level Optimization for Hyperparameter Optimization and Meta Learning [74.80956524812714]
We tackle the general differentiable meta learning problem that is ubiquitous in modern deep learning.
These problems are often formalized as bi-level optimizations (BLO).
We introduce a novel perspective by turning a given BLO problem into a stochastic optimization, where the inner loss function becomes a smooth probability distribution and the outer loss becomes an expected loss over the inner distribution.
arXiv Detail & Related papers (2024-10-14T12:10:06Z) - Dual Cone Gradient Descent for Training Physics-Informed Neural Networks [0.0]
Physics-informed neural networks (PINNs) have emerged as a prominent approach for solving partial differential equations.
We propose a novel framework, Dual Cone Gradient Descent (DCGD), which adjusts the direction of the updated gradient to ensure it falls within a cone region.
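As an illustration of the dual-cone idea (a sketch under assumptions, not the paper's exact DCGD update rule), one simple way to obtain an update direction whose inner product with both the PDE-residual gradient and the boundary-loss gradient is non-negative is to project each gradient away from the other when they conflict:

```python
# Illustrative sketch: keep the combined PINN update inside the dual cone of the
# residual-loss and boundary-loss gradients. Not the exact DCGD rule from the
# paper; `params` and the two losses are placeholders for a real PINN setup.
import torch


def dual_cone_step(params, loss_residual, loss_boundary, lr=1e-3):
    g_r = torch.cat([g.flatten() for g in
                     torch.autograd.grad(loss_residual, params, retain_graph=True)])
    g_b = torch.cat([g.flatten() for g in
                     torch.autograd.grad(loss_boundary, params)])

    # If the gradients conflict, drop each one's component along the other;
    # the sum of the projected gradients then lies in the dual cone of {g_r, g_b}.
    if torch.dot(g_r, g_b) < 0:
        g_r_adj = g_r - torch.dot(g_r, g_b) / torch.dot(g_b, g_b) * g_b
        g_b_adj = g_b - torch.dot(g_b, g_r) / torch.dot(g_r, g_r) * g_r
    else:
        g_r_adj, g_b_adj = g_r, g_b

    update, offset = g_r_adj + g_b_adj, 0
    with torch.no_grad():
        for p in params:
            n = p.numel()
            p -= lr * update[offset:offset + n].view_as(p)
            offset += n
```

By construction the resulting update has non-negative inner product with both original gradients, so a small step does not increase either loss to first order, which is the stabilizing property this line of work targets.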
arXiv Detail & Related papers (2024-09-27T03:27:46Z) - Accelerating Distributed Optimization: A Primal-Dual Perspective on Local Steps [4.471962177124311]
In distributed machine learning, optimizing a shared objective across multiple agents with different data poses significant challenges.
In this paper, we show that a framework achieving Lagrangian convergence on the primal variable requires no inter-agent communication.
arXiv Detail & Related papers (2024-07-02T22:14:54Z) - Enhancing Low-Order Discontinuous Galerkin Methods with Neural Ordinary Differential Equations for Compressible Navier--Stokes Equations [0.1578515540930834]
We introduce an end-to-end differentiable framework for solving the compressible Navier-Stokes equations. This integrated approach combines a differentiable discontinuous Galerkin solver with a neural network source term. We demonstrate the performance of the proposed framework through two examples.
arXiv Detail & Related papers (2023-10-29T04:26:23Z) - Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - Implicit Stochastic Gradient Descent for Training Physics-informed Neural Networks [51.92362217307946]
Physics-informed neural networks (PINNs) have been demonstrated to be effective in solving forward and inverse differential equation problems.
However, PINNs can be trapped in training failures when the target functions to be approximated exhibit high-frequency or multi-scale features.
In this paper, we propose to employ the implicit stochastic gradient descent (ISGD) method to train PINNs, improving the stability of the training process.
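For context, the "implicit" update referred to here is, in its standard form, the proximal variant of SGD in which the gradient is evaluated at the new iterate rather than the current one (the paper's exact variant for PINNs may differ):

```latex
% Explicit SGD evaluates the stochastic gradient at the current iterate:
\theta_{t+1} = \theta_t - \eta\, \nabla L(\theta_t;\, \xi_t)
% Implicit SGD evaluates it at the new iterate, which requires solving a
% fixed-point problem at each step but is more stable with respect to step size:
\theta_{t+1} = \theta_t - \eta\, \nabla L(\theta_{t+1};\, \xi_t)
```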
arXiv Detail & Related papers (2023-03-03T08:17:47Z) - Solving High-Dimensional PDEs with Latent Spectral Models [74.1011309005488]
We present Latent Spectral Models (LSM) toward an efficient and precise solver for high-dimensional PDEs.
Inspired by classical spectral methods in numerical analysis, we design a neural spectral block to solve PDEs in the latent space.
LSM consistently achieves state-of-the-art performance, yielding an average relative gain of 11.5% across seven benchmarks.
arXiv Detail & Related papers (2023-01-30T04:58:40Z) - Pushing the Envelope of Rotation Averaging for Visual SLAM [69.7375052440794]
We propose a novel optimization backbone for visual SLAM systems.
We leverage rotation averaging to improve the accuracy, efficiency and robustness of conventional monocular SLAM systems.
Our approach runs up to 10x faster, with accuracy comparable to the state of the art, on public benchmarks.
arXiv Detail & Related papers (2020-11-02T18:02:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.