Nonlocal Attention Operator: Materializing Hidden Knowledge Towards Interpretable Physics Discovery
- URL: http://arxiv.org/abs/2408.07307v1
- Date: Wed, 14 Aug 2024 05:57:56 GMT
- Title: Nonlocal Attention Operator: Materializing Hidden Knowledge Towards Interpretable Physics Discovery
- Authors: Yue Yu, Ning Liu, Fei Lu, Tian Gao, Siavash Jafarzadeh, Stewart Silling,
- Abstract summary: We propose a novel neural operator architecture based on the attention mechanism, which we coin Nonlocal Attention Operator (NAO)
NAO can address ill-posedness and rank deficiency in inverse PDE problems by encoding regularization and achieving generalizability.
- Score: 25.75410883895742
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Despite the recent popularity of attention-based neural architectures in core AI fields like natural language processing (NLP) and computer vision (CV), their potential in modeling complex physical systems remains under-explored. Learning problems in physical systems are often characterized as discovering operators that map between function spaces based on a few instances of function pairs. This task frequently presents a severely ill-posed PDE inverse problem. In this work, we propose a novel neural operator architecture based on the attention mechanism, which we coin Nonlocal Attention Operator (NAO), and explore its capability towards developing a foundation physical model. In particular, we show that the attention mechanism is equivalent to a double integral operator that enables nonlocal interactions among spatial tokens, with a data-dependent kernel characterizing the inverse mapping from data to the hidden parameter field of the underlying operator. As such, the attention mechanism extracts global prior information from training data generated by multiple systems, and suggests the exploratory space in the form of a nonlinear kernel map. Consequently, NAO can address ill-posedness and rank deficiency in inverse PDE problems by encoding regularization and achieving generalizability. We empirically demonstrate the advantages of NAO over baseline neural models in terms of generalizability to unseen data resolutions and system states. Our work not only suggests a novel neural operator architecture for learning interpretable foundation models of physical systems, but also offers a new perspective towards understanding the attention mechanism.
Related papers
- DimOL: Dimensional Awareness as A New 'Dimension' in Operator Learning [63.5925701087252]
We introduce DimOL (Dimension-aware Operator Learning), drawing insights from dimensional analysis.
To implement DimOL, we propose the ProdLayer, which can be seamlessly integrated into FNO-based and Transformer-based PDE solvers.
Empirically, DimOL models achieve up to 48% performance gain within the PDE datasets.
arXiv Detail & Related papers (2024-10-08T10:48:50Z) - Disentangled Representation Learning for Parametric Partial Differential Equations [31.240283037552427]
We propose a new paradigm for learning disentangled representations from neural operator parameters.
DisentangO is a novel hyper-neural operator architecture designed to unveil and disentangle the latent physical factors of variation embedded within the black-box neural operator parameters.
We show that DisentangO effectively extracts meaningful and interpretable latent features, bridging the divide between predictive performance and physical understanding in neural operator frameworks.
arXiv Detail & Related papers (2024-10-03T01:40:39Z) - Neural Operators with Localized Integral and Differential Kernels [77.76991758980003]
We present a principled approach to operator learning that can capture local features under two frameworks.
We prove that we obtain differential operators under an appropriate scaling of the kernel values of CNNs.
To obtain local integral operators, we utilize suitable basis representations for the kernels based on discrete-continuous convolutions.
arXiv Detail & Related papers (2024-02-26T18:59:31Z) - Unraveling Feature Extraction Mechanisms in Neural Networks [10.13842157577026]
We propose a theoretical approach based on Neural Tangent Kernels (NTKs) to investigate such mechanisms.
We reveal how these models leverage statistical features during gradient descent and how they are integrated into final decisions.
We find that while self-attention and CNN models may exhibit limitations in learning n-grams, multiplication-based models seem to excel in this area.
arXiv Detail & Related papers (2023-10-25T04:22:40Z) - Improved Operator Learning by Orthogonal Attention [17.394770071994145]
We develop an attention based on the eigendecomposition of the kernel integral operator and the neural approximation of eigenfunctions.
Our method can outperform competing baselines with decent margins.
arXiv Detail & Related papers (2023-10-19T05:47:28Z) - INO: Invariant Neural Operators for Learning Complex Physical Systems
with Momentum Conservation [8.218875461185016]
We introduce a novel integral neural operator architecture, to learn physical models with fundamental conservation laws automatically guaranteed.
As applications, we demonstrate the expressivity and efficacy of our model in learning complex material behaviors from both synthetic and experimental datasets.
arXiv Detail & Related papers (2022-12-29T16:40:41Z) - Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z) - Reducing Catastrophic Forgetting in Self Organizing Maps with
Internally-Induced Generative Replay [67.50637511633212]
A lifelong learning agent is able to continually learn from potentially infinite streams of pattern sensory data.
One major historic difficulty in building agents that adapt is that neural systems struggle to retain previously-acquired knowledge when learning from new samples.
This problem is known as catastrophic forgetting (interference) and remains an unsolved problem in the domain of machine learning to this day.
arXiv Detail & Related papers (2021-12-09T07:11:14Z) - Neural Operator: Learning Maps Between Function Spaces [75.93843876663128]
We propose a generalization of neural networks to learn operators, termed neural operators, that map between infinite dimensional function spaces.
We prove a universal approximation theorem for our proposed neural operator, showing that it can approximate any given nonlinear continuous operator.
An important application for neural operators is learning surrogate maps for the solution operators of partial differential equations.
arXiv Detail & Related papers (2021-08-19T03:56:49Z) - A neural anisotropic view of underspecification in deep learning [60.119023683371736]
We show that the way neural networks handle the underspecification of problems is highly dependent on the data representation.
Our results highlight that understanding the architectural inductive bias in deep learning is fundamental to address the fairness, robustness, and generalization of these systems.
arXiv Detail & Related papers (2021-04-29T14:31:09Z) - On the application of Physically-Guided Neural Networks with Internal
Variables to Continuum Problems [0.0]
We present Physically-Guided Neural Networks with Internal Variables (PGNNIV)
universal physical laws are used as constraints in the neural network, in such a way that some neuron values can be interpreted as internal state variables of the system.
This endows the network with unraveling capacity, as well as better predictive properties such as faster convergence, fewer data needs and additional noise filtering.
We extend this new methodology to continuum physical problems, showing again its predictive and explanatory capacities when only using measurable values in the training set.
arXiv Detail & Related papers (2020-11-23T13:06:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.