Continuum Attention for Neural Operators
- URL: http://arxiv.org/abs/2406.06486v1
- Date: Mon, 10 Jun 2024 17:25:46 GMT
- Title: Continuum Attention for Neural Operators
- Authors: Edoardo Calvello, Nikola B. Kovachki, Matthew E. Levine, Andrew M. Stuart
- Abstract summary: We study transformers in the function space setting.
We prove that the attention mechanism as implemented in practice is a Monte Carlo or finite difference approximation of this operator.
To reduce the cost of applying attention on multi-dimensional domains, we also introduce a function space generalization of the patching strategy from computer vision and a class of associated neural operators.
- Score: 6.425471760071227
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transformers, and the attention mechanism in particular, have become ubiquitous in machine learning. Their success in modeling nonlocal, long-range correlations has led to their widespread adoption in natural language processing, computer vision, and time-series problems. Neural operators, which map spaces of functions into spaces of functions, are necessarily both nonlinear and nonlocal if they are universal; it is thus natural to ask whether the attention mechanism can be used in the design of neural operators. Motivated by this, we study transformers in the function space setting. We formulate attention as a map between infinite dimensional function spaces and prove that the attention mechanism as implemented in practice is a Monte Carlo or finite difference approximation of this operator. The function space formulation allows for the design of transformer neural operators, a class of architectures designed to learn mappings between function spaces, for which we prove a universal approximation result. The prohibitive cost of applying the attention operator to functions defined on multi-dimensional domains leads to the need for more efficient attention-based architectures. For this reason we also introduce a function space generalization of the patching strategy from computer vision, and introduce a class of associated neural operators. Numerical results, on an array of operator learning problems, demonstrate the promise of our approaches to function space formulations of attention and their use in neural operators.
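The central claim above, that attention as implemented in practice is a Monte Carlo or finite difference approximation of a function space attention operator, can be illustrated with a short numerical sketch. The details below (random query/key/value matrices, a midpoint-rule grid on [0, 1], and a toy input function) are illustrative assumptions rather than the paper's exact operator; the sketch only shows softmax attention over sampled function values, weighted by quadrature weights, approximating an integral operator as the grid is refined.

```python
import numpy as np

# Illustrative sketch (not the paper's exact formulation): continuum attention
# maps an input function u: D -> R^d to
#   (A u)(x) = integral_D softmax_y( <q(u)(x), k(u)(y)> / sqrt(d_k) ) v(u)(y) dy,
# and approximating the integral with sample points y_1..y_N and quadrature
# weights w_j recovers the familiar discrete attention formula.

rng = np.random.default_rng(0)
d, d_k = 4, 8                                    # input channels, query/key width
Wq = rng.standard_normal((d, d_k))
Wk = rng.standard_normal((d, d_k))
Wv = rng.standard_normal((d, d_k))

def attention_quadrature(u_vals, weights):
    """u_vals: (N, d) samples of u at y_1..y_N; weights: (N,) quadrature weights."""
    q, k, v = u_vals @ Wq, u_vals @ Wk, u_vals @ Wv
    scores = q @ k.T / np.sqrt(d_k)                              # (N, N) pairwise scores
    num = np.exp(scores - scores.max(axis=1, keepdims=True)) * weights
    probs = num / num.sum(axis=1, keepdims=True)                 # weighted softmax over y_j
    return probs @ v                                             # samples of (A u)(y_i)

def u(x):                                        # toy input function on D = [0, 1]
    return np.stack([np.sin(2 * np.pi * (j + 1) * x) for j in range(d)], axis=-1)

for N in (64, 256, 1024):
    x = (np.arange(N) + 0.5) / N                 # midpoint-rule nodes
    out = attention_quadrature(u(x), np.full(N, 1.0 / N))
    print(N, np.round(out[0, :3], 4))            # values stabilize as the grid is refined
```

With uniform weights 1/N the weighted softmax cancels to the standard discrete attention formula; random sample points give the Monte Carlo reading, while structured grids with quadrature weights give the finite difference or quadrature reading.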
Related papers
- Nonlocal Attention Operator: Materializing Hidden Knowledge Towards Interpretable Physics Discovery [25.75410883895742]
We propose a novel neural operator architecture based on the attention mechanism, which we coin the Nonlocal Attention Operator (NAO).
NAO can address ill-posedness and rank deficiency in inverse PDE problems by encoding regularization and achieving generalizability.
arXiv Detail & Related papers (2024-08-14T05:57:56Z)
- Positional Knowledge is All You Need: Position-induced Transformer (PiT) for Operator Learning [3.183339674210516]
Transformer-based operator learning is rapidly emerging as a promising approach for surrogate modeling of partial differential equations.
This paper proposes the Position-induced Transformer (PiT), built on an innovative position-attention mechanism, which demonstrates significant advantages in operator learning.
PiT possesses an enhanced discretization feature compared to widely-used neural operators.
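One way to read the position-attention idea is sketched below under explicit assumptions: a Gaussian score on pairwise point distances with a fixed length scale, which is a stand-in rather than PiT's actual definition. The point being illustrated is that the attention weights are computed from mesh coordinates alone, so the same operator can be evaluated on any discretization of the domain.

```python
import numpy as np

# Hypothetical position-only attention: weights depend on the point
# coordinates alone, not on learned queries/keys of the function values.
def position_attention(x, u_vals, length_scale=0.1):
    """x: (N, 1) point coordinates; u_vals: (N, d) function samples."""
    dist2 = (x - x.T) ** 2                            # (N, N) squared distances
    logits = -dist2 / (2.0 * length_scale ** 2)
    weights = np.exp(logits - logits.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)     # row-stochastic attention weights
    return weights @ u_vals                           # propagated function features

N = 128
x = np.linspace(0.0, 1.0, N)[:, None]
u_vals = np.sin(2 * np.pi * x)                        # toy scalar input function
print(position_attention(x, u_vals).shape)            # (128, 1)
```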
arXiv Detail & Related papers (2024-05-15T12:09:24Z)
- Neural Operators with Localized Integral and Differential Kernels [77.76991758980003]
We present a principled approach to operator learning that can capture local features under two frameworks.
We prove that we obtain differential operators under an appropriate scaling of the kernel values of CNNs.
To obtain local integral operators, we utilize suitable basis representations for the kernels based on discrete-continuous convolutions.
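The claim that appropriately scaled CNN kernel values yield differential operators can be illustrated with a textbook finite-difference example, which is not the paper's construction: a fixed three-point stencil scaled by 1/h^2 converges to the second derivative as the grid spacing h shrinks.

```python
import numpy as np

# A fixed convolution stencil, rescaled by 1/h^2, acts as a second-derivative
# operator in the limit of grid refinement (standard finite-difference fact).
stencil = np.array([1.0, -2.0, 1.0])      # discrete Laplacian stencil (assumed example)

def conv_as_derivative(u_vals, h):
    return np.convolve(u_vals, stencil, mode="valid") / h ** 2

for N in (32, 128, 512):
    x = np.linspace(0.0, 1.0, N)
    h = x[1] - x[0]
    u_vals = np.sin(2 * np.pi * x)
    approx = conv_as_derivative(u_vals, h)
    exact = -(2 * np.pi) ** 2 * np.sin(2 * np.pi * x[1:-1])
    print(N, np.max(np.abs(approx - exact)))   # error shrinks as the grid is refined
```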
arXiv Detail & Related papers (2024-02-26T18:59:31Z)
- Representation Equivalent Neural Operators: a Framework for Alias-free Operator Learning [11.11883703395469]
This research offers a fresh take on neural operators through a framework called Representation Equivalent Neural Operators (ReNO).
At its core is the concept of operator aliasing, which measures inconsistency between neural operators and their discrete representations.
Our findings detail how aliasing introduces errors when handling different discretizations and grids, and how it leads to the loss of crucial continuous structures.
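One simple proxy for the aliasing idea, offered only as an illustration and not as ReNO's formal definition: apply a (deliberately grid-dependent) discretized operator on a fine grid and restrict its output to a coarse grid, then compare with applying the operator directly on the coarse grid. A representation-equivalent operator would make the two routes agree.

```python
import numpy as np

# Toy "neural operator" stand-in: a pointwise nonlinearity of a moving average
# over a fixed number of neighbours, which is deliberately grid-dependent.
def discretized_operator(u_vals, k=5):
    kernel = np.ones(k) / k
    return np.tanh(np.convolve(u_vals, kernel, mode="same"))

def aliasing_proxy(u, n_fine=256, n_coarse=64):
    xf = np.linspace(0.0, 1.0, n_fine)
    xc = np.linspace(0.0, 1.0, n_coarse)
    fine_then_restrict = np.interp(xc, xf, discretized_operator(u(xf)))
    restrict_then_apply = discretized_operator(u(xc))
    return np.max(np.abs(fine_then_restrict - restrict_then_apply))

print(aliasing_proxy(lambda x: np.sin(6 * np.pi * x)))   # nonzero: the two routes disagree
```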
arXiv Detail & Related papers (2023-05-31T14:45:34Z)
- Neural Set Function Extensions: Learning with Discrete Functions in High Dimensions [63.21838830509772]
We develop a framework for extending set functions onto low-dimensional continuous domains.
Our framework subsumes many well-known extensions as special cases.
We convert low-dimensional neural network bottlenecks into representations in high-dimensional spaces.
arXiv Detail & Related papers (2022-08-08T10:58:02Z)
- Learning Operators with Coupled Attention [9.715465024071333]
We propose a novel operator learning method, LOCA, motivated from the recent success of the attention mechanism.
In our architecture the input functions are mapped to a finite set of features which are then averaged with attention weights that depend on the output query locations.
By coupling these attention weights together with an integral transform, LOCA is able to explicitly learn correlations in the target output functions.
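A minimal sketch of the averaging step described above, with random stand-ins for LOCA's learned feature map, score function, and value map (the integral-transform coupling is omitted): features extracted from the input function are averaged with attention weights that depend on the output query locations.

```python
import numpy as np

rng = np.random.default_rng(1)
n_feat, d_out = 16, 1

# Stand-ins for learned components (assumptions for illustration only):
feature_centres = rng.uniform(0.0, 1.0, size=n_feat)   # where input features are read off
W_score = rng.standard_normal((1, n_feat))              # maps a query location to scores
W_value = rng.standard_normal((n_feat, d_out))          # maps features to output channels

def loca_style(u, query_x):
    """u: callable input function; query_x: (M,) output query locations."""
    feats = u(feature_centres)                           # finite set of input features
    scores = query_x[:, None] * W_score                  # (M, n_feat), query-dependent
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)        # attention weights per query
    return (weights * feats) @ W_value                   # weighted average -> output values

query_x = np.linspace(0.0, 1.0, 50)
print(loca_style(lambda x: np.sin(2 * np.pi * x), query_x).shape)   # (50, 1)
```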
arXiv Detail & Related papers (2022-01-04T08:22:03Z)
- Dynamic Inference with Neural Interpreters [72.90231306252007]
We present Neural Interpreters, an architecture that factorizes inference in a self-attention network as a system of modules.
Inputs to the model are routed through a sequence of functions in a way that is learned end-to-end.
We show that Neural Interpreters perform on par with the vision transformer using fewer parameters, while being transferable to new tasks in a sample-efficient manner.
arXiv Detail & Related papers (2021-10-12T23:22:45Z)
- Neural Operator: Learning Maps Between Function Spaces [75.93843876663128]
We propose a generalization of neural networks to learn operators, termed neural operators, that map between infinite dimensional function spaces.
We prove a universal approximation theorem for our proposed neural operator, showing that it can approximate any given nonlinear continuous operator.
An important application for neural operators is learning surrogate maps for the solution operators of partial differential equations.
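A minimal sketch of a single kernel integral layer of the kind such neural operators stack, with a toy Gaussian matrix-valued kernel as an assumed stand-in for a learned one: the nonlocal term, an integral of kappa(x, y) u(y) over the domain, is approximated by a Riemann sum on the sampling grid.

```python
import numpy as np

rng = np.random.default_rng(2)
d_in, d_out = 2, 2
W = rng.standard_normal((d_in, d_out))     # pointwise (local) linear map
A = rng.standard_normal((d_in, d_out))     # kernel weights (illustrative choice)

def kernel(x, y):
    """Toy stationary matrix-valued kernel kappa(x, y), a stand-in for a learned kernel."""
    return np.exp(-((x - y) ** 2) / 0.02)[..., None, None] * A

def integral_layer(x, u_vals):
    """One layer: v(x_i) = tanh( W u(x_i) + (1/n) sum_j kappa(x_i, x_j) u(x_j) )."""
    n = len(x)
    K = kernel(x[:, None], x[None, :])                       # (n, n, d_in, d_out)
    nonlocal_part = np.einsum("ijab,ja->ib", K, u_vals) / n  # Riemann-sum kernel integral
    return np.tanh(u_vals @ W + nonlocal_part)

x = np.linspace(0.0, 1.0, 100)
u_vals = np.stack([np.sin(2 * np.pi * x), np.cos(2 * np.pi * x)], axis=-1)   # (100, 2)
print(integral_layer(x, u_vals).shape)                                       # (100, 2)
```

Because the layer acts on sampled function values plus a quadrature approximation of an integral, the same weights can in principle be evaluated on different grids, which is what distinguishes an operator layer from an ordinary fixed-size network layer.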
arXiv Detail & Related papers (2021-08-19T03:56:49Z)
- A Functional Perspective on Learning Symmetric Functions with Neural Networks [48.80300074254758]
We study the learning and representation of neural networks defined on measures.
We establish approximation and generalization bounds under different choices of regularization.
The resulting models can be learned efficiently and enjoy generalization guarantees that extend across input sizes.
arXiv Detail & Related papers (2020-08-16T16:34:33Z)
- Space of Functions Computed by Deep-Layered Machines [74.13735716675987]
We study the space of functions computed by random-layered machines, including deep neural networks and Boolean circuits.
Investigating the distribution of Boolean functions computed by recurrent and layer-dependent architectures, we find that it is the same in both models.
arXiv Detail & Related papers (2020-04-19T18:31:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.