Learning Operators with Coupled Attention
- URL: http://arxiv.org/abs/2201.01032v1
- Date: Tue, 4 Jan 2022 08:22:03 GMT
- Title: Learning Operators with Coupled Attention
- Authors: Georgios Kissas, Jacob Seidman, Leonardo Ferreira Guilhoto, Victor M.
Preciado, George J. Pappas and Paris Perdikaris
- Abstract summary: We propose a novel operator learning method, LOCA, motivated by the recent success of the attention mechanism.
In our architecture, the input functions are mapped to a finite set of features, which are then averaged with attention weights that depend on the output query locations.
By coupling these attention weights together with an integral transform, LOCA is able to explicitly learn correlations in the target output functions.
- Score: 9.715465024071333
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Supervised operator learning is an emerging machine learning paradigm with
applications to modeling the evolution of spatio-temporal dynamical systems and
approximating general black-box relationships between functional data. We
propose a novel operator learning method, LOCA (Learning Operators with Coupled
Attention), motivated by the recent success of the attention mechanism. In
our architecture, the input functions are mapped to a finite set of features
which are then averaged with attention weights that depend on the output query
locations. By coupling these attention weights together with an integral
transform, LOCA is able to explicitly learn correlations in the target output
functions, enabling us to approximate nonlinear operators even when the number
of output function measurements in the training set is very small. Our
formulation is accompanied by rigorous approximation theoretic guarantees on
the universal expressiveness of the proposed model. Empirically, we evaluate
the performance of LOCA on several operator learning scenarios involving
systems governed by ordinary and partial differential equations, as well as a
black-box climate prediction problem. Through these scenarios we demonstrate
state-of-the-art accuracy, robustness with respect to noisy input data, and a
consistently small spread of errors over testing data sets, even for
out-of-distribution prediction tasks.
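As a concrete illustration of the coupled-attention idea, the following minimal NumPy sketch computes per-query attention weights, smooths them with a kernel integral transform over the query locations, and uses the result to average a feature encoding of the input function. The RBF kernel, the shapes, and all function names are illustrative assumptions, not the paper's exact architecture.

```python
import numpy as np

def rbf_kernel(Y, length_scale=0.5):
    # Pairwise RBF kernel over query locations; this plays the role of the
    # integral transform coupling attention weights across queries (assumed form).
    d2 = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * length_scale ** 2))

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def loca_forward(u_features, query_scores, Y):
    # u_features: (F,) feature encoding of the input function u
    # query_scores: (n, F) raw scores g(y) at each output query location
    # Y: (n, d) output query locations
    K = rbf_kernel(Y)
    K = K / K.sum(axis=1, keepdims=True)   # normalized integral transform
    phi = softmax(query_scores, axis=-1)   # per-query attention weights
    coupled = K @ phi                      # couple the weights across queries
    return coupled @ u_features            # attention-weighted feature average

Y = np.linspace(0.0, 1.0, 5)[:, None]
g = np.random.default_rng(0).normal(size=(5, 3))
v = np.array([0.2, -1.0, 0.5])
print(loca_forward(v, g, Y))               # predicted outputs at 5 query points
```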
Related papers
- LoRA-Ensemble: Efficient Uncertainty Modelling for Self-attention Networks [52.46420522934253]
We introduce LoRA-Ensemble, a parameter-efficient deep ensemble method for self-attention networks.
By employing a single pre-trained self-attention network with weights shared across all members, we train member-specific low-rank matrices for the attention projections.
Our method exhibits superior calibration compared to explicit ensembles and achieves similar or better accuracy across various prediction tasks and datasets.
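As a rough sketch of this mechanism, the NumPy snippet below shares one frozen projection matrix across all ensemble members and gives each member its own low-rank update in the spirit of LoRA. All names, dimensions, and the zero initialization of one factor are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, n_members = 64, 4, 8

W = rng.normal(size=(d, d))                     # shared pre-trained projection (frozen)
A = rng.normal(size=(n_members, r, d)) * 0.01   # member-specific low-rank factors
B = np.zeros((n_members, d, r))                 # zero init: members start identical

def member_projection(x, i):
    # Effective attention projection of member i: W + B_i @ A_i (rank <= r).
    return x @ (W + B[i] @ A[i]).T

x = rng.normal(size=(1, d))
preds = np.stack([member_projection(x, i) for i in range(n_members)])
ensemble_mean = preds.mean(axis=0)              # pool members for prediction
spread = preds.std(axis=0)                      # disagreement as an uncertainty proxy
```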
arXiv Detail & Related papers (2024-05-23T11:10:32Z)
- PICL: Physics Informed Contrastive Learning for Partial Differential Equations [7.136205674624813]
We develop a novel contrastive pretraining framework that improves neural operator generalization across multiple governing equations simultaneously.
A combination of physics-informed system evolution and latent-space model output is anchored to input data and used in our distance function.
We find that physics-informed contrastive pretraining improves accuracy for the Fourier Neural Operator in fixed-future and autoregressive rollout tasks for the 1D and 2D Heat, Burgers', and linear advection equations.
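For intuition, here is a generic InfoNCE-style contrastive loss in NumPy; in this sketch the positives stand in for the physics-informed evolution paired with each anchor. The paper's actual distance function differs, so treat this as an assumed stand-in rather than PICL itself.

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    # Each anchor should match its own positive (e.g. the physics-informed
    # evolution of the same input) against all other samples in the batch.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                   # (n, n) similarities
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))              # matched pairs on diagonal
```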
arXiv Detail & Related papers (2024-01-29T17:32:22Z)
- Learning invariant representations of time-homogeneous stochastic dynamical systems [27.127773672738535]
We study the problem of learning a representation of the state that faithfully captures its dynamics.
This is instrumental to learning the transfer operator or the generator of the system.
We show that the search for a good representation can be cast as an optimization problem over neural networks.
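One standard way to make this concrete is an EDMD-style least-squares fit: lift snapshot pairs through a representation and estimate a linear transfer operator in that feature space. The sketch below uses a toy linear system and a trivial feature map; the paper learns the representation with neural networks, which this does not show.

```python
import numpy as np

def fit_transfer_operator(X, Y, feats):
    # Given snapshot pairs (x_t, x_{t+1}) and a representation `feats`,
    # solve feats(X) @ T ~= feats(Y) in the least-squares sense.
    PhiX, PhiY = feats(X), feats(Y)
    T, *_ = np.linalg.lstsq(PhiX, PhiY, rcond=None)
    return T

rng = np.random.default_rng(1)
A = np.array([[0.9, 0.1], [0.0, 0.8]])     # toy linear dynamics x_{t+1} = A x_t
X = rng.normal(size=(500, 2))
Y = X @ A.T
T = fit_transfer_operator(X, Y, feats=lambda x: x)   # recovers A.T exactly here
```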
arXiv Detail & Related papers (2023-07-19T11:32:24Z)
- A Mechanistic Interpretation of Arithmetic Reasoning in Language Models using Causal Mediation Analysis [128.0532113800092]
We present a mechanistic interpretation of Transformer-based LMs on arithmetic questions.
This provides insights into how information related to arithmetic is processed by LMs.
arXiv Detail & Related papers (2023-05-24T11:43:47Z)
- Learning outside the Black-Box: The pursuit of interpretable models [78.32475359554395]
This paper proposes an algorithm that produces a continuous global interpretation of any given continuous black-box function.
Our interpretation represents a leap forward from the previous state of the art.
arXiv Detail & Related papers (2020-11-17T12:39:44Z)
- Network Classifiers Based on Social Learning [71.86764107527812]
We propose a new way of combining independently trained classifiers over space and time.
The proposed architecture is able to improve prediction performance over time with unlabeled data.
We show that this strategy results in consistent learning with high probability, and it yields a robust structure against poorly trained classifiers.
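A classical building block for this kind of scheme is the social-learning update below: each agent geometrically pools its neighbors' beliefs through a combination matrix and then reweights by its own classifier's likelihoods for the current unlabeled sample. The pooling rule and names here are illustrative assumptions, not the paper's exact strategy.

```python
import numpy as np

def social_learning_step(beliefs, likelihoods, W):
    # beliefs: (n_agents, n_classes) current beliefs of each agent
    # likelihoods: (n_agents, n_classes) local classifier outputs for one sample
    # W: (n_agents, n_agents) row-stochastic combination matrix (the network)
    log_b = np.log(beliefs + 1e-12)
    pooled = np.exp(W @ log_b)          # geometric pooling over neighbors
    updated = pooled * likelihoods      # Bayesian-style local reweighting
    return updated / updated.sum(axis=1, keepdims=True)
```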
arXiv Detail & Related papers (2020-10-23T11:18:20Z)
- Function Contrastive Learning of Transferable Meta-Representations [38.31692245188669]
We study the implications of joint training on the transferability of the meta-representations.
We propose a decoupled encoder-decoder approach to supervised meta-learning.
arXiv Detail & Related papers (2020-10-14T13:50:22Z)
- Estimating Structural Target Functions using Machine Learning and Influence Functions [103.47897241856603]
We propose a new framework for statistical machine learning of target functions arising as identifiable functionals from statistical models.
This framework is problem- and model-agnostic and can be used to estimate a broad variety of target parameters of interest in applied statistics.
We put particular focus on so-called coarsening at random/doubly robust problems with partially unobserved information.
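For orientation, the snippet below shows the classical augmented IPW (doubly robust) estimator of an average treatment effect, a textbook instance of the influence-function-based estimation this framework generalizes; variable names are illustrative.

```python
import numpy as np

def aipw_ate(y, t, propensity, mu1, mu0):
    # y: outcomes; t: binary treatment indicator; propensity: P(T=1 | X);
    # mu1, mu0: outcome-model predictions under treatment and control.
    # The estimate is consistent if either the propensity model or the
    # outcome model is correctly specified, hence "doubly robust".
    dr1 = mu1 + t * (y - mu1) / propensity
    dr0 = mu0 + (1 - t) * (y - mu0) / (1 - propensity)
    return np.mean(dr1 - dr0)
```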
arXiv Detail & Related papers (2020-08-14T16:48:29Z)
- A Trainable Optimal Transport Embedding for Feature Aggregation and its Relationship to Attention [96.77554122595578]
We introduce a parametrized representation of fixed size, which embeds and then aggregates elements from a given input set according to the optimal transport plan between the set and a trainable reference.
Our approach scales to large datasets and allows end-to-end training of the reference, while also providing a simple unsupervised learning mechanism with small computational cost.
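A minimal sketch of the core operation, under assumptions: an entropic (Sinkhorn) transport plan between an input set and a reference, used to aggregate the set into a fixed-size embedding by barycentric projection. In the paper the reference is trainable end to end; here it is just a fixed array.

```python
import numpy as np

def sinkhorn_plan(X, Z, eps=0.1, n_iters=100):
    # Entropic OT plan between an input set X (n, d) and a reference Z (m, d).
    C = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)   # squared-distance cost
    K = np.exp(-C / eps)
    a = np.full(len(X), 1.0 / len(X))    # uniform weights on the input set
    b = np.full(len(Z), 1.0 / len(Z))    # uniform weights on the reference
    v = np.ones(len(Z))
    for _ in range(n_iters):             # Sinkhorn fixed-point iterations
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]   # transport plan P

def ot_embed(X, Z):
    # Fixed-size (m, d) embedding: barycentric projection of X onto Z's slots.
    P = sinkhorn_plan(X, Z)
    return (P.T @ X) / P.sum(axis=0)[:, None]
```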
arXiv Detail & Related papers (2020-06-22T08:35:58Z)
- On the Estimation of Complex Circuits Functional Failure Rate by Machine Learning Techniques [0.16311150636417257]
De-Rating or Vulnerability Factors are a major feature of failure analysis efforts mandated by today's Functional Safety requirements.
A new approach is proposed that uses Machine Learning to estimate the Functional De-Rating of individual flip-flops.
arXiv Detail & Related papers (2020-02-18T15:18:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.