DeepOSets: Non-Autoregressive In-Context Learning of Supervised Learning Operators
- URL: http://arxiv.org/abs/2410.09298v2
- Date: Fri, 15 Nov 2024 01:09:27 GMT
- Title: DeepOSets: Non-Autoregressive In-Context Learning of Supervised Learning Operators
- Authors: Shao-Ting Chiu, Junyuan Hong, Ulisses Braga-Neto
- Abstract summary: In-context operator learning allows a trained machine learning model to learn from a user prompt without further training.
DeepOSets adds in-context learning capabilities to Deep Operator Networks (DeepONets) by combining them with the DeepSets architecture.
As the first non-autoregressive model for in-context operator learning, DeepOSets allows the user prompt to be processed in parallel.
- Score: 11.913853433712855
- Abstract: We introduce DeepSets Operator Networks (DeepOSets), an efficient, non-autoregressive neural network architecture for in-context operator learning. In-context learning allows a trained machine learning model to learn from a user prompt without further training. DeepOSets adds in-context learning capabilities to Deep Operator Networks (DeepONets) by combining them with the DeepSets architecture. As the first non-autoregressive model for in-context operator learning, DeepOSets allows the user prompt to be processed in parallel, leading to significant computational savings. Here, we present the application of DeepOSets in the problem of learning supervised learning algorithms, which are operators mapping a finite-dimensional space of labeled data into an infinite-dimensional hypothesis space of prediction functions. In an empirical comparison with a popular autoregressive (transformer-based) model for in-context learning of linear regression in one and five dimensions, DeepOSets reduced the number of model weights by several orders of magnitude and required a fraction of training and inference time. Furthermore, DeepOSets proved to be less sensitive to noise, significantly outperforming the transformer model in noisy settings.
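The abstract describes the architecture only at a high level, so the following is a minimal illustrative sketch, not the authors' implementation. It assumes a DeepSets-style encoder that embeds each labeled example and mean-pools the embeddings into a single prompt vector, followed by a DeepONet-style branch/trunk pair whose outputs are combined by a dot product. The layer sizes, the `tanh` activation, the mean pooling, and all names (`init_mlp`, `mlp`, `deeposets_forward`) are assumptions made for illustration, and the weights are random and untrained.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(sizes):
    """Random (untrained) weights for a small fully connected network."""
    return [(rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(params, x):
    """Apply the network row-wise: tanh on hidden layers, linear output."""
    for W, b in params[:-1]:
        x = np.tanh(x @ W + b)
    W, b = params[-1]
    return x @ W + b

# Hypothetical sizes: input dimension, prompt embedding size, number of basis functions.
d, latent, p = 1, 16, 8
phi = init_mlp([d + 1, 32, latent])   # per-example encoder (DeepSets part)
branch = init_mlp([latent, 32, p])    # maps the pooled prompt to coefficients
trunk = init_mlp([d, 32, p])          # maps query points to basis values

def deeposets_forward(X, y, X_query):
    """Predict at X_query from the labeled prompt (X, y) in one parallel pass."""
    pairs = np.concatenate([X, y[:, None]], axis=1)  # (n, d + 1)
    h = mlp(phi, pairs).mean(axis=0)                 # order-independent pooling
    coeffs = mlp(branch, h[None, :])[0]              # (p,)
    basis = mlp(trunk, X_query)                      # (m, p)
    return basis @ coeffs                            # (m,)

# A noisy 1-D linear regression prompt and a few query points.
X = rng.uniform(-1.0, 1.0, size=(10, d))
y = 2.0 * X[:, 0] + 0.1 * rng.normal(size=10)
X_query = np.linspace(-1.0, 1.0, 5)[:, None]

pred = deeposets_forward(X, y, X_query)
perm = rng.permutation(10)
pred_perm = deeposets_forward(X[perm], y[perm], X_query)
```

Because every prompt example is encoded independently and then pooled, the whole prompt can be processed in one batched pass, and reordering the examples leaves the prediction unchanged up to floating-point rounding. An autoregressive model, by contrast, consumes the prompt as an ordered sequence.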
Related papers
- Reliable extrapolation of deep neural operators informed by physics or sparse observations [2.887258133992338]
Deep neural operators can learn nonlinear mappings between infinite-dimensional function spaces via deep neural networks.
DeepONets provide a new simulation paradigm in science and engineering.
We propose five reliable learning methods that guarantee a safe prediction under extrapolation.
arXiv Detail & Related papers (2022-12-13T03:02:46Z)
- What learning algorithm is in-context learning? Investigations with linear models [87.91612418166464]
We investigate the hypothesis that transformer-based in-context learners implement standard learning algorithms implicitly.
We show that trained in-context learners closely match the predictors computed by gradient descent, ridge regression, and exact least-squares regression.
We provide preliminary evidence that in-context learners share algorithmic features with these predictors.
arXiv Detail & Related papers (2022-11-28T18:59:51Z)
- Multifidelity deep neural operators for efficient learning of partial differential equations with application to fast inverse design of nanoscale heat transport [2.512625172084287]
We develop a multifidelity neural operator based on a deep operator network (DeepONet).
A multifidelity DeepONet significantly reduces the required amount of high-fidelity data and achieves one order of magnitude smaller error when using the same amount of high-fidelity data.
We apply a multifidelity DeepONet to learn the phonon Boltzmann transport equation (BTE), a framework to compute nanoscale heat transport.
arXiv Detail & Related papers (2022-04-14T01:01:24Z)
- Dynamic Inference with Neural Interpreters [72.90231306252007]
We present Neural Interpreters, an architecture that factorizes inference in a self-attention network as a system of modules.
Inputs to the model are routed through a sequence of functions in a way that is end-to-end learned.
We show that Neural Interpreters perform on par with the vision transformer using fewer parameters, while being transferrable to a new task in a sample efficient manner.
arXiv Detail & Related papers (2021-10-12T23:22:45Z)
- Improved architectures and training algorithms for deep operator networks [0.0]
Operator learning techniques have emerged as a powerful tool for learning maps between infinite-dimensional Banach spaces.
We analyze the training dynamics of deep operator networks (DeepONets) through the lens of Neural Tangent Kernel (NTK) theory.
arXiv Detail & Related papers (2021-10-04T18:34:41Z)
- Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks.
This article introduces BAIT, a practical, tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z)
- Training Deep Neural Networks with Constrained Learning Parameters [4.917317902787792]
A significant portion of deep learning tasks would run on edge computing systems.
We propose the Combinatorial Neural Network Training Algorithm (CoNNTrA).
CoNNTrA trains deep learning models with ternary learning parameters on the MNIST, Iris and ImageNet data sets.
Our results indicate that CoNNTrA models use 32x less memory and have errors on par with the Backpropagation models.
arXiv Detail & Related papers (2020-09-01T16:20:11Z)
- Incremental Training of a Recurrent Neural Network Exploiting a Multi-Scale Dynamic Memory [79.42778415729475]
We propose a novel incrementally trained recurrent architecture targeting explicitly multi-scale learning.
We show how to extend the architecture of a simple RNN by separating its hidden state into different modules.
We discuss a training algorithm where new modules are iteratively added to the model to learn progressively longer dependencies.
arXiv Detail & Related papers (2020-06-29T08:35:49Z)
- Deep Transfer Learning with Ridge Regression [7.843067454030999]
Deep models trained with massive amounts of data demonstrate promising generalisation ability on unseen data from relevant domains.
We address this issue by leveraging the low-rank property of learnt feature vectors produced from deep neural networks (DNNs) with the closed-form solution provided in kernel ridge regression (KRR).
Our method is successful on supervised and semi-supervised transfer learning tasks.
arXiv Detail & Related papers (2020-06-11T20:21:35Z)
- The large learning rate phase of deep learning: the catapult mechanism [50.23041928811575]
We present a class of neural networks with solvable training dynamics.
We find good agreement between our model's predictions and training dynamics in realistic deep learning settings.
We believe our results shed light on characteristics of models trained at different learning rates.
arXiv Detail & Related papers (2020-03-04T17:52:48Z)
- Large-Scale Gradient-Free Deep Learning with Recursive Local Representation Alignment [84.57874289554839]
Training deep neural networks on large-scale datasets requires significant hardware resources.
Backpropagation, the workhorse for training these networks, is an inherently sequential process that is difficult to parallelize.
We propose a neuro-biologically-plausible alternative to backprop that can be used to train deep networks.
arXiv Detail & Related papers (2020-02-10T16:20:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.