DeepOSets: Non-Autoregressive In-Context Learning of Supervised Learning Operators
- URL: http://arxiv.org/abs/2410.09298v3
- Date: Mon, 03 Feb 2025 21:24:30 GMT
- Title: DeepOSets: Non-Autoregressive In-Context Learning of Supervised Learning Operators
- Authors: Shao-Ting Chiu, Junyuan Hong, Ulisses Braga-Neto
- Abstract summary: We introduce DeepSets Operator Networks (DeepOSets), an efficient, non-autoregressive neural network architecture for in-context learning of permutation-invariant operators.
DeepOSets combines the operator learning capabilities of Deep Operator Networks (DeepONets) with the set learning capabilities of DeepSets.
- Score: 11.913853433712855
- License:
- Abstract: We introduce DeepSets Operator Networks (DeepOSets), an efficient, non-autoregressive neural network architecture for in-context learning of permutation-invariant operators. DeepOSets combines the operator learning capabilities of Deep Operator Networks (DeepONets) with the set learning capabilities of DeepSets. Here, we present the application of DeepOSets to the problem of learning supervised learning algorithms, which are continuous permutation-invariant operators. We show that DeepOSets are universal approximators for this class of operators. In an empirical comparison with a popular autoregressive (transformer-based) model for in-context learning of linear regression, DeepOSets reduced the number of model weights by several orders of magnitude and required a fraction of training and inference time, in addition to significantly outperforming the transformer model in noisy settings. We also demonstrate the multiple operator learning capabilities of DeepOSets with a polynomial regression experiment where the order of the polynomial is learned in-context from the prompt.
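The abstract's architecture can be illustrated with a minimal, untrained numerical sketch: a DeepSets encoder embeds each (x, y) prompt example and mean-pools the embeddings (making the prompt representation permutation-invariant), and the pooled vector feeds the branch net of a DeepONet, whose output is combined with a trunk-net encoding of the query point by a dot product. All layer sizes, function names, and the use of mean pooling here are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(sizes, rng):
    # random (W, b) pairs for a small feedforward network (untrained)
    return [(rng.normal(0.0, 0.5, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(x, weights):
    # tanh hidden layers, linear output layer
    for W, b in weights[:-1]:
        x = np.tanh(x @ W + b)
    W, b = weights[-1]
    return x @ W + b

# DeepSets encoder (hypothetical dims): per-example net phi, post-pooling net rho
phi = init_mlp([2, 32, 16], rng)
rho = init_mlp([16, 32, 16], rng)

def encode_prompt(xs, ys):
    pairs = np.stack([xs, ys], axis=1)     # (n, 2) prompt examples
    embedded = mlp(pairs, phi)             # (n, 16) per-example embeddings
    pooled = embedded.mean(axis=0)         # (16,)  order-independent pooling
    return mlp(pooled[None, :], rho)[0]    # (16,)  prompt representation

# DeepONet head: branch net encodes the prompt, trunk net encodes the query
branch = init_mlp([16, 32, 8], rng)
trunk = init_mlp([1, 32, 8], rng)

def deeposets_predict(xs, ys, x_query):
    b = mlp(encode_prompt(xs, ys)[None, :], branch)[0]  # (8,)
    t = mlp(np.array([[x_query]]), trunk)[0]            # (8,)
    return float(b @ t)                                  # dot-product readout

# Toy prompt: noisy linear data, as in the paper's regression experiments
xs = rng.normal(size=10)
ys = 2.0 * xs + 0.1 * rng.normal(size=10)
pred = deeposets_predict(xs, ys, 0.5)

# Shuffling the prompt examples leaves the prediction unchanged,
# since mean pooling is permutation-invariant.
perm = rng.permutation(10)
assert np.isclose(pred, deeposets_predict(xs[perm], ys[perm], 0.5))
```

Because the whole prompt is processed in one forward pass rather than token by token, inference cost does not grow with autoregressive decoding steps, which is the source of the efficiency claim in the abstract.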
Related papers
- A Library for Learning Neural Operators [77.16483961863808]
We present NeuralOperator, an open-source Python library for operator learning.
Neural operators generalize neural networks to maps between function spaces instead of finite-dimensional Euclidean spaces.
Built on top of PyTorch, NeuralOperator provides all the tools for training and deploying neural operator models.
arXiv Detail & Related papers (2024-12-13T18:49:37Z) - On the training and generalization of deep operator networks [11.159056906971983]
We present a novel training method for deep operator networks (DeepONets).
DeepONets are constructed by two sub-networks.
We establish the width error estimate in terms of input data.
arXiv Detail & Related papers (2023-09-02T21:10:45Z) - Transfer Learning Enhanced DeepONet for Long-Time Prediction of Evolution Equations [9.748550197032785]
Deep operator network (DeepONet) has demonstrated great success in various learning tasks.
This paper proposes a transfer-learning aided DeepONet to enhance stability.
arXiv Detail & Related papers (2022-12-09T04:37:08Z) - Multifidelity deep neural operators for efficient learning of partial differential equations with application to fast inverse design of nanoscale heat transport [2.512625172084287]
We develop a multifidelity neural operator based on a deep operator network (DeepONet).
A multifidelity DeepONet significantly reduces the required amount of high-fidelity data and achieves one order of magnitude smaller error when using the same amount of high-fidelity data.
We apply a multifidelity DeepONet to learn the phonon Boltzmann transport equation (BTE), a framework to compute nanoscale heat transport.
arXiv Detail & Related papers (2022-04-14T01:01:24Z) - MultiAuto-DeepONet: A Multi-resolution Autoencoder DeepONet for Nonlinear Dimension Reduction, Uncertainty Quantification and Operator Learning of Forward and Inverse Stochastic Problems [12.826754199680474]
A new data-driven method for operator learning of stochastic differential equations (SDEs) is proposed in this paper.
The central goal is to solve forward and inverse problems more effectively using limited data.
arXiv Detail & Related papers (2022-04-07T03:53:49Z) - Dynamic Inference with Neural Interpreters [72.90231306252007]
We present Neural Interpreters, an architecture that factorizes inference in a self-attention network as a system of modules.
Inputs to the model are routed through a sequence of functions in a way that is end-to-end learned.
We show that Neural Interpreters perform on par with the vision transformer using fewer parameters, while being transferable to a new task in a sample-efficient manner.
arXiv Detail & Related papers (2021-10-12T23:22:45Z) - Improved architectures and training algorithms for deep operator networks [0.0]
Operator learning techniques have emerged as a powerful tool for learning maps between infinite-dimensional Banach spaces.
We analyze the training dynamics of deep operator networks (DeepONets) through the lens of Neural Tangent Kernel (NTK) theory.
arXiv Detail & Related papers (2021-10-04T18:34:41Z) - Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks.
This article introduces BAIT, a practical, tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z) - Understanding Self-supervised Learning with Dual Deep Networks [74.92916579635336]
We propose a novel framework to understand contrastive self-supervised learning (SSL) methods that employ dual pairs of deep ReLU networks.
We prove that in each SGD update of SimCLR with various loss functions, the weights at each layer are updated by a covariance operator.
To further study what role the covariance operator plays and which features are learned in such a process, we model data generation and augmentation processes through a hierarchical latent tree model (HLTM).
arXiv Detail & Related papers (2020-10-01T17:51:49Z) - Incremental Training of a Recurrent Neural Network Exploiting a Multi-Scale Dynamic Memory [79.42778415729475]
We propose a novel incrementally trained recurrent architecture explicitly targeting multi-scale learning.
We show how to extend the architecture of a simple RNN by separating its hidden state into different modules.
We discuss a training algorithm where new modules are iteratively added to the model to learn progressively longer dependencies.
arXiv Detail & Related papers (2020-06-29T08:35:49Z) - Large-Scale Gradient-Free Deep Learning with Recursive Local
Representation Alignment [84.57874289554839]
Training deep neural networks on large-scale datasets requires significant hardware resources.
Backpropagation, the workhorse for training these networks, is an inherently sequential process that is difficult to parallelize.
We propose a neuro-biologically-plausible alternative to backprop that can be used to train deep networks.
arXiv Detail & Related papers (2020-02-10T16:20:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.