Training neural network ensembles via trajectory sampling
- URL: http://arxiv.org/abs/2209.11116v2
- Date: Wed, 10 May 2023 13:11:56 GMT
- Title: Training neural network ensembles via trajectory sampling
- Authors: Jamie F. Mair, Dominic C. Rose, Juan P. Garrahan
- Abstract summary: In machine learning, there is renewed interest in neural network ensembles (NNEs).
We show how to define and train an NNE using techniques from the study of rare trajectories in stochastic systems.
We demonstrate the viability of this technique on a range of simple supervised learning tasks.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In machine learning, there is renewed interest in neural network ensembles
(NNEs), whereby predictions are obtained as an aggregate from a diverse set of
smaller models, rather than from a single larger model. Here, we show how to
define and train an NNE using techniques from the study of rare trajectories in
stochastic systems. We define an NNE in terms of the trajectory of the model
parameters under a simple, discrete-in-time diffusive dynamics, and train
the NNE by biasing these trajectories towards a small time-integrated loss, as
controlled by appropriate counting fields which act as hyperparameters. We
demonstrate the viability of this technique on a range of simple supervised
learning tasks. We discuss potential advantages of our trajectory sampling
approach compared with more conventional gradient-based methods.
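The following is a minimal, illustrative sketch (in NumPy, not the authors' code) of the idea described in the abstract: parameter trajectories are generated by a simple discrete-time diffusive dynamics, and the trajectory ensemble is tilted toward a small time-integrated loss by a counting field s acting as a hyperparameter. The toy regression task, network size, values of s and the noise scale, and the crude importance-sampling selection are all assumptions for illustration; the paper's actual sampling scheme may differ.
```python
import numpy as np

rng = np.random.default_rng(0)

# Toy supervised task: 1D regression of y = sin(x) plus noise.
X = rng.uniform(-3.0, 3.0, size=(64, 1))
y = np.sin(X) + 0.1 * rng.normal(size=X.shape)

def init_params():
    """One small member model: a 1 -> 16 -> 1 MLP."""
    return {"W1": 0.5 * rng.normal(size=(1, 16)), "b1": np.zeros(16),
            "W2": 0.5 * rng.normal(size=(16, 1)), "b2": np.zeros(1)}

def predict(p, x):
    h = np.tanh(x @ p["W1"] + p["b1"])
    return h @ p["W2"] + p["b2"]

def loss(p):
    return float(np.mean((predict(p, X) - y) ** 2))

def diffuse(p, sigma=0.05):
    """One step of a simple discrete-time diffusive (random-walk) dynamics."""
    return {k: v + sigma * rng.normal(size=v.shape) for k, v in p.items()}

def sample_trajectory(T=20):
    """Unbiased trajectory: T diffusive steps from a random initialisation."""
    traj = [init_params()]
    for _ in range(T):
        traj.append(diffuse(traj[-1]))
    return traj

# Tilt the trajectory ensemble toward small time-integrated loss.
# The counting field s controls the strength of the bias (a hyperparameter).
s = 5.0
candidates = [sample_trajectory() for _ in range(200)]
log_w = np.array([-s * sum(loss(p) for p in traj) for traj in candidates])
log_w -= log_w.max()                       # stabilise before exponentiating
weights = np.exp(log_w) / np.exp(log_w).sum()

# Crude selection: keep the highest-weight trajectory as the trained NNE and
# aggregate predictions over the models along that trajectory.
ensemble = candidates[int(np.argmax(weights))]
y_hat = np.mean([predict(p, X) for p in ensemble], axis=0)
print("ensemble MSE:", float(np.mean((y_hat - y) ** 2)))
```
Brute-force reweighting of independently drawn trajectories scales poorly with model and dataset size; the "Minibatch training of neural network ensembles via trajectory sampling" entry in the related papers below targets the efficiency of trajectory-based NNE training.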
Related papers
- Joint Diffusion Processes as an Inductive Bias in Sheaf Neural Networks [14.224234978509026]
Sheaf Neural Networks (SNNs) naturally extend Graph Neural Networks (GNNs).
We propose two novel sheaf learning approaches that provide a more intuitive understanding of the involved structure maps.
In our evaluation, we show the limitations of the real-world benchmarks used so far on SNNs.
arXiv Detail & Related papers (2024-07-30T07:17:46Z) - Inferring stochastic low-rank recurrent neural networks from neural data [5.179844449042386]
A central aim in computational neuroscience is to relate the activity of large populations of neurons to an underlying dynamical system.
Low-rank recurrent neural networks (RNNs) exhibit such interpretability by having tractable dynamics.
Here, we propose to fit low-rank RNNs with variational sequential Monte Carlo methods.
arXiv Detail & Related papers (2024-06-24T15:57:49Z) - Hallmarks of Optimization Trajectories in Neural Networks: Directional Exploration and Redundancy [75.15685966213832]
We analyze the rich directional structure of optimization trajectories represented by their pointwise parameters.
We show that training only the scalar batchnorm parameters, beginning partway into training, matches the performance of training the entire network.
arXiv Detail & Related papers (2024-03-12T07:32:47Z) - Minibatch training of neural network ensembles via trajectory sampling [0.0]
We show that a minibatch approach can also be used to train neural network ensembles (NNEs) via trajectory methods in a highly efficient manner.
We illustrate this approach by training NNEs to classify images in the MNIST datasets.
arXiv Detail & Related papers (2023-06-23T11:12:33Z) - Learning Low Dimensional State Spaces with Overparameterized Recurrent Neural Nets [57.06026574261203]
We provide theoretical evidence for learning low-dimensional state spaces, which can also model long-term memory.
Experiments corroborate our theory, demonstrating extrapolation via learning low-dimensional state spaces with both linear and non-linear RNNs.
arXiv Detail & Related papers (2022-10-25T14:45:15Z) - Latent Neural ODEs with Sparse Bayesian Multiple Shooting [13.104556034767025]
Training dynamic models, such as neural ODEs, on long trajectories is a hard problem that requires using various tricks, such as trajectory splitting, to make model training work in practice.
We propose a principled multiple shooting technique for neural ODEs that splits trajectories into manageable short segments, which are optimised in parallel (a generic multiple-shooting sketch follows this list).
We demonstrate efficient and stable training, and state-of-the-art performance on multiple large-scale benchmark datasets.
arXiv Detail & Related papers (2022-10-07T11:36:29Z) - TT-NF: Tensor Train Neural Fields [88.49847274083365]
We introduce a novel low-rank representation termed Tensor Train Neural Fields (TT-NF) for learning fields on regular grids.
We analyze the effect of low-rank compression on the downstream task quality metrics.
arXiv Detail & Related papers (2022-09-30T15:17:39Z) - Learning to Learn with Generative Models of Neural Network Checkpoints [71.06722933442956]
We construct a dataset of neural network checkpoints and train a generative model on the parameters.
We find that our approach successfully generates parameters for a wide range of loss prompts.
We apply our method to different neural network architectures and tasks in supervised and reinforcement learning.
arXiv Detail & Related papers (2022-09-26T17:59:58Z) - Fully differentiable model discovery [0.0]
We propose an approach that combines neural network-based surrogates with Sparse Bayesian Learning.
Our work expands PINNs to various types of neural network architectures, and connects neural network-based surrogates to the rich field of Bayesian parameter inference.
arXiv Detail & Related papers (2021-06-09T08:11:23Z) - Incorporating NODE with Pre-trained Neural Differential Operator for Learning Dynamics [73.77459272878025]
We propose to enhance the supervised signal in learning dynamics by pre-training a neural differential operator (NDO).
The NDO is pre-trained on a class of symbolic functions, and it learns the mapping from trajectory samples of these functions to their derivatives.
We provide a theoretical guarantee that the output of the NDO can closely approximate the ground-truth derivatives by properly tuning the complexity of the library.
arXiv Detail & Related papers (2021-06-08T08:04:47Z) - Provably Efficient Neural Estimation of Structural Equation Model: An Adversarial Approach [144.21892195917758]
We study estimation in a class of generalized structural equation models (SEMs).
We formulate the linear operator equation as a min-max game, where both players are parameterized by neural networks (NNs), and learn the parameters of these neural networks using gradient descent.
For the first time we provide a tractable estimation procedure for SEMs based on NNs with provable convergence and without the need for sample splitting.
arXiv Detail & Related papers (2020-07-02T17:55:47Z)
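As referenced in the Latent Neural ODEs entry above, here is a generic multiple-shooting sketch (not that paper's Sparse Bayesian method): a long observed trajectory is split into short segments, each with its own learnable initial state, and segments are fitted with a per-segment data term plus a continuity penalty that stitches consecutive segments together. The toy dynamics, Euler integrator, segment length, and penalty weight below are illustrative assumptions.
```python
import numpy as np

def dynamics(x, theta):
    """Toy parametric vector field dx/dt = f(x; theta); stands in for a neural ODE."""
    return np.tanh(theta @ x)

def rollout(x0, theta, n_steps, dt=0.05):
    """Integrate one short segment with explicit Euler."""
    xs = [x0]
    for _ in range(n_steps):
        xs.append(xs[-1] + dt * dynamics(xs[-1], theta))
    return np.stack(xs)

def multiple_shooting_loss(observed, theta, init_states, seg_len, rho=10.0):
    """Sum of per-segment data-fit terms plus continuity penalties.
    Each segment only needs a short rollout, so the segments can be
    integrated (and differentiated) independently, i.e. in parallel."""
    loss = 0.0
    n_seg = len(init_states)
    for k in range(n_seg):
        seg_obs = observed[k * seg_len:(k + 1) * seg_len + 1]
        pred = rollout(init_states[k], theta, len(seg_obs) - 1)
        loss += np.mean((pred - seg_obs) ** 2)           # data fit on segment k
        if k + 1 < n_seg:                                # stitch segment k to k+1
            loss += rho * np.sum((pred[-1] - init_states[k + 1]) ** 2)
    return loss

# Usage with synthetic data: one long trajectory, split into 5 segments of 20 steps.
rng = np.random.default_rng(1)
theta_true = rng.normal(size=(2, 2))
observed = rollout(np.array([1.0, 0.0]), theta_true, 100)
seg_len = 20
init_states = [observed[k * seg_len].copy() for k in range(5)]  # warm-start from data
print(multiple_shooting_loss(observed, theta_true, init_states, seg_len))
```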