Minibatch training of neural network ensembles via trajectory sampling
- URL: http://arxiv.org/abs/2306.13442v2
- Date: Tue, 27 Jun 2023 13:01:26 GMT
- Title: Minibatch training of neural network ensembles via trajectory sampling
- Authors: Jamie F. Mair, Luke Causer, Juan P. Garrahan
- Abstract summary: We show that a minibatch approach can also be used to train neural network ensembles (NNEs) via trajectory methods in a highly efficient manner.
We illustrate this approach by training NNEs to classify images in the MNIST datasets.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most iterative neural network training methods use estimates of the loss
function over small random subsets (or minibatches) of the data to update the
parameters, which aid in decoupling the training time from the (often very
large) size of the training datasets. Here, we show that a minibatch approach
can also be used to train neural network ensembles (NNEs) via trajectory
methods in a highly efficient manner. We illustrate this approach by training
NNEs to classify images in the MNIST datasets. This method improves training
times, with a speedup that scales as the ratio of the size of the dataset to
the average minibatch size; in the case of MNIST, this gives a computational
improvement of typically two orders of magnitude. We
highlight the advantage of using longer trajectories to represent NNEs, both
for improved accuracy in inference and reduced update cost in terms of the
samples needed in minibatch updates.
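
As a rough, hypothetical illustration of the core idea (this is not the authors' code; the coupling parameter `s`, the Gaussian proposal, and all function names are assumptions), the sketch below shows a Metropolis-style update of one ensemble member in which the loss difference entering the acceptance probability is estimated on a random minibatch instead of the full dataset:

```python
import numpy as np

def ensemble_mc_step(ensemble, data, labels, loss_fn, s, sigma, batch_size, rng):
    """One Metropolis-style update of a randomly chosen ensemble member,
    with the loss difference in the acceptance ratio estimated on a
    single shared minibatch instead of the full dataset."""
    k = rng.integers(len(ensemble))                     # pick one member
    proposal = ensemble[k] + sigma * rng.standard_normal(ensemble[k].shape)
    idx = rng.choice(len(data), size=batch_size, replace=False)  # minibatch
    delta = (loss_fn(proposal, data[idx], labels[idx])
             - loss_fn(ensemble[k], data[idx], labels[idx]))
    if rng.random() < np.exp(-s * delta):               # accept or reject
        ensemble[k] = proposal
    return ensemble
```

Evaluating both losses on the same minibatch keeps the estimated difference low-variance, and each update then costs on the order of the minibatch size rather than the dataset size, which is the source of the scaling quoted in the abstract.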
Related papers
- BLoad: Enhancing Neural Network Training with Efficient Sequential Data Handling [8.859850475075238]
We propose a novel training scheme that enables efficient distributed data-parallel training on sequences of different sizes with minimal overhead.
By using this scheme we were able to reduce the amount of padding by more than 100x without deleting a single frame, improving both training time and recall (a generic sequence-packing sketch in this spirit follows the list below).
arXiv Detail & Related papers (2023-10-16T23:14:56Z)
- KAKURENBO: Adaptively Hiding Samples in Deep Neural Network Training [2.8804804517897935]
We propose a method for hiding the least-important samples during the training of deep neural networks.
We adaptively find samples to exclude in a given epoch based on their contribution to the overall learning process (a schematic sketch also follows the list below).
Our method can reduce total training time by up to 22% while impacting accuracy by only 0.4% compared to the baseline.
arXiv Detail & Related papers (2023-10-16T06:19:29Z)
- Towards Memory- and Time-Efficient Backpropagation for Training Spiking Neural Networks [70.75043144299168]
Spiking Neural Networks (SNNs) are promising energy-efficient models for neuromorphic computing.
We propose the Spatial Learning Through Time (SLTT) method that can achieve high performance while greatly improving training efficiency.
Our method achieves state-of-the-art accuracy on ImageNet, while the memory cost and training time are reduced by more than 70% and 50%, respectively, compared with BPTT.
arXiv Detail & Related papers (2023-02-28T05:01:01Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- Learning to Learn with Generative Models of Neural Network Checkpoints [71.06722933442956]
We construct a dataset of neural network checkpoints and train a generative model on the parameters.
We find that our approach successfully generates parameters for a wide range of loss prompts.
We apply our method to different neural network architectures and tasks in supervised and reinforcement learning.
arXiv Detail & Related papers (2022-09-26T17:59:58Z)
- Local Critic Training for Model-Parallel Learning of Deep Neural Networks [94.69202357137452]
We propose a novel model-parallel learning method, called local critic training.
We show that the proposed approach successfully decouples the update process of the layer groups for both convolutional neural networks (CNNs) and recurrent neural networks (RNNs).
We also show that trained networks by the proposed method can be used for structural optimization.
arXiv Detail & Related papers (2021-02-03T09:30:45Z)
- Selfish Sparse RNN Training [13.165729746380816]
We propose an approach to train sparse RNNs with a fixed parameter count in one single run, without compromising performance.
We achieve state-of-the-art sparse training results with various RNN models on the Penn TreeBank and Wikitext-2 datasets.
arXiv Detail & Related papers (2021-01-22T10:45:40Z)
- Data optimization for large batch distributed training of deep neural networks [0.19336815376402716]
Current practice for distributed training of deep neural networks faces the challenges of communication bottlenecks when operating at scale.
We propose a data optimization approach that utilizes machine learning to implicitly smooth out the loss landscape, resulting in fewer local minima.
Our approach filters out data points which are less important to feature learning, enabling us to speed up the training of models with larger batch sizes and improved accuracy.
arXiv Detail & Related papers (2020-12-16T21:22:02Z)
- RNN Training along Locally Optimal Trajectories via Frank-Wolfe Algorithm [50.76576946099215]
We propose a novel and efficient training method for RNNs by iteratively seeking a local minimum on the loss surface within a small region.
Surprisingly, even with the additional per-step cost, the overall training cost of this method is empirically observed to be lower than that of back-propagation.
arXiv Detail & Related papers (2020-10-12T01:59:18Z)
- Optimal training of integer-valued neural networks with mixed integer programming [2.528056693920671]
We develop new MIP models which improve training efficiency and which can train the important class of integer-valued neural networks (INNs).
We provide a batch training method that dramatically increases the amount of data that MIP solvers can use to train.
Experimental results on two real-world data-limited datasets demonstrate that our approach strongly outperforms the previous state of the art in training NNs with MIP.
arXiv Detail & Related papers (2020-09-08T15:45:44Z)
- Predicting Training Time Without Training [120.92623395389255]
We tackle the problem of predicting the number of optimization steps that a pre-trained deep network needs to converge to a given value of the loss function.
We leverage the fact that the training dynamics of a deep network during fine-tuning are well approximated by those of a linearized model.
We are able to predict the time it takes to fine-tune a model to a given loss without having to perform any training.
arXiv Detail & Related papers (2020-08-28T04:29:54Z)
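
The BLoad summary above does not spell out its scheme, so the following is only a generic first-fit-decreasing packing sketch of the underlying idea of avoiding padding on variable-length sequences; the function name and the `capacity` parameter are illustrative assumptions, not the paper's API:

```python
def pack_sequences(lengths, capacity):
    """Greedy first-fit-decreasing packing of variable-length sequences
    into fixed-capacity bins. Padding is then only needed for the small
    remainder of each bin, instead of padding every sequence up to the
    longest one. Assumes every length is <= capacity."""
    bins = []  # each bin: [remaining_capacity, [sequence_indices]]
    for i in sorted(range(len(lengths)), key=lambda i: -lengths[i]):
        for b in bins:
            if b[0] >= lengths[i]:          # sequence fits in this bin
                b[0] -= lengths[i]
                b[1].append(i)
                break
        else:                               # no bin fits: open a new one
            bins.append([capacity - lengths[i], [i]])
    return [ids for _, ids in bins]

# Padding these to the longest sequence would use 6 * 9 = 54 slots;
# two bins of capacity 16 use 32 slots for the same 30 frames.
print(pack_sequences([9, 7, 5, 4, 3, 2], capacity=16))  # [[0, 1], [2, 3, 4, 5]]
```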
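
For the KAKURENBO entry, here is a minimal sketch of adaptively hiding low-contribution samples each epoch; using the per-sample loss as the importance score and a fixed hide fraction are assumptions, not the paper's exact criterion:

```python
import numpy as np

def visible_samples(per_sample_loss, hide_fraction):
    """Return indices of the samples to keep this epoch, hiding the
    lowest-loss (assumed least-informative) fraction of the dataset."""
    n_hide = int(hide_fraction * len(per_sample_loss))
    order = np.argsort(per_sample_loss)   # ascending: easiest samples first
    return np.sort(order[n_hide:])        # train only on the remainder

# Example: hide the easiest 25% of an 8-sample dataset.
losses = np.array([0.02, 1.3, 0.6, 0.01, 0.9, 0.05, 2.1, 0.4])
print(visible_samples(losses, hide_fraction=0.25))  # [1 2 4 5 6 7]
```

Recomputing the visible set every epoch lets hidden samples re-enter training once the model starts getting them wrong again.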
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.