Multirate Training of Neural Networks
- URL: http://arxiv.org/abs/2106.10771v1
- Date: Sun, 20 Jun 2021 22:44:55 GMT
- Title: Multirate Training of Neural Networks
- Authors: Tiffany Vlaar and Benedict Leimkuhler
- Abstract summary: We show that for various transfer learning applications in vision and NLP we can fine-tune deep neural networks in almost half the time.
We propose an additional multirate technique which can learn different features present in the data by training the full network on different time scales simultaneously.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose multirate training of neural networks: partitioning neural network
parameters into "fast" and "slow" parts which are trained simultaneously using
different learning rates. By choosing appropriate partitionings we can obtain
large computational speed-ups for transfer learning tasks. We show that for
various transfer learning applications in vision and NLP we can fine-tune deep
neural networks in almost half the time, without reducing the generalization
performance of the resulting model. We also discuss other splitting choices for
the neural network parameters which are beneficial in enhancing generalization
performance in settings where neural networks are trained from scratch.
Finally, we propose an additional multirate technique which can learn different
features present in the data by training the full network on different time
scales simultaneously. The benefits of using this approach are illustrated for
ResNet architectures on image data. Our paper unlocks the potential of using
multirate techniques for neural network training and provides many starting
points for future work in this area.
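The fast/slow partitioning described in the abstract can be illustrated with a short, hedged sketch (not the authors' code). It assumes a PyTorch-style transfer-learning setup in which a pretrained ResNet-18 backbone is treated as the "slow" part and a freshly initialized classification head as the "fast" part; the model choice, the 10-class head, and the specific learning rates are illustrative assumptions, and the paper's actual partitioning and update schedule may differ.

```python
# Minimal sketch (not the authors' implementation): multirate fine-tuning via
# parameter partitioning into "fast" and "slow" groups with different learning
# rates, as described in the abstract. Model, head size, and rates are assumed.
import torch
import torchvision

model = torchvision.models.resnet18(weights="IMAGENET1K_V1")
model.fc = torch.nn.Linear(model.fc.in_features, 10)  # new head (assumed 10 classes)

# Partition parameters: the new head trains "fast", the pretrained backbone "slow".
fast_params = list(model.fc.parameters())
fast_ids = {id(p) for p in fast_params}
slow_params = [p for p in model.parameters() if id(p) not in fast_ids]

optimizer = torch.optim.SGD(
    [
        {"params": fast_params, "lr": 1e-2},  # fast part: larger learning rate
        {"params": slow_params, "lr": 1e-3},  # slow part: smaller learning rate
    ],
    momentum=0.9,
)

criterion = torch.nn.CrossEntropyLoss()

def train_step(x, y):
    """One training step; both parameter groups are updated simultaneously,
    each at its own rate."""
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

In a full multirate scheme the slow group could also be stepped only every k iterations (a coarser time scale), which is where wall-clock savings for fine-tuning would come from; this sketch steps both groups every iteration for simplicity.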
Related papers
- Peer-to-Peer Learning Dynamics of Wide Neural Networks [10.179711440042123]
We provide an explicit, non-asymptotic characterization of the learning dynamics of wide neural networks trained using popular DGD algorithms.
We validate our analytical results by accurately predicting the error for classification tasks.
arXiv Detail & Related papers (2024-09-23T17:57:58Z) - NEAR: A Training-Free Pre-Estimator of Machine Learning Model Performance [0.0]
We propose a zero-cost proxy Network Expressivity by Activation Rank (NEAR) to identify the optimal neural network without training.
We demonstrate state-of-the-art correlation between this network score and the model accuracy on NAS-Bench-101 and NATS-Bench-SSS/TSS.
arXiv Detail & Related papers (2024-08-16T14:38:14Z) - Graph Neural Networks for Learning Equivariant Representations of Neural Networks [55.04145324152541]
We propose to represent neural networks as computational graphs of parameters.
Our approach enables a single model to encode neural computational graphs with diverse architectures.
We showcase the effectiveness of our method on a wide range of tasks, including classification and editing of implicit neural representations.
arXiv Detail & Related papers (2024-03-18T18:01:01Z) - Accelerating SNN Training with Stochastic Parallelizable Spiking Neurons [1.7056768055368383]
Spiking neural networks (SNNs) are able to learn features while using less energy, especially on neuromorphic hardware.
The most widely used spiking neuron in deep learning is the leaky integrate-and-fire (LIF) neuron.
arXiv Detail & Related papers (2023-06-22T04:25:27Z) - Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z) - Accelerate Model Parallel Training by Using Efficient Graph Traversal
Order in Device Placement [1.577134752543077]
Modern neural networks require long training times to reach decent performance on massive datasets.
One common approach to speed up training is model parallelization, where large neural networks are split across multiple devices.
Most of the existing device placement solutions treat the problem as sequential decision-making.
arXiv Detail & Related papers (2022-01-21T09:27:48Z) - Feature Alignment for Approximated Reversibility in Neural Networks [0.0]
We introduce feature alignment, a technique for obtaining approximate reversibility in artificial neural networks.
We show that the technique can be modified for training neural networks locally, saving computational memory resources.
arXiv Detail & Related papers (2021-06-23T17:42:47Z) - Learning Neural Network Subspaces [74.44457651546728]
Recent observations have advanced our understanding of the neural network optimization landscape.
With a similar computational cost as training one model, we learn lines, curves, and simplexes of high-accuracy neural networks.
arXiv Detail & Related papers (2021-02-20T23:26:58Z) - Local Critic Training for Model-Parallel Learning of Deep Neural
Networks [94.69202357137452]
We propose a novel model-parallel learning method, called local critic training.
We show that the proposed approach successfully decouples the update process of the layer groups for both convolutional neural networks (CNNs) and recurrent neural networks (RNNs).
We also show that trained networks by the proposed method can be used for structural optimization.
arXiv Detail & Related papers (2021-02-03T09:30:45Z) - Progressive Tandem Learning for Pattern Recognition with Deep Spiking
Neural Networks [80.15411508088522]
Spiking neural networks (SNNs) have shown advantages over traditional artificial neural networks (ANNs) for low latency and high computational efficiency.
We propose a novel ANN-to-SNN conversion and layer-wise learning framework for rapid and efficient pattern recognition.
arXiv Detail & Related papers (2020-07-02T15:38:44Z) - Curriculum By Smoothing [52.08553521577014]
Convolutional Neural Networks (CNNs) have shown impressive performance in computer vision tasks such as image classification, detection, and segmentation.
We propose an elegant curriculum-based scheme that smooths the feature embedding of a CNN using anti-aliasing or low-pass filters.
As the amount of information in the feature maps increases during training, the network is able to progressively learn better representations of the data.
arXiv Detail & Related papers (2020-03-03T07:27:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.