Regularized Evolutionary Population-Based Training
- URL: http://arxiv.org/abs/2002.04225v4
- Date: Wed, 21 Jul 2021 04:04:51 GMT
- Title: Regularized Evolutionary Population-Based Training
- Authors: Jason Liang, Santiago Gonzalez, Hormoz Shahrzad, and Risto
Miikkulainen
- Abstract summary: This paper presents an algorithm called Evolutionary Population-Based Training (EPBT) that interleaves the training of a DNN's weights with the metalearning of loss functions.
EPBT results in faster, more accurate learning on image classification benchmarks.
- Score: 11.624954122221562
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Metalearning of deep neural network (DNN) architectures and hyperparameters
has become an increasingly important area of research. At the same time,
network regularization has been recognized as a crucial dimension to effective
training of DNNs. However, the role of metalearning in establishing effective
regularization has not yet been fully explored. There is recent evidence that
loss-function optimization could play this role; however, it is computationally
impractical as an outer loop to full training. This paper presents an algorithm
called Evolutionary Population-Based Training (EPBT) that interleaves the
training of a DNN's weights with the metalearning of loss functions. They are
parameterized using multivariate Taylor expansions that EPBT can directly
optimize. Such simultaneous adaptation of weights and loss functions can be
deceptive, and therefore EPBT uses a quality-diversity heuristic called Novelty
Pulsation as well as knowledge distillation to prevent overfitting during
training. On the CIFAR-10 and SVHN image classification benchmarks, EPBT
results in faster, more accurate learning. The discovered hyperparameters adapt
to the training process and serve to regularize the learning task by
discouraging overfitting to the labels. EPBT thus demonstrates a practical
instantiation of regularization metalearning based on simultaneous training.
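The loss parameterization described above can be sketched in a few lines; the following is a minimal pure-Python illustration of a loss built from a third-order bivariate Taylor expansion in the predicted probability and the label. The coefficient ordering, expansion point, and function names are illustrative assumptions, not the paper's exact formulation:

```python
def taylor_loss(y_pred, y_true, theta, center=(0.5, 0.5)):
    """Loss defined as a third-order bivariate Taylor expansion.

    theta holds the ten Taylor coefficients (one per monomial up to total
    degree 3) that an evolutionary outer loop such as EPBT would optimize.
    y_pred and y_true are flat sequences of per-class probabilities/labels.
    """
    a, b = center
    total, n = 0.0, 0
    for p, t in zip(y_pred, y_true):
        u, v = p - a, t - b  # deviations from the expansion point
        # All monomials of (u, v) up to total degree 3.
        monomials = [1.0, u, v, u * u, u * v, v * v,
                     u ** 3, u * u * v, u * v * v, v ** 3]
        total += sum(c * m for c, m in zip(theta, monomials))
        n += 1
    return total / n
```

Because the loss is just an inner product of fixed monomials with a coefficient vector, the outer loop can mutate `theta` directly while the inner loop keeps training the network's weights against whatever loss surface the current coefficients define.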
Related papers
- Estimating Post-Synaptic Effects for Online Training of Feed-Forward
SNNs [0.27016900604393124]
Facilitating online learning in spiking neural networks (SNNs) is a key step in developing event-based models.
We propose Online Training with Postsynaptic Estimates (OTPE) for training feed-forward SNNs.
We show improved scaling for multi-layer networks using a novel approximation of temporal effects on the subsequent layer's activity.
arXiv Detail & Related papers (2023-11-07T16:53:39Z)
- Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers [107.3726071306935]
We propose a new plug-and-play training framework, SMoE-Dropout, that enables transformers to scale to better accuracy at their full capacity without collapse.
SMoE-Dropout consists of a randomly initialized and fixed router network that activates experts, gradually increasing the number of activated experts as training progresses.
Our experiments demonstrate the superior performance and substantial computation savings of SMoE-Dropout, compared to dense training baselines with equivalent parameter counts.
arXiv Detail & Related papers (2023-03-02T22:12:51Z)
- SPIDE: A Purely Spike-based Method for Training Feedback Spiking Neural Networks [56.35403810762512]
Spiking neural networks (SNNs) with event-based computation are promising brain-inspired models for energy-efficient applications on neuromorphic hardware.
We study spike-based implicit differentiation on the equilibrium state (SPIDE) that extends the recently proposed training method.
arXiv Detail & Related papers (2023-02-01T04:22:59Z)
- Online Training Through Time for Spiking Neural Networks [66.7744060103562]
Spiking neural networks (SNNs) are promising brain-inspired energy-efficient models.
Recent progress in training methods has enabled successful deep SNNs on large-scale tasks with low latency.
We propose online training through time (OTTT) for SNNs, which is derived from BPTT to enable forward-in-time learning.
arXiv Detail & Related papers (2022-10-09T07:47:56Z)
- Towards Scaling Difference Target Propagation by Learning Backprop Targets [64.90165892557776]
Difference Target Propagation is a biologically-plausible learning algorithm with close relation with Gauss-Newton (GN) optimization.
We propose a novel feedback weight training scheme that ensures both that DTP approximates BP and that layer-wise feedback weight training can be restored.
We report the best performance ever achieved by DTP on CIFAR-10 and ImageNet.
arXiv Detail & Related papers (2022-01-31T18:20:43Z)
- What training reveals about neural network complexity [80.87515604428346]
This work explores the hypothesis that the complexity of the function a deep neural network (NN) is learning can be deduced by how fast its weights change during training.
Our results support the hypothesis that good training behavior can be a useful bias towards good generalization.
arXiv Detail & Related papers (2021-06-08T08:58:00Z)
- FracTrain: Fractionally Squeezing Bit Savings Both Temporally and Spatially for Efficient DNN Training [81.85361544720885]
We propose FracTrain that integrates progressive fractional quantization which gradually increases the precision of activations, weights, and gradients.
FracTrain reduces computational cost and hardware-quantified energy/latency of DNN training while achieving comparable or better (-0.12% to +1.87%) accuracy.
arXiv Detail & Related papers (2020-12-24T05:24:10Z)
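The progressive fractional quantization idea in the FracTrain entry can be sketched as a bit-width schedule paired with a simple quantizer. This is a minimal illustration under assumed stage boundaries; the paper's actual fractional schedule is adapted dynamically during training rather than fixed:

```python
def precision_schedule(step, total_steps, bit_levels=(4, 6, 8, 32)):
    """Progressively raise bit-width as training proceeds.

    bit_levels is an illustrative staircase: early steps train at low
    precision, later steps approach full precision. The final level 32
    stands in for unquantized float32.
    """
    stage = min(int(step / total_steps * len(bit_levels)), len(bit_levels) - 1)
    return bit_levels[stage]

def quantize(x, bits):
    """Uniform symmetric quantization of a value in [-1, 1] to `bits` bits."""
    if bits >= 32:
        return x  # treat 32 bits as full precision: pass through
    levels = (1 << (bits - 1)) - 1  # e.g. 7 representable steps for 4 bits
    return round(x * levels) / levels
```

Activations, weights, and gradients would each be passed through `quantize` with the bit-width the schedule assigns at the current step, so the training cost shrinks most in the early, low-precision stages.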
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.