Regularized Evolutionary Population-Based Training
- URL: http://arxiv.org/abs/2002.04225v4
- Date: Wed, 21 Jul 2021 04:04:51 GMT
- Title: Regularized Evolutionary Population-Based Training
- Authors: Jason Liang, Santiago Gonzalez, Hormoz Shahrzad, and Risto
Miikkulainen
- Abstract summary: This paper presents an algorithm called Evolutionary Population-Based Training (EPBT) that interleaves the training of a DNN's weights with the metalearning of loss functions.
EPBT results in faster, more accurate learning on image classification benchmarks.
- Score: 11.624954122221562
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Metalearning of deep neural network (DNN) architectures and hyperparameters
has become an increasingly important area of research. At the same time,
network regularization has been recognized as a crucial dimension to effective
training of DNNs. However, the role of metalearning in establishing effective
regularization has not yet been fully explored. There is recent evidence that
loss-function optimization could play this role; however, it is computationally
impractical as an outer loop to full training. This paper presents an algorithm
called Evolutionary Population-Based Training (EPBT) that interleaves the
training of a DNN's weights with the metalearning of loss functions. They are
parameterized using multivariate Taylor expansions that EPBT can directly
optimize. Such simultaneous adaptation of weights and loss functions can be
deceptive, and therefore EPBT uses a quality-diversity heuristic called Novelty
Pulsation as well as knowledge distillation to prevent overfitting during
training. On the CIFAR-10 and SVHN image classification benchmarks, EPBT
results in faster, more accurate learning. The discovered hyperparameters adapt
to the training process and serve to regularize the learning task by
discouraging overfitting to the labels. EPBT thus demonstrates a practical
instantiation of regularization metalearning based on simultaneous training.
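To make the interleaving idea concrete, below is a minimal toy sketch in Python/NumPy, not the authors' implementation: a logistic model stands in for the DNN, the loss is parameterized as a third-order Taylor expansion in the predicted probability of the true class, and simple truncation selection with coefficient mutation stands in for EPBT's Novelty Pulsation and knowledge distillation. All function names, hyperparameters, and the synthetic data are illustrative assumptions.

```python
# Toy sketch of EPBT-style interleaved training (illustrative, not the paper's code).
# A population of (weights, loss-coefficient) pairs is partially trained each
# generation; the loss is a Taylor expansion loss(p) = c0 + c1*p + c2*p^2 + c3*p^3
# in p, the predicted probability of the true class, and its coefficients evolve.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic two-class data split into train / validation.
X = rng.normal(size=(400, 10))
true_w = rng.normal(size=(10,))
y = (X @ true_w > 0).astype(int)
Xtr, ytr, Xva, yva = X[:300], y[:300], X[300:], y[300:]

def predict_proba(w, X):
    """Probability of class 1 under a logistic model (stand-in for a DNN)."""
    return 1.0 / (1.0 + np.exp(-(X @ w)))

def taylor_loss_grad(w, X, y, coeffs):
    """Gradient of the Taylor-parameterized loss w.r.t. the weights."""
    p1 = predict_proba(w, X)
    p_true = np.where(y == 1, p1, 1.0 - p1)       # probability of the true label
    # d(loss)/dp = sum_k k * c_k * p^(k-1); the constant term c0 has no gradient.
    dloss_dp = sum(k * c * p_true ** (k - 1) for k, c in enumerate(coeffs) if k > 0)
    # Chain rule through the sigmoid; the sign flips for the negative class.
    dp_dz = p1 * (1.0 - p1) * np.where(y == 1, 1.0, -1.0)
    return X.T @ (dloss_dp * dp_dz) / len(y)

def accuracy(w, X, y):
    return np.mean((predict_proba(w, X) > 0.5).astype(int) == y)

# Population: each member carries its own weights and loss coefficients.
pop_size, lr, generations, steps_per_gen = 8, 0.5, 20, 10
population = [(np.zeros(10), rng.normal(scale=0.1, size=4)) for _ in range(pop_size)]

for gen in range(generations):
    scored = []
    for w, coeffs in population:
        w = w.copy()
        for _ in range(steps_per_gen):            # partial training, not full training
            w -= lr * taylor_loss_grad(w, Xtr, ytr, coeffs)
        scored.append((accuracy(w, Xva, yva), w, coeffs))
    scored.sort(key=lambda t: t[0], reverse=True)
    elites = scored[: pop_size // 2]
    # Exploit/explore: keep the elites, and add copies with perturbed loss coefficients.
    population = [(w, coeffs) for _, w, coeffs in elites]
    for _, w, coeffs in elites:
        population.append((w.copy(), coeffs + rng.normal(scale=0.05, size=4)))

print("best validation accuracy:", scored[0][0])
```

The property this sketch mirrors from EPBT is that partially trained weights persist across generations, so the search over loss functions happens inside a single training run rather than as a computationally impractical outer loop over full trainings.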
Related papers
- Comprehensive Online Training and Deployment for Spiking Neural Networks [40.255762156745405]
Spiking Neural Networks (SNNs) are considered to have enormous potential in the future development of Artificial Intelligence (AI).
Currently proposed online training methods cannot tackle the inseparability problem of temporally dependent gradients.
We propose the Efficient Multi-Precision Firing (EM-PF) model, a family of advanced spiking models based on floating-point spikes and binary synaptic weights.
arXiv Detail & Related papers (2024-10-10T02:39:22Z)
- Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers [107.3726071306935]
We propose a new plug-and-play training framework, SMoE-Dropout, to enable scaling transformers to better accuracy in their full capacity without collapse.
SMoE-Dropout consists of a randomly initialized and fixed router network that activates experts and gradually increases the number of activated experts as training progresses over time.
Our experiments demonstrate the superior performance and substantial computation savings of SMoE-Dropout, compared to dense training baselines with equivalent parameter counts.
arXiv Detail & Related papers (2023-03-02T22:12:51Z)
- SPIDE: A Purely Spike-based Method for Training Feedback Spiking Neural Networks [56.35403810762512]
Spiking neural networks (SNNs) with event-based computation are promising brain-inspired models for energy-efficient applications on neuromorphic hardware.
We study spike-based implicit differentiation on the equilibrium state (SPIDE) that extends the recently proposed training method.
arXiv Detail & Related papers (2023-02-01T04:22:59Z)
- Online Training Through Time for Spiking Neural Networks [66.7744060103562]
Spiking neural networks (SNNs) are promising brain-inspired energy-efficient models.
Recent progress in training methods has enabled successful deep SNNs on large-scale tasks with low latency.
We propose online training through time (OTTT) for SNNs, which is derived from BPTT to enable forward-in-time learning.
arXiv Detail & Related papers (2022-10-09T07:47:56Z)
- Towards Scaling Difference Target Propagation by Learning Backprop Targets [64.90165892557776]
Difference Target Propagation (DTP) is a biologically-plausible learning algorithm closely related to Gauss-Newton (GN) optimization.
We propose a novel feedback weight training scheme that ensures both that DTP approximates BP and that layer-wise feedback weight training can be restored.
We report the best performance ever achieved by DTP on CIFAR-10 and ImageNet.
arXiv Detail & Related papers (2022-01-31T18:20:43Z)
- What training reveals about neural network complexity [80.87515604428346]
This work explores the hypothesis that the complexity of the function a deep neural network (NN) is learning can be deduced by how fast its weights change during training.
Our results support the hypothesis that good training behavior can be a useful bias towards good generalization.
arXiv Detail & Related papers (2021-06-08T08:58:00Z)
- FracTrain: Fractionally Squeezing Bit Savings Both Temporally and Spatially for Efficient DNN Training [81.85361544720885]
We propose FracTrain, which integrates progressive fractional quantization that gradually increases the precision of activations, weights, and gradients.
FracTrain reduces the computational cost and hardware-quantified energy/latency of DNN training while achieving comparable or better (-0.12% to +1.87%) accuracy.
arXiv Detail & Related papers (2020-12-24T05:24:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.