Faster Improvement Rate Population Based Training
- URL: http://arxiv.org/abs/2109.13800v1
- Date: Tue, 28 Sep 2021 15:30:55 GMT
- Title: Faster Improvement Rate Population Based Training
- Authors: Valentin Dalibard, Max Jaderberg
- Abstract summary: This paper presents Faster Improvement Rate PBT (FIRE PBT), which addresses the tendency of Population Based Training (PBT) to greedily favour short-term improvements at the expense of long-term performance.
We derive a novel fitness metric and use it to make some of the population members focus on long-term performance.
Experiments show that FIRE PBT is able to outperform PBT on the ImageNet benchmark and match the performance of networks that were trained with a hand-tuned learning rate schedule.
- Score: 7.661301899629696
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The successful training of neural networks typically involves careful and
time consuming hyperparameter tuning. Population Based Training (PBT) has
recently been proposed to automate this process. PBT trains a population of
neural networks concurrently, frequently mutating their hyperparameters
throughout their training. However, the decision mechanisms of PBT are greedy
and favour short-term improvements which can, in some cases, lead to poor
long-term performance. This paper presents Faster Improvement Rate PBT (FIRE
PBT) which addresses this problem. Our method is guided by an assumption: given
two neural networks with similar performance and training with similar
hyperparameters, the network showing the faster rate of improvement will lead
to a better final performance. Using this, we derive a novel fitness metric and
use it to make some of the population members focus on long-term performance.
Our experiments show that FIRE PBT is able to outperform PBT on the ImageNet
benchmark and match the performance of networks that were trained with a
hand-tuned learning rate schedule. We apply FIRE PBT to reinforcement learning
tasks and show that it leads to faster learning and higher final performance
than both PBT and random hyperparameter search.
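The abstract's core assumption, that of two similarly performing networks the one improving faster will reach better final performance, suggests a fitness based on the slope of recent evaluation scores. The following is a minimal illustrative sketch of that idea; the class and function names are assumptions for illustration, not the paper's actual fitness metric or implementation.

```python
class Member:
    """One population member: its hyperparameters and evaluation history."""

    def __init__(self, hyperparams):
        self.hyperparams = hyperparams
        self.history = []  # evaluation scores recorded over training

    def record(self, score):
        self.history.append(score)


def improvement_rate(member, window=3):
    """Average per-step change over the last `window` scores.

    A crude proxy for the paper's 'faster rate of improvement':
    the slope of recent performance rather than its absolute level.
    """
    recent = member.history[-window:]
    if len(recent) < 2:
        return 0.0
    return (recent[-1] - recent[0]) / (len(recent) - 1)
```

Under this sketch, a member with scores [0.1, 0.2, 0.4] would be ranked above a member stuck at [0.5, 0.5, 0.5], even though the latter currently scores higher, which is the kind of decision a greedy PBT fitness would get wrong.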
Related papers
- Generalized Population-Based Training for Hyperparameter Optimization in Reinforcement Learning [10.164982368785854]
Generalized Population-Based Training (GPBT) and Pairwise Learning (PL)
PL employs a comprehensive pairwise strategy to identify performance differentials and provide holistic guidance to underperforming agents.
arXiv Detail & Related papers (2024-04-12T04:23:20Z)
- Shrink-Perturb Improves Architecture Mixing during Population Based Training for Neural Architecture Search [62.997667081978825]
We show that simultaneously training and mixing neural networks is a promising way to conduct Neural Architecture Search (NAS)
We propose PBT-NAS, an adaptation of PBT to NAS where architectures are improved during training by replacing poorly-performing networks in a population with the result of mixing well-performing ones and inheriting the weights using the shrink-perturb technique.
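The shrink-perturb technique mentioned here (due to Ash & Adams, 2020) scales inherited weights toward zero and adds small Gaussian noise, which helps a network train on from weights it did not learn itself. A minimal sketch, assuming a flat list of weights; the parameter values are illustrative defaults, not those used by PBT-NAS:

```python
import random


def shrink_perturb(weights, shrink=0.4, sigma=0.1):
    """Shrink inherited weights toward zero and perturb with Gaussian noise.

    `shrink` scales each weight; `sigma` is the noise standard deviation.
    """
    return [shrink * w + random.gauss(0.0, sigma) for w in weights]
```

In PBT-NAS this would be applied to the weights a poorly-performing network inherits after mixing well-performing parents, so it resumes training from a softened copy rather than the parents' exact weights.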
arXiv Detail & Related papers (2023-07-28T15:29:52Z)
- Towards Memory- and Time-Efficient Backpropagation for Training Spiking Neural Networks [70.75043144299168]
Spiking Neural Networks (SNNs) are promising energy-efficient models for neuromorphic computing.
We propose the Spatial Learning Through Time (SLTT) method that can achieve high performance while greatly improving training efficiency.
Our method achieves state-of-the-art accuracy on ImageNet, while the memory cost and training time are reduced by more than 70% and 50%, respectively, compared with BPTT.
arXiv Detail & Related papers (2023-02-28T05:01:01Z)
- Online Training Through Time for Spiking Neural Networks [66.7744060103562]
Spiking neural networks (SNNs) are promising brain-inspired energy-efficient models.
Recent progress in training methods has enabled successful deep SNNs on large-scale tasks with low latency.
We propose online training through time (OTTT) for SNNs, which is derived from BPTT to enable forward-in-time learning.
arXiv Detail & Related papers (2022-10-09T07:47:56Z)
- Test-time Batch Normalization [61.292862024903584]
Deep neural networks often suffer the data distribution shift between training and testing.
We revisit the batch normalization (BN) in the training process and reveal two key insights benefiting test-time optimization.
We propose a novel test-time BN layer design, GpreBN, which is optimized during testing by minimizing Entropy loss.
arXiv Detail & Related papers (2022-05-20T14:33:39Z)
- Accurate online training of dynamical spiking neural networks through Forward Propagation Through Time [1.8515971640245998]
We show how a recently developed alternative to BPTT can be applied in spiking neural networks.
FPTT attempts to minimize an ongoing dynamically regularized risk on the loss.
We show that SNNs trained with FPTT outperform online BPTT approximations, and approach or exceed offline BPTT accuracy on temporal classification tasks.
arXiv Detail & Related papers (2021-12-20T13:44:20Z)
- Towards Evaluating and Training Verifiably Robust Neural Networks [81.39994285743555]
We study the relationship between IBP and CROWN, and prove that CROWN is always tighter than IBP when choosing appropriate bounding lines.
We propose a relaxed version of CROWN, linear bound propagation (LBP), that can be used to verify large networks to obtain lower verified errors.
arXiv Detail & Related papers (2021-04-01T13:03:48Z)
- How much progress have we made in neural network training? A New Evaluation Protocol for Benchmarking Optimizers [86.36020260204302]
We propose a new benchmarking protocol to evaluate both end-to-end efficiency and data-addition training efficiency.
A human study is conducted to show that our evaluation protocol matches human tuning behavior better than the random search.
We then apply the proposed benchmarking framework to 7 optimizers across various tasks, including computer vision, natural language processing, reinforcement learning, and graph mining.
arXiv Detail & Related papers (2020-10-19T21:46:39Z)
- Regularized Evolutionary Population-Based Training [11.624954122221562]
This paper presents an algorithm called Evolutionary Population-Based Training (EPBT) that interleaves the training of a DNN's weights with the metalearning of loss functions.
EPBT results in faster, more accurate learning on image classification benchmarks.
arXiv Detail & Related papers (2020-02-11T06:28:13Z)
- Provably Efficient Online Hyperparameter Optimization with Population-Based Bandits [12.525529586816955]
We introduce the first provably efficient Population-Based Bandits algorithm.
PB2 uses a probabilistic model to guide the search in an efficient way.
We show in a series of RL experiments that PB2 is able to achieve high performance with a modest computational budget.
arXiv Detail & Related papers (2020-02-06T21:27:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.