Variance-Reduced Gradient Estimation via Noise-Reuse in Online Evolution
Strategies
- URL: http://arxiv.org/abs/2304.12180v2
- Date: Sat, 9 Dec 2023 22:20:16 GMT
- Title: Variance-Reduced Gradient Estimation via Noise-Reuse in Online Evolution
Strategies
- Authors: Oscar Li, James Harrison, Jascha Sohl-Dickstein, Virginia Smith, Luke
Metz
- Abstract summary: Noise-Reuse Evolution Strategies (NRES) is a general class of unbiased online evolution strategies methods.
We show NRES results in faster convergence than existing AD and ES methods in terms of wall-clock time and number of steps across a variety of applications.
- Score: 50.10277748405355
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unrolled computation graphs are prevalent throughout machine learning but
present challenges to automatic differentiation (AD) gradient estimation
methods when their loss functions exhibit extreme local sensitivity,
discontinuity, or blackbox characteristics. In such scenarios, online evolution
strategies methods are a more capable alternative, while being more
parallelizable than vanilla evolution strategies (ES) by interleaving partial
unrolls and gradient updates. In this work, we propose a general class of
unbiased online evolution strategies methods. We analytically and empirically
characterize the variance of this class of gradient estimators and identify the
one with the least variance, which we term Noise-Reuse Evolution Strategies
(NRES). Experimentally, we show NRES results in faster convergence than
existing AD and ES methods in terms of wall-clock time and number of unroll
steps across a variety of applications, including learning dynamical systems,
meta-training learned optimizers, and reinforcement learning.
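As a concrete illustration of the setup described above, the sketch below shows antithetic online ES over a toy unrolled system in which each worker reuses a single Gaussian perturbation across its partial unrolls. It is a rough sketch under invented assumptions (the unroll function `step`, the worker layout, and all hyperparameters are hypothetical), not the authors' implementation or the exact NRES estimator.
```python
# Rough sketch of antithetic online ES with noise reuse across partial unrolls.
# Not the authors' implementation: `step`, the worker layout, and all
# hyperparameters below are invented for illustration.
import numpy as np

def step(theta, state):
    """Toy one-step unroll: returns (next state, per-step loss)."""
    next_state = 0.9 * state + np.tanh(theta).sum()
    return next_state, next_state ** 2

def online_es_grad(theta, workers, sigma=0.1, truncation=10):
    """One gradient estimate from a partial unroll of length `truncation`,
    averaged over antithetic worker pairs."""
    grad = np.zeros_like(theta)
    for w in workers:
        loss_pos = loss_neg = 0.0
        for _ in range(truncation):
            # The same perturbation w["eps"] is reused at every unroll step
            # of this worker's episode (the noise-reuse ingredient).
            w["s_pos"], l_pos = step(theta + sigma * w["eps"], w["s_pos"])
            w["s_neg"], l_neg = step(theta - sigma * w["eps"], w["s_neg"])
            loss_pos += l_pos
            loss_neg += l_neg
        grad += (loss_pos - loss_neg) / (2.0 * sigma) * w["eps"]
    return grad / len(workers)

rng = np.random.default_rng(0)
theta = np.zeros(4)
workers = [{"eps": rng.standard_normal(4), "s_pos": 0.0, "s_neg": 0.0}
           for _ in range(16)]
for _ in range(200):
    # Interleave short unrolls with parameter updates (the "online" part).
    theta -= 1e-3 * online_es_grad(theta, workers)
    # A full implementation would resample each worker's perturbation and
    # reset its state at episode boundaries; that bookkeeping is omitted.
```
Interleaving short unrolls with parameter updates is what makes the estimator online; holding each worker's perturbation fixed over the episode is the noise-reuse ingredient whose variance the paper analyzes.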
Related papers
- Classifier-guided Gradient Modulation for Enhanced Multimodal Learning [50.7008456698935]
Classifier-Guided Gradient Modulation (CGGM) is a novel method to balance multimodal learning with gradients.
We conduct extensive experiments on four multimodal datasets: UPMC-Food 101, CMU-MOSI, IEMOCAP and BraTS.
CGGM outperforms all the baselines and other state-of-the-art methods consistently.
arXiv Detail & Related papers (2024-11-03T02:38:43Z) - Contrastive-Adversarial and Diffusion: Exploring pre-training and fine-tuning strategies for sulcal identification [3.0398616939692777]
Techniques like adversarial learning, contrastive learning, diffusion denoising learning, and ordinary reconstruction learning have become standard.
The study aims to elucidate the advantages of pre-training techniques and fine-tuning strategies to enhance the learning process of neural networks.
arXiv Detail & Related papers (2024-05-29T15:44:51Z) - Byzantine-Robust Decentralized Stochastic Optimization with Stochastic
Gradient Noise-Independent Learning Error [25.15075119957447]
We study Byzantine-robust optimization over a decentralized network, where every agent periodically communicates with its neighbors to exchange local models and then updates its own local model by stochastic gradient descent (SGD).
The performance of such a method is affected by an unknown number of Byzantine agents, which behave adversarially during the optimization process.
arXiv Detail & Related papers (2023-08-10T02:14:23Z) - Lottery Tickets in Evolutionary Optimization: On Sparse
Backpropagation-Free Trainability [0.0]
We study gradient descent (GD)-based sparse training and evolution strategies (ES)
We find that ES explore diverse and flat local optima and do not preserve linear mode connectivity across sparsity levels and independent runs.
arXiv Detail & Related papers (2023-05-31T15:58:54Z) - Discovering Evolution Strategies via Meta-Black-Box Optimization [23.956974467496345]
We propose to discover effective update rules for evolution strategies via meta-learning.
Our approach employs a search strategy parametrized by a self-attention-based architecture.
We show that it is possible to self-referentially train an evolution strategy from scratch, with the learned update rule used to drive the outer meta-learning loop.
arXiv Detail & Related papers (2022-11-21T08:48:46Z) - Continuous-Time Meta-Learning with Forward Mode Differentiation [65.26189016950343]
We introduce Continuous-Time Meta-Learning (COMLN), a meta-learning algorithm where adaptation follows the dynamics of a gradient vector field.
Treating the learning process as an ODE offers the notable advantage that the length of the trajectory is now continuous.
We show empirically its efficiency in terms of runtime and memory usage, and we illustrate its effectiveness on a range of few-shot image classification problems.
arXiv Detail & Related papers (2022-03-02T22:35:58Z) - One Step at a Time: Pros and Cons of Multi-Step Meta-Gradient
Reinforcement Learning [61.662504399411695]
We introduce a novel method that mixes multiple inner steps to obtain a more accurate and robust meta-gradient signal.
When applied to the Snake game, the mixing meta-gradient algorithm can cut the variance by a factor of 3 while achieving similar or higher performance.
arXiv Detail & Related papers (2021-10-30T08:36:52Z) - Adaptive Learning Rate and Momentum for Training Deep Neural Networks [0.0]
We develop a fast training method motivated by the nonlinear Conjugate Gradient (CG) framework.
Experiments in image classification datasets show that our method yields faster convergence than other local solvers.
arXiv Detail & Related papers (2021-06-22T05:06:56Z) - Training Generative Adversarial Networks by Solving Ordinary
Differential Equations [54.23691425062034]
We study the continuous-time dynamics induced by GAN training.
From this perspective, we hypothesise that instabilities in training GANs arise from the integration error.
We experimentally verify that well-known ODE solvers (such as Runge-Kutta) can stabilise training.
arXiv Detail & Related papers (2020-10-28T15:23:49Z) - Adaptive Gradient Method with Resilience and Momentum [120.83046824742455]
We propose an Adaptive Gradient Method with Resilience and Momentum (AdaRem).
AdaRem adjusts the parameter-wise learning rate according to whether a parameter's past update direction is aligned with the direction of its current gradient (see the illustrative sketch after this list).
Our method outperforms previous adaptive learning rate-based algorithms in terms of the training speed and the test error.
arXiv Detail & Related papers (2020-10-21T14:49:00Z)