One Step at a Time: Pros and Cons of Multi-Step Meta-Gradient
Reinforcement Learning
- URL: http://arxiv.org/abs/2111.00206v1
- Date: Sat, 30 Oct 2021 08:36:52 GMT
- Title: One Step at a Time: Pros and Cons of Multi-Step Meta-Gradient
Reinforcement Learning
- Authors: Clément Bonnet, Paul Caron, Thomas Barrett, Ian Davies, Alexandre
Laterre
- Abstract summary: We introduce a novel method mixing multiple inner steps that enjoys a more accurate and robust meta-gradient signal.
When applied to the Snake game, the mixing meta-gradient algorithm can cut the variance by a factor of 3 while achieving similar or higher performance.
- Score: 61.662504399411695
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-tuning algorithms that adapt the learning process online encourage more
effective and robust learning. Among all the methods available, meta-gradients
have emerged as a promising approach. They leverage the differentiability of
the learning rule with respect to some hyper-parameters to adapt them in an
online fashion. Although meta-gradients can be accumulated over multiple
learning steps to avoid myopic updates, this is rarely used in practice. In
this work, we demonstrate that whilst multi-step meta-gradients do provide a
better learning signal in expectation, this comes at the cost of a significant
increase in variance, hindering performance. In light of this analysis, we
introduce a novel method mixing multiple inner steps that enjoys a more
accurate and robust meta-gradient signal, essentially trading off bias and
variance in meta-gradient estimation. When applied to the Snake game, the
mixing meta-gradient algorithm can cut the variance by a factor of 3 while
achieving similar or higher performance.
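To make the bias/variance trade-off concrete, the following is a minimal sketch (not the paper's algorithm) of mixing multi-step meta-gradients on a toy problem. The inner learning rule is SGD on the quadratic loss L(w) = w²/2 with learning rate η as the hyper-parameter, so the k-step meta-gradient dJ_k/dη has a closed form; the function names, the uniform mixing weights, and the toy loss are all illustrative assumptions.

```python
import numpy as np

def meta_gradient(eta, w0, k):
    """Analytic meta-gradient dJ_k/d(eta) for SGD on L(w) = w**2 / 2.

    The inner update w <- w - eta * dL/dw contracts w by (1 - eta),
    so after k steps J_k = 0.5 * w0**2 * (1 - eta)**(2*k), and
    differentiating in eta gives
        dJ_k/d(eta) = -k * w0**2 * (1 - eta)**(2*k - 1).
    """
    return -k * w0 ** 2 * (1.0 - eta) ** (2 * k - 1)

def mixed_meta_gradient(eta, w0, max_k, weights):
    """Convex combination of the 1..max_k-step meta-gradients."""
    gs = [meta_gradient(eta, w0, k) for k in range(1, max_k + 1)]
    return float(np.dot(weights, gs))

eta, w0 = 0.1, 2.0
uniform = np.full(3, 1.0 / 3.0)       # illustrative mixing weights
g1 = meta_gradient(eta, w0, 1)        # myopic 1-step signal
g3 = meta_gradient(eta, w0, 3)        # less biased, higher variance in practice
g_mix = mixed_meta_gradient(eta, w0, 3, uniform)
```

In this noiseless toy the mixed estimate simply lands between the 1-step and 3-step signals; the point of the mixing scheme in the abstract is that, with stochastic gradients, averaging over inner-step horizons damps the variance of the longer-horizon terms while retaining part of their reduced bias.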
Related papers
- Classifier-guided Gradient Modulation for Enhanced Multimodal Learning [50.7008456698935]
Classifier-Guided Gradient Modulation (CGGM) is a novel method to balance multimodal learning by modulating gradients.
We conduct extensive experiments on four multimodal datasets: UPMC-Food 101, CMU-MOSI, IEMOCAP and BraTS.
CGGM outperforms all the baselines and other state-of-the-art methods consistently.
arXiv Detail & Related papers (2024-11-03T02:38:43Z) - Mitigating Gradient Bias in Multi-objective Learning: A Provably Convergent Stochastic Approach [38.76462300149459]
We develop a Multi-objective Correction (MoCo) method for multi-objective gradient optimization.
The unique feature of our method is that it can guarantee convergence without increasing the batch size, even in the non-convex setting.
arXiv Detail & Related papers (2022-10-23T05:54:26Z) - Meta-Learning with Self-Improving Momentum Target [72.98879709228981]
We propose Self-improving Momentum Target (SiMT) to improve the performance of a meta-learner.
SiMT generates the target model by adapting from the temporal ensemble of the meta-learner.
We show that SiMT brings a significant performance gain when combined with a wide range of meta-learning methods.
arXiv Detail & Related papers (2022-10-11T06:45:15Z) - Efficient Meta-Learning for Continual Learning with Taylor Expansion
Approximation [2.28438857884398]
Continual learning aims to alleviate catastrophic forgetting when handling consecutive tasks under non-stationary distributions.
We propose a novel efficient meta-learning algorithm for solving the online continual learning problem.
Our method achieves better or on-par performance and much higher efficiency compared to the state-of-the-art approaches.
arXiv Detail & Related papers (2022-10-03T04:57:05Z) - Continuous-Time Meta-Learning with Forward Mode Differentiation [65.26189016950343]
We introduce Continuous Meta-Learning (COMLN), a meta-learning algorithm where adaptation follows the dynamics of a gradient vector field.
Treating the learning process as an ODE offers the notable advantage that the length of the trajectory is now continuous.
We show empirically its efficiency in terms of runtime and memory usage, and we illustrate its effectiveness on a range of few-shot image classification problems.
arXiv Detail & Related papers (2022-03-02T22:35:58Z) - Faster Meta Update Strategy for Noise-Robust Deep Learning [62.08964100618873]
We introduce a novel Faster Meta Update Strategy (FaMUS) to replace the most expensive step in the meta gradient with a faster layer-wise approximation.
We show our method is able to save two-thirds of the training time while still maintaining the comparable or achieving even better generalization performance.
arXiv Detail & Related papers (2021-04-30T16:19:07Z) - A contrastive rule for meta-learning [1.3124513975412255]
Meta-learning algorithms leverage regularities that are present on a set of tasks to speed up and improve the performance of a subsidiary learning process.
We present a gradient-based meta-learning algorithm based on equilibrium propagation.
We establish theoretical bounds on its performance and present experiments on a set of standard benchmarks and neural network architectures.
arXiv Detail & Related papers (2021-04-04T19:45:41Z) - Large-Scale Meta-Learning with Continual Trajectory Shifting [76.29017270864308]
We show that allowing the meta-learners to take a larger number of inner gradient steps better captures the structure of heterogeneous and large-scale tasks.
In order to increase the frequency of meta-updates, we propose to estimate the required shift of the task-specific parameters.
We show that the algorithm largely outperforms the previous first-order meta-learning methods in terms of both generalization performance and convergence.
arXiv Detail & Related papers (2021-02-14T18:36:33Z) - Multi-step Estimation for Gradient-based Meta-learning [3.4376560669160385]
We propose a simple and direct method to reduce the cost by reusing the same gradient across a window of inner steps.
We show that our method significantly reduces training time and memory usage, maintaining competitive accuracies, or even outperforming in some cases.
arXiv Detail & Related papers (2020-06-08T00:37:01Z)
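The gradient-reuse idea in the last entry can be sketched as follows; this is a toy illustration under assumed details (the function names, the quadratic loss, and the refresh schedule are not taken from the paper), showing an inner loop that recomputes the gradient only at the start of each window and reuses the cached value in between.

```python
import numpy as np

def inner_loop(w, lr, n_steps, window, grad_fn):
    """Inner SGD that recomputes the gradient only every `window` steps
    and reuses the cached gradient for the steps in between."""
    g = None
    for t in range(n_steps):
        if t % window == 0:   # refresh the gradient at each window start
            g = grad_fn(w)
        w = w - lr * g        # otherwise reuse the cached gradient
    return w

# Quadratic loss L(w) = 0.5 * ||w||^2, whose gradient is w itself.
w0 = np.array([1.0, -2.0])
w_exact = inner_loop(w0, lr=0.1, n_steps=4, window=1, grad_fn=lambda w: w)
w_reuse = inner_loop(w0, lr=0.1, n_steps=4, window=2, grad_fn=lambda w: w)
```

With `window=1` this is plain SGD; a larger window trades a small approximation error in the inner trajectory for fewer gradient (and meta-gradient) computations, which is the cost reduction the summary describes.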
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.