Meta-Gradient Reinforcement Learning with an Objective Discovered Online
- URL: http://arxiv.org/abs/2007.08433v1
- Date: Thu, 16 Jul 2020 16:17:09 GMT
- Title: Meta-Gradient Reinforcement Learning with an Objective Discovered Online
- Authors: Zhongwen Xu, Hado van Hasselt, Matteo Hessel, Junhyuk Oh, Satinder
Singh, David Silver
- Abstract summary: We propose an algorithm based on meta-gradient descent that discovers its own objective, flexibly parameterised by a deep neural network.
Because the objective is discovered online, it can adapt to changes over time.
On the Atari Learning Environment, the meta-gradient algorithm adapts over time to learn with greater efficiency.
- Score: 54.15180335046361
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep reinforcement learning includes a broad family of algorithms that
parameterise an internal representation, such as a value function or policy, by
a deep neural network. Each algorithm optimises its parameters with respect to
an objective, such as Q-learning or policy gradient, that defines its
semantics. In this work, we propose an algorithm based on meta-gradient descent
that discovers its own objective, flexibly parameterised by a deep neural
network, solely from interactive experience with its environment. Over time,
this allows the agent to learn how to learn increasingly effectively.
Furthermore, because the objective is discovered online, it can adapt to
changes over time. We demonstrate that the algorithm discovers how to address
several important issues in RL, such as bootstrapping, non-stationarity, and
off-policy learning. On the Atari Learning Environment, the meta-gradient
algorithm adapts over time to learn with greater efficiency, eventually
outperforming the median score of a strong actor-critic baseline.
Related papers
- Discovering Temporally-Aware Reinforcement Learning Algorithms [42.016150906831776]
We propose a simple augmentation to two existing objective discovery approaches.
We find that commonly used meta-gradient approaches fail to discover adaptive objective functions.
arXiv Detail & Related papers (2024-02-08T17:07:42Z) - Online Network Source Optimization with Graph-Kernel MAB [62.6067511147939]
We propose Grab-UCB, a graph- kernel multi-arms bandit algorithm to learn online the optimal source placement in large scale networks.
We describe the network processes with an adaptive graph dictionary model, which typically leads to sparse spectral representations.
We derive the performance guarantees that depend on network parameters, which further influence the learning curve of the sequential decision strategy.
arXiv Detail & Related papers (2023-07-07T15:03:42Z) - Bootstrapped Meta-Learning [48.017607959109924]
We propose an algorithm that tackles a challenging meta-optimisation problem by letting the meta-learner teach itself.
The algorithm first bootstraps a target from the meta-learner, then optimises the meta-learner by minimising the distance to that target under a chosen (pseudo-)metric.
We achieve a new state-of-the art for model-free agents on the Atari ALE benchmark, improve upon MAML in few-shot learning, and demonstrate how our approach opens up new possibilities.
arXiv Detail & Related papers (2021-09-09T18:29:05Z) - A contrastive rule for meta-learning [1.3124513975412255]
Meta-learning algorithms leverage regularities that are present on a set of tasks to speed up and improve the performance of a subsidiary learning process.
We present a gradient-based meta-learning algorithm based on equilibrium propagation.
We establish theoretical bounds on its performance and present experiments on a set of standard benchmarks and neural network architectures.
arXiv Detail & Related papers (2021-04-04T19:45:41Z) - Online Structured Meta-learning [137.48138166279313]
Current online meta-learning algorithms are limited to learn a globally-shared meta-learner.
We propose an online structured meta-learning (OSML) framework to overcome this limitation.
Experiments on three datasets demonstrate the effectiveness and interpretability of our proposed framework.
arXiv Detail & Related papers (2020-10-22T09:10:31Z) - Fast Few-Shot Classification by Few-Iteration Meta-Learning [173.32497326674775]
We introduce a fast optimization-based meta-learning method for few-shot classification.
Our strategy enables important aspects of the base learner objective to be learned during meta-training.
We perform a comprehensive experimental analysis, demonstrating the speed and effectiveness of our approach.
arXiv Detail & Related papers (2020-10-01T15:59:31Z) - GRAC: Self-Guided and Self-Regularized Actor-Critic [24.268453994605512]
We propose a self-regularized TD-learning method to address divergence without requiring a target network.
We also propose a self-guided policy improvement method by combining policy-gradient with zero-order optimization.
This makes learning more robust to local noise in the Q function approximation and guides the updates of our actor network.
We evaluate GRAC on the suite of OpenAI gym tasks, achieving or outperforming state of the art in every environment tested.
arXiv Detail & Related papers (2020-09-18T17:58:29Z) - Discovering Reinforcement Learning Algorithms [53.72358280495428]
Reinforcement learning algorithms update an agent's parameters according to one of several possible rules.
This paper introduces a new meta-learning approach that discovers an entire update rule.
It includes both 'what to predict' (e.g. value functions) and 'how to learn from it' by interacting with a set of environments.
arXiv Detail & Related papers (2020-07-17T07:38:39Z) - Evolving Inborn Knowledge For Fast Adaptation in Dynamic POMDP Problems [5.23587935428994]
In this paper, we exploit the highly adaptive nature of neuromodulated neural networks to evolve a controller that uses the latent space of an autoencoder in a POMDP.
The integration of inborn knowledge and online plasticity enabled fast adaptation and better performance in comparison to some non-evolutionary meta-reinforcement learning algorithms.
arXiv Detail & Related papers (2020-04-27T14:55:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.