Related papers: Meta-Gradient Reinforcement Learning with an Objective Discovered Online

URL: http://arxiv.org/abs/2007.08433v1
Date: Thu, 16 Jul 2020 16:17:09 GMT
Title: Meta-Gradient Reinforcement Learning with an Objective Discovered Online
Authors: Zhongwen Xu, Hado van Hasselt, Matteo Hessel, Junhyuk Oh, Satinder Singh, David Silver
Abstract summary: We propose an algorithm based on meta-gradient descent that discovers its own objective, flexibly parameterised by a deep neural network. Because the objective is discovered online, it can adapt to changes over time. On the Atari Learning Environment, the meta-gradient algorithm adapts over time to learn with greater efficiency.
Score: 54.15180335046361
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Deep reinforcement learning includes a broad family of algorithms that parameterise an internal representation, such as a value function or policy, by a deep neural network. Each algorithm optimises its parameters with respect to an objective, such as Q-learning or policy gradient, that defines its semantics. In this work, we propose an algorithm based on meta-gradient descent that discovers its own objective, flexibly parameterised by a deep neural network, solely from interactive experience with its environment. Over time, this allows the agent to learn how to learn increasingly effectively. Furthermore, because the objective is discovered online, it can adapt to changes over time. We demonstrate that the algorithm discovers how to address several important issues in RL, such as bootstrapping, non-stationarity, and off-policy learning. On the Atari Learning Environment, the meta-gradient algorithm adapts over time to learn with greater efficiency, eventually outperforming the median score of a strong actor-critic baseline.

Related papers

How Should We Meta-Learn Reinforcement Learning Algorithms? [74.37180723338591]
We carry out an empirical comparison of the different approaches when applied to a range of meta-learned algorithms. In addition to meta-train and meta-test performance, we also investigate factors including the interpretability, sample cost and train time. We propose several guidelines for meta-learning new RL algorithms which will help ensure that future learned algorithms are as performant as possible.
arXiv Detail & Related papers (2025-07-23T16:31:38Z)
Probabilistic Curriculum Learning for Goal-Based Reinforcement Learning [2.5352713493505785]
Reinforcement learning -- algorithms that teach artificial agents to interact with environments by maximising reward signals -- has achieved significant success in recent years. One promising research direction involves introducing goals to allow multimodal policies, commonly through hierarchical or curriculum reinforcement learning. We present a novel probabilistic curriculum learning algorithm to suggest goals for reinforcement learning agents in continuous control and navigation tasks.
arXiv Detail & Related papers (2025-04-02T08:15:16Z)
Discovering Temporally-Aware Reinforcement Learning Algorithms [42.016150906831776]
We propose a simple augmentation to two existing objective discovery approaches. We find that commonly used meta-gradient approaches fail to discover adaptive objective functions.
arXiv Detail & Related papers (2024-02-08T17:07:42Z)
Online Network Source Optimization with Graph-Kernel MAB [62.6067511147939]
We propose Grab-UCB, a graph-kernel multi-armed bandit algorithm to learn online the optimal source placement in large-scale networks. We describe the network processes with an adaptive graph dictionary model, which typically leads to sparse spectral representations. We derive performance guarantees that depend on network parameters, which further influence the learning curve of the sequential decision strategy.
arXiv Detail & Related papers (2023-07-07T15:03:42Z)
Bootstrapped Meta-Learning [48.017607959109924]
We propose an algorithm that tackles a challenging meta-optimisation problem by letting the meta-learner teach itself. The algorithm first bootstraps a target from the meta-learner, then optimises the meta-learner by minimising the distance to that target under a chosen (pseudo-)metric. We achieve a new state of the art for model-free agents on the Atari ALE benchmark, improve upon MAML in few-shot learning, and demonstrate how our approach opens up new possibilities.
arXiv Detail & Related papers (2021-09-09T18:29:05Z)
A contrastive rule for meta-learning [1.3124513975412255]
Meta-learning algorithms leverage regularities that are present on a set of tasks to speed up and improve the performance of a subsidiary learning process. We present a gradient-based meta-learning algorithm based on equilibrium propagation. We establish theoretical bounds on its performance and present experiments on a set of standard benchmarks and neural network architectures.
arXiv Detail & Related papers (2021-04-04T19:45:41Z)
Online Structured Meta-learning [137.48138166279313]
Current online meta-learning algorithms are limited to learning a globally shared meta-learner. We propose an online structured meta-learning (OSML) framework to overcome this limitation. Experiments on three datasets demonstrate the effectiveness and interpretability of our proposed framework.
arXiv Detail & Related papers (2020-10-22T09:10:31Z)
Fast Few-Shot Classification by Few-Iteration Meta-Learning [173.32497326674775]
We introduce a fast optimization-based meta-learning method for few-shot classification. Our strategy enables important aspects of the base learner objective to be learned during meta-training. We perform a comprehensive experimental analysis, demonstrating the speed and effectiveness of our approach.
arXiv Detail & Related papers (2020-10-01T15:59:31Z)
GRAC: Self-Guided and Self-Regularized Actor-Critic [24.268453994605512]
We propose a self-regularized TD-learning method to address divergence without requiring a target network. We also propose a self-guided policy improvement method by combining policy-gradient with zero-order optimization. This makes learning more robust to local noise in the Q function approximation and guides the updates of our actor network. We evaluate GRAC on the suite of OpenAI gym tasks, achieving or outperforming state of the art in every environment tested.
arXiv Detail & Related papers (2020-09-18T17:58:29Z)
Discovering Reinforcement Learning Algorithms [53.72358280495428]
Reinforcement learning algorithms update an agent's parameters according to one of several possible rules. This paper introduces a new meta-learning approach that discovers an entire update rule by interacting with a set of environments. The rule includes both 'what to predict' (e.g. value functions) and 'how to learn from it'.
arXiv Detail & Related papers (2020-07-17T07:38:39Z)
Evolving Inborn Knowledge For Fast Adaptation in Dynamic POMDP Problems [5.23587935428994]
In this paper, we exploit the highly adaptive nature of neuromodulated neural networks to evolve a controller that uses the latent space of an autoencoder in a POMDP. The integration of inborn knowledge and online plasticity enabled fast adaptation and better performance in comparison to some non-evolutionary meta-reinforcement learning algorithms.
arXiv Detail & Related papers (2020-04-27T14:55:08Z)

This list is automatically generated from the titles and abstracts of the papers on this site.