Discovering Temporally-Aware Reinforcement Learning Algorithms
- URL: http://arxiv.org/abs/2402.05828v1
- Date: Thu, 8 Feb 2024 17:07:42 GMT
- Title: Discovering Temporally-Aware Reinforcement Learning Algorithms
- Authors: Matthew Thomas Jackson, Chris Lu, Louis Kirsch, Robert Tjarko Lange,
Shimon Whiteson, Jakob Nicolaus Foerster
- Abstract summary: We propose a simple augmentation to two existing objective discovery approaches.
We find that commonly used meta-gradient approaches fail to discover adaptive objective functions.
- Score: 42.016150906831776
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advancements in meta-learning have enabled the automatic discovery of
novel reinforcement learning algorithms parameterized by surrogate objective
functions. To improve upon manually designed algorithms, the parameterization
of this learned objective function must be expressive enough to represent novel
principles of learning (instead of merely recovering already established ones)
while still generalizing to a wide range of settings outside of its
meta-training distribution. However, existing methods focus on discovering
objective functions that, like many widely used objective functions in
reinforcement learning, do not take into account the total number of steps
allowed for training, or "training horizon". In contrast, humans use a plethora
of different learning objectives across the course of acquiring a new ability.
For instance, students may alter their studying techniques based on the
proximity to exam deadlines and their self-assessed capabilities. This paper
contends that ignoring the optimization time horizon significantly restricts
the expressive potential of discovered learning algorithms. We propose a simple
augmentation to two existing objective discovery approaches that allows the
discovered algorithm to dynamically update its objective function throughout
the agent's training procedure, resulting in expressive schedules and increased
generalization across different training horizons. In the process, we find that
commonly used meta-gradient approaches fail to discover such adaptive objective
functions while evolution strategies discover highly dynamic learning rules. We
demonstrate the effectiveness of our approach on a wide range of tasks and
analyze the resulting learned algorithms, which we find effectively balance
exploration and exploitation by modifying the structure of their learning rules
throughout the agent's lifetime.
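
To make the abstract's central idea concrete, below is a minimal, hypothetical sketch (not the authors' implementation) of a learned surrogate objective that additionally receives the agent's normalized training progress, so the update rule it induces can change across the training horizon. The architecture, input features, and names are illustrative assumptions only.

```python
# Minimal, hypothetical sketch (not the authors' code): a learned surrogate
# objective conditioned on training progress, so the update rule it induces
# can change over the agent's lifetime. Features, sizes, and names are
# illustrative assumptions.
import torch
import torch.nn as nn


class TimeAwareObjective(nn.Module):
    """Maps per-transition statistics plus a progress feature to a scalar loss."""

    def __init__(self, num_features: int = 3, hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_features + 1, hidden),  # +1 for normalized progress t/T
            nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, log_prob, advantage, value_error, progress):
        # progress is a scalar in [0, 1]: steps used so far / total training horizon
        feats = torch.stack(
            [log_prob, advantage, value_error, progress.expand_as(log_prob)],
            dim=-1,
        )
        return self.net(feats).mean()  # scalar surrogate loss for the inner agent


# The agent would minimize this loss in place of a hand-designed objective.
objective = TimeAwareObjective()
log_prob = torch.randn(128, requires_grad=True)       # log pi(a|s) per transition
advantage, value_error = torch.randn(128), torch.randn(128)
progress = torch.tensor(0.25)                          # 25% of the horizon elapsed
loss = objective(log_prob, advantage, value_error, progress)
loss.backward()                                        # inner-loop update direction
```

In the paper's setting, the parameters of such an objective network are themselves meta-trained across many agent lifetimes (the abstract reports that evolution strategies, unlike commonly used meta-gradient approaches, discover highly dynamic rules of this kind); the sketch only shows how a progress-conditioned objective would be consumed by the inner-loop agent.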
Related papers
- Meta-Learning Neural Procedural Biases [9.876317838854018]
We propose Neural Procedural Bias Meta-Learning, a novel framework designed to meta-learn task procedural biases.
We show that by meta-learning the procedural biases of a neural network, we can induce strong inductive biases towards a distribution of learning tasks, enabling robust learning performance across many well-established few-shot learning benchmarks.
arXiv Detail & Related papers (2024-06-12T08:09:29Z)
- RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar to, but potentially even more practical than, those of interactive imitation learning.
Our proposed method uses reinforcement learning with the user intervention signals themselves as rewards (a minimal illustrative sketch of this relabeling appears after this list).
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal, and enables the algorithm to learn behaviors that improve over a potentially suboptimal human expert.
arXiv Detail & Related papers (2023-11-21T21:05:21Z)
- Meta-Learning Strategies through Value Maximization in Neural Networks [7.285835869818669]
We present a learning effort framework capable of efficiently optimizing control signals on a fully normative objective.
We apply this framework to investigate the effect of approximations in common meta-learning algorithms.
Across settings, we find that control effort is most beneficial when applied to easier aspects of a task early in learning.
arXiv Detail & Related papers (2023-10-30T18:29:26Z)
- Efficient Meta Reinforcement Learning for Preference-based Fast Adaptation [17.165083095799712]
We study the problem of few-shot adaptation in the context of human-in-the-loop reinforcement learning.
We develop a meta-RL algorithm that enables fast policy adaptation with preference-based feedback.
arXiv Detail & Related papers (2022-11-20T03:55:09Z)
- Hierarchically Structured Task-Agnostic Continual Learning [0.0]
We take a task-agnostic view of continual learning and develop a hierarchical information-theoretic optimality principle.
We propose a neural network layer, called the Mixture-of-Variational-Experts layer, that alleviates forgetting by creating a set of information processing paths.
Our approach operates in a task-agnostic way, i.e., unlike many existing continual learning algorithms, it does not require task-specific knowledge.
arXiv Detail & Related papers (2022-11-14T19:53:15Z)
- Meta-learning the Learning Trends Shared Across Tasks [123.10294801296926]
Gradient-based meta-learning algorithms excel at quick adaptation to new tasks with limited data.
Existing meta-learning approaches only depend on the current task information during the adaptation.
We propose a 'Path-aware' model-agnostic meta-learning approach.
arXiv Detail & Related papers (2020-10-19T08:06:47Z)
- Meta-Gradient Reinforcement Learning with an Objective Discovered Online [54.15180335046361]
We propose an algorithm based on meta-gradient descent that discovers its own objective, flexibly parameterised by a deep neural network.
Because the objective is discovered online, it can adapt to changes over time.
On the Atari Learning Environment, the meta-gradient algorithm adapts over time to learn with greater efficiency.
arXiv Detail & Related papers (2020-07-16T16:17:09Z)
- Meta-Reinforcement Learning Robust to Distributional Shift via Model Identification and Experience Relabeling [126.69933134648541]
We present a meta-reinforcement learning algorithm that is both efficient and extrapolates well when faced with out-of-distribution tasks at test time.
Our method is based on a simple insight: we recognize that dynamics models can be adapted efficiently and consistently with off-policy data.
arXiv Detail & Related papers (2020-06-12T13:34:46Z)
- Incremental Object Detection via Meta-Learning [77.55310507917012]
We propose a meta-learning approach that learns to reshape model gradients, such that information across incremental tasks is optimally shared.
In comparison to existing meta-learning methods, our approach is task-agnostic, allows incremental addition of new-classes and scales to high-capacity models for object detection.
arXiv Detail & Related papers (2020-03-17T13:40:00Z)
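
For the RLIF entry above, the following hedged sketch illustrates the stated idea of using the expert's intervention signal itself as the reward; the data structures, field names, and the -1.0 reward value are assumptions for illustration, not that paper's implementation.

```python
# Hedged sketch for the RLIF summary above: relabel logged transitions so that
# the step on which a human expert intervened carries a negative reward and all
# other steps carry zero. Field names and the -1.0 value are illustrative.
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Transition:
    obs: tuple
    action: int
    intervened: bool      # True if the expert took over on this step
    reward: float = 0.0


def relabel_with_intervention_reward(traj: List[Transition]) -> List[Transition]:
    """The intervention signal becomes the only reward the off-policy learner sees."""
    return [replace(t, reward=-1.0 if t.intervened else 0.0) for t in traj]


# Example: the learner is penalized exactly where the expert stepped in.
traj = [Transition(obs=(0,), action=1, intervened=False),
        Transition(obs=(1,), action=0, intervened=True)]
print([t.reward for t in relabel_with_intervention_reward(traj)])  # [0.0, -1.0]
```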