The RL Perceptron: Generalisation Dynamics of Policy Learning in High
Dimensions
- URL: http://arxiv.org/abs/2306.10404v5
- Date: Sat, 2 Sep 2023 14:24:52 GMT
- Title: The RL Perceptron: Generalisation Dynamics of Policy Learning in High
Dimensions
- Authors: Nishil Patel, Sebastian Lee, Stefano Sarao Mannelli, Sebastian Goldt,
Andrew Saxe
- Abstract summary: Reinforcement learning algorithms have proven transformative in a range of domains.
Much theory of RL has focused on discrete state spaces or worst-case analysis.
We propose a solvable high-dimensional model of RL that can capture a variety of learning protocols.
- Score: 14.778024171498208
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning (RL) algorithms have proven transformative in a range
of domains. To tackle real-world domains, these systems often use neural
networks to learn policies directly from pixels or other high-dimensional
sensory input. By contrast, much theory of RL has focused on discrete state
spaces or worst-case analysis, and fundamental questions remain about the
dynamics of policy learning in high-dimensional settings. Here, we propose a
solvable high-dimensional model of RL that can capture a variety of learning
protocols, and derive its typical dynamics as a set of closed-form ordinary
differential equations (ODEs). We derive optimal schedules for the learning
rates and task difficulty - analogous to annealing schemes and curricula during
training in RL - and show that the model exhibits rich behaviour, including
delayed learning under sparse rewards; a variety of learning regimes depending
on reward baselines; and a speed-accuracy trade-off driven by reward
stringency. Experiments on variants of the Procgen game "Bossfight" and Arcade
Learning Environment game "Pong" also show such a speed-accuracy trade-off in
practice. Together, these results take a step towards closing the gap between
theory and practice in high-dimensional RL.
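To make the kind of model described above concrete, the following is a minimal sketch, not the authors' code or exact equations: a linear "student" policy acts on high-dimensional Gaussian inputs for a fixed number of steps per episode, receives a single sparse binary reward when it agrees with a fixed "teacher" on enough steps, and is updated with a REINFORCE-style rule using a reward baseline. The dimension N, episode length T, reward threshold, learning rate, and baseline below are illustrative assumptions, not values from the paper.

```python
# Hedged sketch of a teacher-student "RL perceptron"-style setup (assumed, illustrative).
import numpy as np

rng = np.random.default_rng(0)

N = 500          # input dimension (assumed)
T = 10           # decisions per episode (assumed)
threshold = 8    # reward stringency: correct steps needed for reward (assumed)
lr = 0.5         # constant learning rate (the paper derives optimal schedules)
baseline = 0.1   # reward baseline (assumed)
episodes = 20000

teacher = rng.standard_normal(N)
teacher /= np.linalg.norm(teacher)          # fixed target policy direction
student = rng.standard_normal(N) / np.sqrt(N)

def overlap(w, w_star):
    """Cosine overlap between student and teacher; 1 means a perfect policy."""
    return w @ w_star / (np.linalg.norm(w) * np.linalg.norm(w_star))

for ep in range(episodes):
    x = rng.standard_normal((T, N))              # high-dimensional observations
    actions = np.sign(x @ student)               # student's binary actions
    correct = np.sign(x @ teacher) == actions
    reward = float(correct.sum() >= threshold)   # sparse end-of-episode reward
    # REINFORCE-like update: every (state, action) pair in the episode is
    # credited with the same (reward - baseline) signal.
    student += (lr / N) * (reward - baseline) * (actions[:, None] * x).sum(axis=0)

print(f"final student-teacher overlap: {overlap(student, teacher):.3f}")
```

Sweeping the reward threshold and baseline in a toy like this is one way to probe the sparse-reward delays and baseline-dependent regimes the abstract reports; the paper itself tracks the corresponding order parameters through closed-form ODEs rather than simulation.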
Related papers
- ODRL: A Benchmark for Off-Dynamics Reinforcement Learning [59.72217833812439]
We introduce ODRL, the first benchmark tailored for evaluating off-dynamics RL methods.
ODRL contains four experimental settings where the source and target domains can be either online or offline.
We conduct extensive benchmarking experiments, which show that no method has universal advantages across varied dynamics shifts.
arXiv Detail & Related papers (2024-10-28T05:29:38Z)
- Zero-Sum Positional Differential Games as a Framework for Robust Reinforcement Learning: Deep Q-Learning Approach [2.3020018305241337]
This paper is the first to propose considering robust reinforcement learning (RRL) problems within positional differential game theory.
Namely, we prove that under Isaacs's condition, the same Q-function can be utilized as an approximate solution of both minimax and maximin Bellman equations.
We present the Isaacs Deep Q-Network algorithms and demonstrate their superiority compared to other baseline RRL and Multi-Agent RL algorithms in various environments.
arXiv Detail & Related papers (2024-05-03T12:21:43Z)
- Adaptive action supervision in reinforcement learning from real-world multi-agent demonstrations [10.174009792409928]
We propose a method for adaptive action supervision in RL from real-world demonstrations in multi-agent scenarios.
In experiments using chase-and-escape and football tasks, where the unknown source and target environments have different dynamics, we show that our approach achieved a balance between reproducing the demonstrated behaviors and generalization ability, compared with the baselines.
arXiv Detail & Related papers (2023-05-22T13:33:37Z)
- Entropy Regularized Reinforcement Learning with Cascading Networks [9.973226671536041]
Deep RL uses neural networks as function approximators.
One of the major difficulties of RL is the absence of i.i.d. data.
In this work, we challenge the common practice, inherited from the (un)supervised learning community, of using a fixed neural architecture.
arXiv Detail & Related papers (2022-10-16T10:28:59Z)
- Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
arXiv Detail & Related papers (2022-04-05T17:25:22Z)
- Deep Active Learning by Leveraging Training Dynamics [57.95155565319465]
We propose a theory-driven deep active learning method (dynamicAL) which selects samples to maximize training dynamics.
We show that dynamicAL not only outperforms other baselines consistently but also scales well on large deep learning models.
arXiv Detail & Related papers (2021-10-16T16:51:05Z)
- Catastrophic Interference in Reinforcement Learning: A Solution Based on Context Division and Knowledge Distillation [8.044847478961882]
We introduce the concept of "context" into single-task reinforcement learning.
We develop a novel scheme, termed Context Division and Knowledge Distillation driven RL (CDaKD).
Our results show that, with various replay memory capacities, CDaKD can consistently improve the performance of existing RL algorithms.
arXiv Detail & Related papers (2021-09-01T12:02:04Z)
- Trajectory-wise Multiple Choice Learning for Dynamics Generalization in Reinforcement Learning [137.39196753245105]
We present a new model-based reinforcement learning algorithm that learns a multi-headed dynamics model for dynamics generalization.
We incorporate context learning, which encodes dynamics-specific information from past experiences into the context latent vector.
Our method exhibits superior zero-shot generalization performance across a variety of control tasks, compared to state-of-the-art RL methods.
arXiv Detail & Related papers (2020-10-26T03:20:42Z)
- Dynamics Generalization via Information Bottleneck in Deep Reinforcement Learning [90.93035276307239]
We propose an information theoretic regularization objective and an annealing-based optimization method to achieve better generalization ability in RL agents.
We demonstrate the extreme generalization benefits of our approach in different domains ranging from maze navigation to robotic tasks.
This work provides a principled way to improve generalization in RL by gradually removing information that is redundant for task-solving.
arXiv Detail & Related papers (2020-08-03T02:24:20Z)
- Efficient Model-Based Reinforcement Learning through Optimistic Policy Search and Planning [93.1435980666675]
We show how optimistic exploration can be easily combined with state-of-the-art reinforcement learning algorithms.
Our experiments demonstrate that optimistic exploration significantly speeds up learning when there are penalties on actions.
arXiv Detail & Related papers (2020-06-15T18:37:38Z)
- The Adversarial Resilience Learning Architecture for AI-based Modelling, Exploration, and Operation of Complex Cyber-Physical Systems [0.0]
We describe the concept of Adversarial Resilience Learning (ARL), which formulates a new approach to complex environment checking and resilient operation.
The quintessence of ARL lies in both agents exploring the system and training each other without any domain knowledge.
Here, we introduce the ARL software architecture, which allows the use of a wide range of model-free as well as model-based DRL algorithms.
arXiv Detail & Related papers (2020-05-27T19:19:57Z)