What Matters In On-Policy Reinforcement Learning? A Large-Scale
Empirical Study
- URL: http://arxiv.org/abs/2006.05990v1
- Date: Wed, 10 Jun 2020 17:59:03 GMT
- Title: What Matters In On-Policy Reinforcement Learning? A Large-Scale
Empirical Study
- Authors: Marcin Andrychowicz, Anton Raichuk, Piotr Stańczyk, Manu Orsini,
Sertan Girgin, Raphael Marinier, Léonard Hussenot, Matthieu Geist, Olivier
Pietquin, Marcin Michalski, Sylvain Gelly, Olivier Bachem
- Abstract summary: On-policy reinforcement learning (RL) has been successfully applied to many different continuous control tasks.
But state-of-the-art implementations take numerous low- and high-level design decisions that strongly affect the performance of the resulting agents.
These choices are usually not extensively discussed in the literature, leading to discrepancy between published descriptions of algorithms and their implementations.
We implement >50 such "choices" in a unified on-policy RL framework, allowing us to investigate their impact in a large-scale empirical study.
- Score: 50.79125250286453
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, on-policy reinforcement learning (RL) has been successfully
applied to many different continuous control tasks. While RL algorithms are
often conceptually simple, their state-of-the-art implementations take numerous
low- and high-level design decisions that strongly affect the performance of
the resulting agents. Those choices are usually not extensively discussed in
the literature, leading to discrepancy between published descriptions of
algorithms and their implementations. This makes it hard to attribute progress
in RL and slows down overall progress [Engstrom'20]. As a step towards filling
that gap, we implement >50 such "choices" in a unified on-policy RL
framework, allowing us to investigate their impact in a large-scale empirical
study. We train over 250,000 agents in five continuous control environments of
different complexity and provide insights and practical recommendations for
on-policy training of RL agents.
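To make such "choices" concrete, the sketch below (illustrative only, not the
paper's actual framework) shows how a few common low-level on-policy design
decisions, e.g. advantage normalization and the PPO-style clipping threshold,
can be exposed as an explicit configuration; all names and default values here
are assumptions for illustration, not the paper's recommendations.

# Minimal sketch: a handful of on-policy design choices as an explicit config,
# plus the PPO-style clipped surrogate loss that several of them feed into.
from dataclasses import dataclass
import numpy as np

@dataclass
class OnPolicyConfig:
    learning_rate: float = 3e-4       # optimizer choice
    discount: float = 0.99
    gae_lambda: float = 0.95          # advantage-estimator choice
    ppo_clip: float = 0.2             # policy-loss choice
    normalize_advantages: bool = True
    normalize_observations: bool = True
    num_epochs: int = 10              # passes over each batch of experience
    minibatch_size: int = 64

def ppo_surrogate_loss(log_prob_new, log_prob_old, advantages, cfg):
    # Clipped PPO objective for one minibatch (numpy arrays).
    if cfg.normalize_advantages:
        advantages = (advantages - advantages.mean()) / (advantages.std() + 1e-8)
    ratio = np.exp(log_prob_new - log_prob_old)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - cfg.ppo_clip, 1.0 + cfg.ppo_clip) * advantages
    return -np.minimum(unclipped, clipped).mean()

In a study of this kind, each such field becomes one axis of the sweep over
design decisions.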
Related papers
- Towards Sample-Efficiency and Generalization of Transfer and Inverse Reinforcement Learning: A Comprehensive Literature Review [50.67937325077047]
This paper presents a comprehensive review of how sample efficiency and generalization of RL algorithms can be realized through transfer and inverse reinforcement learning (T-IRL).
Our findings indicate that a majority of recent research works have addressed these challenges by utilizing human-in-the-loop and sim-to-real strategies.
Under the IRL structure, training schemes that require a low number of experience transitions, and the extension of such frameworks to multi-agent and multi-intention problems, have been the priorities of researchers in recent years.
arXiv Detail & Related papers (2024-11-15T15:18:57Z)
- Exploiting Estimation Bias in Clipped Double Q-Learning for Continuous Control Reinforcement Learning Tasks [5.968716050740402]
This paper focuses on addressing and exploiting estimation biases in Actor-Critic methods for continuous control tasks.
We design a Bias Exploiting (BE) mechanism to dynamically select the most advantageous estimation bias during training of the RL agent.
Most State-of-the-art Deep RL algorithms can be equipped with the BE mechanism, without hindering performance or computational complexity.
arXiv Detail & Related papers (2024-02-14T10:44:03Z)
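As a rough illustration of the clipped double Q-learning targets that the Bias
Exploiting summary above builds on, here is a sketch with a switchable bias
direction; the prefer_max flag is a hypothetical stand-in for the paper's BE
selection rule, whose actual mechanism is not reproduced here.

# Clipped double Q targets with a selectable bias direction (illustrative).
import numpy as np

def td_target(rewards, dones, q1_next, q2_next, gamma=0.99, prefer_max=False):
    # rewards, dones, q1_next, q2_next: 1-D arrays over a batch of transitions.
    if prefer_max:
        q_next = np.maximum(q1_next, q2_next)  # lean into overestimation bias
    else:
        q_next = np.minimum(q1_next, q2_next)  # standard clipped double Q (underestimation)
    return rewards + gamma * (1.0 - dones) * q_next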
- Counterfactual Explanation Policies in RL [3.674863913115432]
COUNTERPOL is the first framework to analyze Reinforcement Learning policies using counterfactual explanations.
We establish a theoretical connection between Counterpol and widely used trust region-based policy optimization methods in RL.
arXiv Detail & Related papers (2023-07-25T01:14:56Z)
- A Survey of Meta-Reinforcement Learning [69.76165430793571]
We cast the development of better RL algorithms as a machine learning problem itself in a process called meta-RL.
We discuss how, at a high level, meta-RL research can be clustered based on the presence of a task distribution and the learning budget available for each individual task.
We conclude by presenting the open problems on the path to making meta-RL part of the standard toolbox for a deep RL practitioner.
arXiv Detail & Related papers (2023-01-19T12:01:41Z)
- When does return-conditioned supervised learning work for offline reinforcement learning? [51.899892382786526]
We study the capabilities and limitations of return-conditioned supervised learning.
We find that RCSL returns the optimal policy under a set of assumptions stronger than those needed for the more traditional dynamic programming-based algorithms.
arXiv Detail & Related papers (2022-06-02T15:05:42Z)
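The return-conditioned supervised learning setting studied above can be
sketched as plain supervised regression of logged actions on (state,
return-to-go) inputs; the linear least-squares "policy" below and the
single-trajectory assumption are purely illustrative.

# RCSL sketch: fit a policy to predict logged actions from (state, return-to-go).
import numpy as np

def returns_to_go(rewards, gamma=1.0):
    rtg = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg

def fit_rcsl(states, actions, rewards):
    # states: (T, d_s), actions: (T, d_a), rewards: (T,) from one logged trajectory.
    inputs = np.concatenate([states, returns_to_go(rewards)[:, None]], axis=1)
    weights, *_ = np.linalg.lstsq(inputs, actions, rcond=None)
    return weights  # at test time, condition on a desired (typically high) return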
- Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
arXiv Detail & Related papers (2022-04-05T17:25:22Z)
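A minimal sketch of the two-policy idea behind JSRL described above: a guide
policy (e.g. derived from offline data or demonstrations) acts for the first
guide_steps steps of each episode, after which the learning policy takes over;
the classic Gym-style interfaces and the fixed-step hand-off are assumptions,
not the paper's exact curriculum.

# Jump-start style rollout: guide policy first, exploration policy afterwards.
def jump_start_rollout(env, guide_policy, exploration_policy, guide_steps,
                       max_steps=1000):
    transitions = []
    obs = env.reset()
    for t in range(max_steps):
        policy = guide_policy if t < guide_steps else exploration_policy
        action = policy(obs)
        next_obs, reward, done, info = env.step(action)
        transitions.append((obs, action, reward, next_obs, done))
        obs = next_obs
        if done:
            break
    return transitions  # shrink guide_steps over training to form a curriculum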
- Deep RL With Information Constrained Policies: Generalization in Continuous Control [21.46148507577606]
We show that a natural constraint on information flow may confer generalization benefits onto artificial agents in continuous control tasks.
We implement a novel Capacity-Limited Actor-Critic (CLAC) algorithm.
Our experiments show that compared to alternative approaches, CLAC offers improvements in generalization between training and modified test environments.
arXiv Detail & Related papers (2020-10-09T15:42:21Z)
- Dynamics Generalization via Information Bottleneck in Deep Reinforcement Learning [90.93035276307239]
We propose an information theoretic regularization objective and an annealing-based optimization method to achieve better generalization ability in RL agents.
We demonstrate the extreme generalization benefits of our approach in different domains ranging from maze navigation to robotic tasks.
This work provides a principled way to improve generalization in RL by gradually removing information that is redundant for task-solving.
arXiv Detail & Related papers (2020-08-03T02:24:20Z)
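A minimal sketch of the kind of annealed information-theoretic regularization
described in the last entry: a KL penalty between a stochastic (diagonal
Gaussian) state encoding and a unit Gaussian prior, with a linearly annealed
coefficient; the functional form, coefficient, and schedule are assumptions
for illustration, not the paper's.

# Information-bottleneck style regularizer with an annealed coefficient.
import numpy as np

def kl_to_standard_normal(mu, log_var):
    # KL( N(mu, diag(exp(log_var))) || N(0, I) ), averaged over the batch.
    return 0.5 * np.mean(np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1))

def annealed_beta(step, total_steps, beta_max=1e-3):
    # Ramp the bottleneck weight up over training (one possible schedule).
    return beta_max * min(1.0, step / max(1, total_steps))

def regularized_loss(rl_loss, mu, log_var, step, total_steps):
    return rl_loss + annealed_beta(step, total_steps) * kl_to_standard_normal(mu, log_var)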