What Matters In On-Policy Reinforcement Learning? A Large-Scale
Empirical Study
- URL: http://arxiv.org/abs/2006.05990v1
- Date: Wed, 10 Jun 2020 17:59:03 GMT
- Title: What Matters In On-Policy Reinforcement Learning? A Large-Scale
Empirical Study
- Authors: Marcin Andrychowicz, Anton Raichuk, Piotr Stańczyk, Manu Orsini,
Sertan Girgin, Raphael Marinier, Léonard Hussenot, Matthieu Geist, Olivier
Pietquin, Marcin Michalski, Sylvain Gelly, Olivier Bachem
- Abstract summary: On-policy reinforcement learning (RL) has been successfully applied to many different continuous control tasks.
But state-of-the-art implementations take numerous low- and high-level design decisions that strongly affect the performance of the resulting agents.
These choices are usually not extensively discussed in the literature, leading to discrepancies between published descriptions of algorithms and their implementations.
We implement >50 such "choices" in a unified on-policy RL framework, allowing us to investigate their impact in a large-scale empirical study.
- Score: 50.79125250286453
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, on-policy reinforcement learning (RL) has been successfully
applied to many different continuous control tasks. While RL algorithms are
often conceptually simple, their state-of-the-art implementations take numerous
low- and high-level design decisions that strongly affect the performance of
the resulting agents. Those choices are usually not extensively discussed in
the literature, leading to discrepancy between published descriptions of
algorithms and their implementations. This makes it hard to attribute progress
in RL and slows down overall progress [Engstrom'20]. As a step towards filling
that gap, we implement >50 such "choices" in a unified on-policy RL
framework, allowing us to investigate their impact in a large-scale empirical
study. We train over 250'000 agents in five continuous control environments of
different complexity and provide insights and practical recommendations for
on-policy training of RL agents.
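The kind of low-level design choice the study examines can be illustrated with advantage estimation and per-batch advantage normalization, two decisions nearly every on-policy implementation must make. The sketch below is a minimal NumPy illustration of these general techniques, not the paper's actual framework:

```python
import numpy as np

def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation (GAE): one common way to trade
    bias against variance in the advantage targets. `values` must hold
    T+1 state-value estimates (the last one bootstraps the final step)."""
    T = len(rewards)
    adv = np.zeros(T)
    last = 0.0
    for t in reversed(range(T)):
        # One-step TD error, then exponentially weighted accumulation.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        last = delta + gamma * lam * last
        adv[t] = last
    return adv

def normalize_advantages(adv, eps=1e-8):
    """Per-batch advantage normalization, another frequently tuned choice:
    zero mean and unit standard deviation within each update batch."""
    return (adv - adv.mean()) / (adv.std() + eps)
```

For example, a 5-step trajectory needs 5 rewards and 6 value estimates; the normalized advantages then feed into the policy-gradient loss in place of the raw ones.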
Related papers
- Counterfactual Explanation Policies in RL [3.674863913115432]
COUNTERPOL is the first framework to analyze Reinforcement Learning policies using counterfactual explanations.
We establish a theoretical connection between Counterpol and widely used trust region-based policy optimization methods in RL.
arXiv Detail & Related papers (2023-07-25T01:14:56Z)
- A Survey of Meta-Reinforcement Learning [83.95180398234238]
We cast the development of better RL algorithms as a machine learning problem itself in a process called meta-RL.
We discuss how, at a high level, meta-RL research can be clustered based on the presence of a task distribution and the learning budget available for each individual task.
We conclude by presenting the open problems on the path to making meta-RL part of the standard toolbox for a deep RL practitioner.
arXiv Detail & Related papers (2023-01-19T12:01:41Z)
- When does return-conditioned supervised learning work for offline reinforcement learning? [51.899892382786526]
We study the capabilities and limitations of return-conditioned supervised learning.
We find that RCSL returns the optimal policy under a set of assumptions stronger than those needed for the more traditional dynamic programming-based algorithms.
arXiv Detail & Related papers (2022-06-02T15:05:42Z)
- Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
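The two-policy scheme described above reduces to a simple rollout rule: a guide policy controls the first h steps of each episode, then the learning ("exploration") policy takes over, with h annealed toward zero as training progresses. A hedged sketch of that rule, where the environment and policy callables are hypothetical stand-ins rather than the authors' code:

```python
def jsrl_rollout(env_reset, env_step, guide, explorer, horizon, h):
    """Collect one episode: the guide policy acts for the first h steps,
    the exploration policy for the remainder.
    env_step(state, action) -> (next_state, reward, done)."""
    state = env_reset()
    trajectory = []
    for t in range(horizon):
        # Hand over control once the guide's budget h is exhausted.
        action = guide(state) if t < h else explorer(state)
        state, reward, done = env_step(state, action)
        trajectory.append((state, action, reward))
        if done:
            break
    return trajectory
```

An outer training loop would shrink h whenever the exploration policy's returns match the guide's, so the learner faces progressively longer control segments.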
arXiv Detail & Related papers (2022-04-05T17:25:22Z)
- Direct Random Search for Fine Tuning of Deep Reinforcement Learning Policies [5.543220407902113]
We show that a direct random search is very effective at fine-tuning DRL policies by directly optimizing them using deterministic rollouts.
Our results show that this method yields more consistent and higher performing agents on the environments we tested.
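In its simplest form, direct random search over policy parameters is hill climbing on the deterministic-rollout return: sample a Gaussian perturbation, evaluate it, and keep it only if the return improves. The sketch below shows that general technique; the `evaluate` callable and hyperparameters are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

def random_search_finetune(theta, evaluate, iters=200, sigma=0.1, seed=0):
    """Greedy direct random search: perturb the parameter vector with
    Gaussian noise and keep a candidate only if its deterministic
    rollout return beats the incumbent."""
    rng = np.random.default_rng(seed)
    best_theta = np.asarray(theta, dtype=float)
    best_ret = evaluate(best_theta)
    for _ in range(iters):
        cand = best_theta + sigma * rng.standard_normal(best_theta.shape)
        ret = evaluate(cand)
        if ret > best_ret:
            best_theta, best_ret = cand, ret
    return best_theta, best_ret
```

Because the incumbent is only ever replaced by a strictly better candidate, the returned score is monotonically non-decreasing over iterations, which matches the consistency the abstract emphasizes.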
arXiv Detail & Related papers (2021-09-12T20:12:46Z)
- Deep RL With Information Constrained Policies: Generalization in Continuous Control [21.46148507577606]
We study the generalization benefits that a natural constraint on information flow might confer onto artificial agents in continuous control tasks.
We implement a novel Capacity-Limited Actor-Critic (CLAC) algorithm.
Our experiments show that compared to alternative approaches, CLAC offers improvements in generalization between training and modified test environments.
arXiv Detail & Related papers (2020-10-09T15:42:21Z)
- Dynamics Generalization via Information Bottleneck in Deep Reinforcement Learning [90.93035276307239]
We propose an information theoretic regularization objective and an annealing-based optimization method to achieve better generalization ability in RL agents.
We demonstrate the extreme generalization benefits of our approach in different domains ranging from maze navigation to robotic tasks.
This work provides a principled way to improve generalization in RL by gradually removing information that is redundant for task-solving.
arXiv Detail & Related papers (2020-08-03T02:24:20Z)
- Robust Deep Reinforcement Learning against Adversarial Perturbations on State Observations [88.94162416324505]
A deep reinforcement learning (DRL) agent observes its states through observations, which may contain natural measurement errors or adversarial noises.
Since the observations deviate from the true states, they can mislead the agent into making suboptimal actions.
We show that naively applying existing techniques on improving robustness for classification tasks, like adversarial training, is ineffective for many RL tasks.
arXiv Detail & Related papers (2020-03-19T17:59:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.