How to Make Deep RL Work in Practice
- URL: http://arxiv.org/abs/2010.13083v2
- Date: Tue, 10 Nov 2020 12:46:20 GMT
- Title: How to Make Deep RL Work in Practice
- Authors: Nirnai Rao, Elie Aljalbout, Axel Sauer, Sami Haddadin
- Abstract summary: Reported results of state-of-the-art algorithms are often difficult to reproduce.
We suggest which of those techniques to use by default and highlight areas that could benefit from a solution specifically tailored to RL.
- Score: 15.740760669623876
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, challenging control problems have become solvable with
deep reinforcement learning (RL). To use RL for large-scale real-world
applications, a certain degree of reliability in its performance is
necessary. Reported results of state-of-the-art algorithms are often difficult
to reproduce. One reason for this is that certain implementation details
influence the performance significantly. Commonly, these details are not
highlighted as important techniques to achieve state-of-the-art performance.
Additionally, techniques from supervised learning are often used by default but
influence the algorithms in a reinforcement learning setting in different and
not well-understood ways. In this paper, we investigate the influence of
certain initialization, input normalization, and adaptive learning techniques
on the performance of state-of-the-art RL algorithms. We suggest which of
those techniques to use by default and highlight areas that could benefit
from a solution specifically tailored to RL.
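Input normalization is one of the techniques the abstract names. As a minimal sketch, assuming per-dimension standardization with running statistics (a common choice in RL codebases, not necessarily the exact variant the paper evaluates), observations can be normalized online as below. The class name `RunningObsNormalizer`, the clipping bound of 5.0, and the `eps` value are illustrative assumptions.

```python
import math
from typing import List, Sequence


class RunningObsNormalizer:
    """Hypothetical per-dimension observation normalizer.

    Keeps a running mean and variance with Welford's online algorithm,
    so past observations never need to be stored.
    """

    def __init__(self, dim: int, clip: float = 5.0, eps: float = 1e-8) -> None:
        self.count = 0
        self.mean = [0.0] * dim
        self.m2 = [0.0] * dim  # running sum of squared deviations from the mean
        self.clip = clip       # illustrative bound, not taken from the paper
        self.eps = eps         # avoids division by zero for constant inputs

    def update(self, obs: Sequence[float]) -> None:
        """Fold one observation into the running statistics."""
        self.count += 1
        for i, x in enumerate(obs):
            delta = x - self.mean[i]
            self.mean[i] += delta / self.count
            self.m2[i] += delta * (x - self.mean[i])

    def std(self) -> List[float]:
        """Population standard deviation; 1.0 until two samples are seen."""
        if self.count < 2:
            return [1.0] * len(self.mean)
        return [math.sqrt(m / self.count) for m in self.m2]

    def normalize(self, obs: Sequence[float]) -> List[float]:
        """Standardize each dimension, then clip to [-clip, clip]."""
        out = []
        for x, mu, sd in zip(obs, self.mean, self.std()):
            z = (x - mu) / (sd + self.eps)
            out.append(max(-self.clip, min(self.clip, z)))
        return out


if __name__ == "__main__":
    norm = RunningObsNormalizer(dim=2)
    for obs in ([1.0, 10.0], [2.0, 20.0], [3.0, 30.0]):
        norm.update(obs)
    print(norm.mean)                    # [2.0, 20.0]
    print(norm.normalize([2.0, 20.0]))  # [0.0, 0.0]
```

In a training loop, `update` would be called on each new observation and the policy would receive `normalize(obs)`; whether the statistics are frozen at evaluation time is a design choice this sketch leaves open.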
Related papers
- Offline reinforcement learning for job-shop scheduling problems [1.3927943269211593]
This paper introduces a novel offline RL method designed for optimization problems with complex constraints.
Our approach encodes actions in edge attributes and balances expected rewards with the imitation of expert solutions.
We demonstrate the effectiveness of this method on job-shop scheduling and flexible job-shop scheduling benchmarks.
Date: 2024-10-21T07:33:42Z
- Is Value Learning Really the Main Bottleneck in Offline RL? [70.54708989409409]
We show that the choice of a policy extraction algorithm significantly affects the performance and scalability of offline RL.
We propose two simple test-time policy improvement methods and show that these methods lead to better performance.
Date: 2024-06-13T17:07:49Z
- Exploiting Estimation Bias in Clipped Double Q-Learning for Continous Control Reinforcement Learning Tasks [5.968716050740402]
This paper focuses on addressing and exploiting estimation biases in Actor-Critic methods for continuous control tasks.
We design a Bias Exploiting (BE) mechanism to dynamically select the most advantageous estimation bias during training of the RL agent.
Most state-of-the-art deep RL algorithms can be equipped with the BE mechanism without degrading performance or increasing computational complexity.
Date: 2024-02-14T10:44:03Z
- Decoupled Prioritized Resampling for Offline RL [120.49021589395005]
We propose Offline Prioritized Experience Replay (OPER) for offline reinforcement learning.
OPER features a class of priority functions designed to prioritize highly-rewarding transitions, making them more frequently visited during training.
We show that this class of priority functions induces an improved behavior policy, and when constrained to this improved policy, a policy-constrained offline RL algorithm is likely to yield a better solution.
Date: 2023-06-08T17:56:46Z
- Efficient Deep Reinforcement Learning Requires Regulating Overfitting [91.88004732618381]
We show that high temporal-difference (TD) error on the validation set of transitions is the main culprit that severely affects the performance of deep RL algorithms.
We show that a simple online model selection method that targets the validation TD error is effective across state-based DMC and Gym tasks.
Date: 2023-04-20T17:11:05Z
- A Survey of Meta-Reinforcement Learning [69.76165430793571]
We cast the development of better RL algorithms as a machine learning problem itself in a process called meta-RL.
We discuss how, at a high level, meta-RL research can be clustered based on the presence of a task distribution and the learning budget available for each individual task.
We conclude by presenting the open problems on the path to making meta-RL part of the standard toolbox for a deep RL practitioner.
Date: 2023-01-19T12:01:41Z
- Large Language Models can Implement Policy Iteration [18.424558160071808]
In-Context Policy Iteration (ICPI) is an algorithm for performing reinforcement learning (RL) in context, using foundation models.
ICPI learns to perform RL tasks without expert demonstrations or gradients.
ICPI iteratively updates the contents of the prompt from which it derives its policy through trial-and-error interaction with an RL environment.
Date: 2022-10-07T21:18:22Z
- Contrastive Learning as Goal-Conditioned Reinforcement Learning [147.28638631734486]
In reinforcement learning (RL), it is easier to solve a task if given a good representation.
While deep RL should automatically acquire such good representations, prior work often finds that learning representations in an end-to-end fashion is unstable.
We show (contrastive) representation learning methods can be cast as RL algorithms in their own right.
Date: 2022-06-15T14:34:15Z
- Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
Date: 2022-04-05T17:25:22Z
- Can Increasing Input Dimensionality Improve Deep Reinforcement Learning? [15.578423102700764]
We propose an online feature extractor network (OFENet) that uses neural nets to produce good representations to be used as inputs to deep RL algorithms.
We show that the RL agents learn more efficiently with the high-dimensional representation than with the lower-dimensional state observations.
Date: 2020-03-03T16:52:05Z
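The related entry on Clipped Double Q-Learning builds on the target-computation trick popularized by TD3, in which the TD target bootstraps from the minimum of two target critics. The sketch below shows only that standard clipped target, assuming the two target critics' next-state values are already computed; it is background for that entry, not an implementation of its Bias Exploiting (BE) mechanism, and `clipped_double_q_target` is a hypothetical helper name.

```python
from typing import List, Sequence


def clipped_double_q_target(
    rewards: Sequence[float],
    dones: Sequence[bool],
    next_q1: Sequence[float],
    next_q2: Sequence[float],
    gamma: float = 0.99,
) -> List[float]:
    """Illustrative clipped double-Q TD targets for a batch of transitions.

    y = r + gamma * (1 - done) * min(Q1'(s', a'), Q2'(s', a')),
    where next_q1 / next_q2 are the two target critics' values at the
    next state and next action (assumed to be precomputed).
    """
    targets = []
    for r, done, q1, q2 in zip(rewards, dones, next_q1, next_q2):
        # Taking the minimum of the two critics biases the target downward,
        # counteracting the overestimation a single critic tends to accumulate.
        bootstrap = 0.0 if done else gamma * min(q1, q2)
        targets.append(r + bootstrap)
    return targets


if __name__ == "__main__":
    y = clipped_double_q_target(
        rewards=[1.0, 0.5],
        dones=[False, True],
        next_q1=[10.0, 4.0],
        next_q2=[8.0, 6.0],
        gamma=0.5,
    )
    print(y)  # [5.0, 0.5]
```

The `done` flag stops bootstrapping at terminal transitions, so the second target is just its reward.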