Learning to Locomote: Understanding How Environment Design Matters for
Deep Reinforcement Learning
- URL: http://arxiv.org/abs/2010.04304v1
- Date: Fri, 9 Oct 2020 00:03:27 GMT
- Title: Learning to Locomote: Understanding How Environment Design Matters for
Deep Reinforcement Learning
- Authors: Daniele Reda, Tianxin Tao, Michiel van de Panne
- Abstract summary: We show that environment design matters in significant ways and document how it can contribute to the brittle nature of many RL results.
Specifically, we examine choices related to state representations, initial state distributions, reward structure, control frequency, episode termination procedures, curriculum usage, the action space, and the torque limits.
We aim to stimulate discussion around such choices, which in practice strongly impact the success of RL when applied to continuous-action control problems of interest to animation, such as learning to locomote.
- Score: 7.426118390008397
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning to locomote is one of the most common tasks in physics-based
animation and deep reinforcement learning (RL). A learned policy is the product
of the problem to be solved, as embodied by the RL environment, and the RL
algorithm. While enormous attention has been devoted to RL algorithms, much
less is known about the impact of design choices for the RL environment. In
this paper, we show that environment design matters in significant ways and
document how it can contribute to the brittle nature of many RL results.
Specifically, we examine choices related to state representations, initial
state distributions, reward structure, control frequency, episode termination
procedures, curriculum usage, the action space, and the torque limits. We aim
to stimulate discussion around such choices, which in practice strongly impact
the success of RL when applied to continuous-action control problems of
interest to animation, such as learning to locomote.
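In practice, each of the design dimensions named above corresponds to a concrete knob in the environment implementation. The snippet below is a minimal illustrative sketch, not code from the paper: it shows where choices such as control frequency, torque limits, the initial state distribution, episode termination, and reward structure typically surface in a physics-based locomotion environment. All class names, parameters, and default values are hypothetical assumptions.

```python
import numpy as np

class LocomotionEnvConfig:
    """Hypothetical container for the environment-design choices discussed above."""
    def __init__(self,
                 control_hz=30,            # control frequency: policy queries per second
                 sim_hz=240,               # underlying physics simulation rate
                 torque_limit=80.0,        # per-joint torque clamp (action space scaling)
                 terminate_on_fall=True,   # episode termination procedure
                 fall_height=0.6,          # torso height below which the episode ends
                 init_state_noise=0.05,    # width of the initial state distribution
                 reward_weights=None):     # reward structure: term -> weight
        self.control_hz = control_hz
        self.sim_steps_per_action = sim_hz // control_hz
        self.torque_limit = torque_limit
        self.terminate_on_fall = terminate_on_fall
        self.fall_height = fall_height
        self.init_state_noise = init_state_noise
        self.reward_weights = reward_weights or {
            "forward_velocity": 1.0,
            "alive_bonus": 0.1,
            "torque_penalty": -1e-3,
        }

def apply_action(cfg, raw_action):
    """Map a policy output in [-1, 1] to clipped joint torques (action space + torque limits)."""
    return np.clip(raw_action, -1.0, 1.0) * cfg.torque_limit

def shaped_reward(cfg, forward_vel, torques):
    """Combine reward terms using the configured weights (reward structure)."""
    w = cfg.reward_weights
    return (w["forward_velocity"] * forward_vel
            + w["alive_bonus"]
            + w["torque_penalty"] * float(np.sum(np.square(torques))))

if __name__ == "__main__":
    cfg = LocomotionEnvConfig()
    action = np.random.uniform(-1.0, 1.0, size=8)   # placeholder 8-joint policy output
    torques = apply_action(cfg, action)
    print("reward:", shaped_reward(cfg, forward_vel=1.2, torques=torques))
```

Even this toy configuration makes the paper's point concrete: changing any one of these defaults (for example, halving control_hz or tightening torque_limit) changes the problem the RL algorithm is asked to solve, which is exactly the kind of sensitivity the authors document.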
Related papers
- Action-Quantized Offline Reinforcement Learning for Robotic Skill
Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe to convert static behavior datasets into policies that can perform better than the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme (a simple illustrative stand-in for such a discretization is sketched after this list).
arXiv Detail & Related papers (2023-10-18T06:07:10Z) - Explaining RL Decisions with Trajectories [28.261758841898697]
Explanation is a key component for the adoption of reinforcement learning (RL) in many real-world decision-making problems.
We propose a complementary approach to these explanations, particularly for offline RL, where we attribute the policy decisions of a trained RL agent to the trajectories encountered by it during training.
arXiv Detail & Related papers (2023-05-06T15:26:22Z) - A Survey of Meta-Reinforcement Learning [69.76165430793571]
We cast the development of better RL algorithms as a machine learning problem itself in a process called meta-RL.
We discuss how, at a high level, meta-RL research can be clustered based on the presence of a task distribution and the learning budget available for each individual task.
We conclude by presenting the open problems on the path to making meta-RL part of the standard toolbox for a deep RL practitioner.
arXiv Detail & Related papers (2023-01-19T12:01:41Z) - Design Process is a Reinforcement Learning Problem [0.0]
We argue the design process is a reinforcement learning problem and can potentially be a proper application for RL algorithms.
This creates opportunities for using RL methods and, at the same time, raises challenges.
arXiv Detail & Related papers (2022-11-06T14:37:22Z) - INFOrmation Prioritization through EmPOWERment in Visual Model-Based RL [90.06845886194235]
We propose a modified objective for model-based reinforcement learning (RL).
We integrate a term inspired by variational empowerment into a state-space model based on mutual information.
We evaluate the approach on a suite of vision-based robot control tasks with natural video backgrounds.
arXiv Detail & Related papers (2022-04-18T23:09:23Z) - Contextualize Me -- The Case for Context in Reinforcement Learning [49.794253971446416]
Contextual Reinforcement Learning (cRL) provides a framework to model such changes in a principled manner.
We show how cRL contributes to improving zero-shot generalization in RL through meaningful benchmarks and structured reasoning about generalization tasks.
arXiv Detail & Related papers (2022-02-09T15:01:59Z) - Automated Reinforcement Learning (AutoRL): A Survey and Open Problems [92.73407630874841]
Automated Reinforcement Learning (AutoRL) involves not only standard applications of AutoML but also includes additional challenges unique to RL.
We provide a common taxonomy, discuss each area in detail and pose open problems which would be of interest to researchers going forward.
arXiv Detail & Related papers (2022-01-11T12:41:43Z) - RvS: What is Essential for Offline RL via Supervised Learning? [77.91045677562802]
Recent work has shown that supervised learning alone, without temporal difference (TD) learning, can be remarkably effective for offline RL.
In every environment suite we consider, simply maximizing likelihood with a two-layer feedforward MLP is competitive.
These results also probe the limits of existing RvS methods, which are comparatively weak on random data.
arXiv Detail & Related papers (2021-12-20T18:55:16Z) - A Validation Tool for Designing Reinforcement Learning Environments [0.0]
This study proposes a Markov-based feature analysis method to validate whether an MDP is well formulated.
We believe an MDP suitable for applying RL should contain a set of state features that are both sensitive to actions and predictive in rewards.
arXiv Detail & Related papers (2021-12-10T13:28:08Z) - Generalization in Deep RL for TSP Problems via Equivariance and Local
Search [21.07325126324399]
We propose a simple deep learning architecture that learns with novel RL training techniques.
We empirically evaluate our proposition on random and realistic TSP problems against relevant state-of-the-art deep RL methods.
arXiv Detail & Related papers (2021-10-07T16:20:37Z) - Heuristic-Guided Reinforcement Learning [31.056460162389783]
Tabula rasa RL algorithms require environment interactions or computation that scales with the horizon of the decision-making task.
Our framework can be viewed as a horizon-based regularization for controlling bias and variance in RL under a finite interaction budget.
In particular, we introduce the novel concept of an "improvable heuristic" -- a heuristic that allows an RL agent to extrapolate beyond its prior knowledge.
arXiv Detail & Related papers (2021-06-05T00:04:09Z)
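Regarding the action-quantized offline RL entry above: the paper proposes a learned, adaptive quantization scheme, which is not reproduced here. The sketch below is a much simpler stand-in that clusters dataset actions with plain k-means and maps continuous actions to codebook indices; the function names, the choice of k, and the synthetic dataset are illustrative assumptions, not the authors' method.

```python
import numpy as np

def kmeans_action_codebook(actions, k=16, iters=50, seed=0):
    """Cluster continuous dataset actions into k discrete prototypes.

    A fixed, non-adaptive discretization used here purely for illustration.
    """
    rng = np.random.default_rng(seed)
    centers = actions[rng.choice(len(actions), size=k, replace=False)].copy()
    for _ in range(iters):
        # assign every action to its nearest prototype
        dists = np.linalg.norm(actions[:, None, :] - centers[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # move each prototype to the mean of its assigned actions
        for j in range(k):
            if np.any(labels == j):
                centers[j] = actions[labels == j].mean(axis=0)
    return centers

def quantize(action, centers):
    """Return the index of the nearest codebook entry for a continuous action."""
    return int(np.linalg.norm(centers - action, axis=1).argmin())

if __name__ == "__main__":
    dataset_actions = np.random.uniform(-1.0, 1.0, size=(5000, 6))  # synthetic offline dataset
    codebook = kmeans_action_codebook(dataset_actions, k=16)
    idx = quantize(dataset_actions[0], codebook)
    print("discrete action id:", idx, "decoded action:", codebook[idx])
```

An offline RL method would then act over the k codebook indices and decode the chosen index back to a continuous action at execution time.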
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.