Deep Black-Box Reinforcement Learning with Movement Primitives
- URL: http://arxiv.org/abs/2210.09622v1
- Date: Tue, 18 Oct 2022 06:34:52 GMT
- Title: Deep Black-Box Reinforcement Learning with Movement Primitives
- Authors: Fabian Otto, Onur Celik, Hongyi Zhou, Hanna Ziesche, Ngo Anh Vien,
Gerhard Neumann
- Abstract summary: We present a new algorithm for deep reinforcement learning (RL).
It is based on differentiable trust region layers, which underlie a successful on-policy deep RL algorithm.
We compare our ERL algorithm to state-of-the-art step-based algorithms in many complex simulated robotic control tasks.
- Score: 15.184283143878488
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Episode-based reinforcement learning (ERL) algorithms treat reinforcement
learning (RL) as a black-box optimization problem where we learn to select a
parameter vector of a controller, often represented as a movement primitive,
for a given task descriptor called a context. ERL offers several distinct
benefits in comparison to step-based RL. It generates smooth control
trajectories, can handle non-Markovian reward definitions, and the resulting
exploration in parameter space is well suited for solving sparse reward
settings. Yet, the high dimensionality of the movement primitive parameters has
so far hampered the effective use of deep RL methods. In this paper, we present
a new algorithm for deep ERL. It is based on differentiable trust region
layers, which underlie a successful on-policy deep RL algorithm. These layers
allow us to specify trust regions for the policy update that are solved exactly
for each state using convex optimization, which enables policy learning with
the high precision required for ERL. We compare our ERL algorithm to
state-of-the-art step-based algorithms in many complex simulated robotic
control tasks. In doing so, we investigate different reward formulations -
dense, sparse, and non-Markovian. While step-based algorithms perform well only
on dense rewards, ERL performs favorably on sparse and non-Markovian rewards.
Moreover, our results show that the sparse and the non-Markovian rewards are
also often better suited to define the desired behavior, allowing us to obtain
considerably higher quality policies compared to step-based RL.
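To make the episodic black-box setup concrete, here is a minimal sketch of the ERL loop the abstract describes: a context-conditioned Gaussian search distribution over movement-primitive weights, full-episode rollouts scored only by their episodic return, and a gradient update of the distribution's parameters. The basis functions, dimensions, toy reward, and update rule below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Minimal sketch of episodic black-box RL (ERL): the policy maps a context c to a
# distribution over movement-primitive weights w; each sampled w is rolled out for a
# full episode and judged only by its episodic return. All specifics are illustrative.

rng = np.random.default_rng(0)

N_BASIS, HORIZON = 10, 100                    # primitive size and episode length
CONTEXT_DIM, PARAM_DIM = 2, N_BASIS           # one weight vector for a single joint

def mp_trajectory(w, horizon=HORIZON):
    """Render a trajectory from radial-basis-function weights (a simple ProMP-like primitive)."""
    t = np.linspace(0, 1, horizon)
    centers = np.linspace(0, 1, N_BASIS)
    phi = np.exp(-0.5 * ((t[:, None] - centers[None, :]) / 0.1) ** 2)
    phi /= phi.sum(axis=1, keepdims=True)
    return phi @ w                            # desired position at each time step

def rollout_return(w, context):
    """Toy episodic reward: reach a context-dependent goal with a smooth trajectory."""
    traj = mp_trajectory(w)
    goal = context.sum()                      # stand-in for the task descriptor
    return -abs(traj[-1] - goal) - 1e-3 * np.sum(np.diff(traj) ** 2)

# Linear-Gaussian search distribution pi(w | c) = N(W c + b, diag(sigma^2)).
W = np.zeros((PARAM_DIM, CONTEXT_DIM))
b = np.zeros(PARAM_DIM)
sigma = np.ones(PARAM_DIM)

for _ in range(50):                           # simple score-function (REINFORCE-style) update
    contexts = rng.uniform(-1, 1, size=(64, CONTEXT_DIM))
    means = contexts @ W.T + b
    eps = rng.standard_normal(means.shape)
    samples = means + sigma * eps
    returns = np.array([rollout_return(w, c) for w, c in zip(samples, contexts)])
    adv = (returns - returns.mean()) / (returns.std() + 1e-8)
    g_mean = adv[:, None] * eps / sigma       # d log N(w; mean, sigma) / d mean, advantage-weighted
    W += 0.05 * g_mean.T @ contexts / len(contexts)
    b += 0.05 * g_mean.mean(axis=0)
```

The trust-region layer idea, an exact per-context projection of the updated policy back into a trust region via convex optimization, can be illustrated in its simplest mean-only, closed-form case; the paper's layers also handle the covariance and other divergences.

```python
import numpy as np

def project_mean(mu_new, mu_old, prec_old, eps):
    """Project a proposed Gaussian mean back onto a KL-style trust region around the old mean.

    The constraint used here is the mean part of the Gaussian KL divergence,
        0.5 * (mu_new - mu_old)^T Sigma_old^{-1} (mu_new - mu_old) <= eps,
    for which the projection has a closed form: rescale the step so the bound holds exactly.
    (Sketch of the projection idea only; the actual trust-region layers also project the
    covariance and support other divergences.)
    """
    diff = mu_new - mu_old
    m = 0.5 * diff @ prec_old @ diff
    if m <= eps:
        return mu_new                         # already inside the trust region
    return mu_old + np.sqrt(eps / m) * diff   # scaled step lies exactly on the boundary
```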
Related papers
- REBEL: Reinforcement Learning via Regressing Relative Rewards [59.68420022466047]
We propose REBEL, a minimalist RL algorithm for the era of generative models.
In theory, we prove that fundamental RL algorithms like Natural Policy Gradient can be seen as variants of REBEL.
We find that REBEL provides a unified approach to language modeling and image generation, with performance similar to or stronger than PPO and DPO.
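As a rough, hedged illustration of what "regressing relative rewards" can look like (an assumed pairwise square-loss reading of the summary, not necessarily the paper's exact objective): for two sampled responses to the same prompt, the scaled difference of their log-probability ratios under the new versus previous policy is regressed onto the difference of their rewards.

```python
import numpy as np

def relative_reward_regression_loss(logp_new_a, logp_old_a, logp_new_b, logp_old_b,
                                    reward_a, reward_b, eta=1.0):
    """Pairwise square loss that regresses policy log-ratio gaps onto reward gaps.

    Log-probabilities are the summed token log-likelihoods of each sampled response
    under the current (new) and previous (old) policy; rewards come from any scalar
    reward model. Illustrative reading of "regressing relative rewards" only.
    """
    ratio_gap = (logp_new_a - logp_old_a) - (logp_new_b - logp_old_b)
    return float(np.mean((ratio_gap / eta - (reward_a - reward_b)) ** 2))

# Example with two responses scored 0.9 and 0.2 by a reward model:
print(relative_reward_regression_loss(np.array([-12.3]), np.array([-12.5]),
                                      np.array([-15.0]), np.array([-14.8]),
                                      np.array([0.9]), np.array([0.2])))
```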
arXiv Detail & Related papers (2024-04-25T17:20:45Z)
- How Can LLM Guide RL? A Value-Based Approach [68.55316627400683]
Reinforcement learning (RL) has become the de facto standard practice for sequential decision-making problems by improving future acting policies with feedback.
Recent developments in large language models (LLMs) have showcased impressive capabilities in language understanding and generation, yet they fall short in exploration and self-improvement capabilities.
We develop an algorithm named LINVIT that incorporates LLM guidance as a regularization factor in value-based RL, leading to significant reductions in the amount of data needed for learning.
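One plausible way to read "LLM guidance as a regularization factor in value-based RL" (this specific form is an assumption, not the paper's algorithm) is to bias action selection toward an LLM-suggested action distribution while still following the learned Q-values.

```python
import numpy as np

def regularized_action_probs(q_values, llm_prior, temperature=1.0, reg_weight=1.0):
    """Soft action selection that trades off learned Q-values against an LLM prior.

    Hedged illustration of using an LLM as a regularizer in value-based RL: actions with
    high Q-values AND high prior probability under the LLM are preferred. `llm_prior` is
    a probability vector over the discrete action set, e.g. obtained by asking the LLM to
    score the available actions for the current state.
    """
    logits = np.asarray(q_values) / temperature + reg_weight * np.log(np.asarray(llm_prior) + 1e-8)
    logits -= logits.max()                    # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

print(regularized_action_probs(np.array([1.0, 0.5, -0.2]),
                               np.array([0.1, 0.8, 0.1])))
```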
arXiv Detail & Related papers (2024-02-25T20:07:13Z)
- How does Your RL Agent Explore? An Optimal Transport Analysis of Occupancy Measure Trajectories [8.429001045596687]
We represent the learning process of an RL algorithm as a sequence of policies generated during training.
We then study the policy trajectory induced in the manifold of state-action occupancy measures.
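A minimal way to instantiate "a policy trajectory in the space of occupancy measures" (illustrative; the paper's construction is more general) is to summarize each training checkpoint by samples from its state-action visitation distribution and track the optimal-transport distance between consecutive checkpoints.

```python
import numpy as np

def wasserstein_1d(x, y):
    """Exact 1-Wasserstein distance between two equal-size 1-D empirical distributions."""
    return float(np.mean(np.abs(np.sort(x) - np.sort(y))))

# Illustrative only: pretend each checkpoint's occupancy measure has been summarized by
# projecting its (state, action) visitation samples onto a scalar feature.
rng = np.random.default_rng(0)
checkpoints = [rng.normal(loc=0.1 * k, scale=1.0, size=2000) for k in range(5)]

# The "exploration trajectory" is the sequence of OT distances between consecutive policies.
steps = [wasserstein_1d(a, b) for a, b in zip(checkpoints, checkpoints[1:])]
print(steps)
```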
arXiv Detail & Related papers (2024-02-14T11:55:50Z)
- Train Hard, Fight Easy: Robust Meta Reinforcement Learning [78.16589993684698]
A major challenge of reinforcement learning (RL) in real-world applications is the variation between environments, tasks or clients.
Standard meta-RL (MRL) methods optimize the average return over tasks, but often suffer from poor results on tasks of high risk or difficulty.
In this work, we define a robust MRL objective with a controlled level of robustness.
The resulting data inefficiency is addressed via the novel Robust Meta RL algorithm (RoML).
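A generic illustration of a robust objective "with a controlled level" (taken here, as an assumption, to be a CVaR-style tail average over tasks rather than RoML's exact construction): instead of the mean return over all tasks, optimize the mean over the worst alpha-fraction of tasks.

```python
import numpy as np

def cvar_over_tasks(task_returns, alpha=0.2):
    """Average return over the worst alpha-fraction of tasks (a CVaR-style robust objective).

    alpha is the "controlled level": alpha = 1.0 recovers the standard average-over-tasks
    objective, while smaller alpha focuses the meta-learner on high-risk / hard tasks.
    Generic illustration of a robust meta-RL objective, not RoML's exact method.
    """
    task_returns = np.sort(np.asarray(task_returns))
    k = max(1, int(np.ceil(alpha * len(task_returns))))
    return float(task_returns[:k].mean())

print(cvar_over_tasks([10.0, 9.5, 2.0, 8.0, 1.5], alpha=0.4))   # mean of the two worst tasks
```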
arXiv Detail & Related papers (2023-01-26T14:54:39Z)
- Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
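A minimal sketch of the two-policy roll-in described above: a guide policy (from offline data, demonstrations, or a pre-existing controller) acts for the first steps of each episode, and the learning policy takes over afterwards; shrinking the hand-over point over training forms a curriculum. Function and parameter names below are placeholders, not the paper's interface.

```python
import numpy as np

def jump_start_rollout(env_reset, env_step, guide_policy, explore_policy,
                       guide_steps, horizon=200, rng=None):
    """Collect one episode where the guide policy acts for the first `guide_steps` steps
    and the learning (exploration) policy takes over afterwards."""
    rng = rng or np.random.default_rng(0)
    s, transitions = env_reset(rng), []
    for t in range(horizon):
        policy = guide_policy if t < guide_steps else explore_policy
        a = policy(s, rng)
        s_next, r, done = env_step(s, a, rng)
        transitions.append((s, a, r, s_next, done))
        s = s_next
        if done:
            break
    return transitions

# Toy usage with a 1-D random-walk "environment":
traj = jump_start_rollout(
    env_reset=lambda rng: 0.0,
    env_step=lambda s, a, rng: (s + a, -abs(s + a), abs(s + a) > 5),
    guide_policy=lambda s, rng: -np.sign(s) * 0.5,      # stand-in for a pre-trained policy
    explore_policy=lambda s, rng: rng.uniform(-1, 1),
    guide_steps=10)
```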
arXiv Detail & Related papers (2022-04-05T17:25:22Z)
- Controlled Deep Reinforcement Learning for Optimized Slice Placement [0.8459686722437155]
We present a hybrid ML-heuristic approach that we name "Heuristically Assisted Deep Reinforcement Learning (HA-DRL)".
The proposed approach leverages recent works on Deep Reinforcement Learning (DRL) for slice placement and Virtual Network Embedding (VNE).
The evaluation results show that the proposed HA-DRL algorithm can accelerate the learning of an efficient slice placement policy.
arXiv Detail & Related papers (2021-08-03T14:54:00Z)
- Text Generation with Efficient (Soft) Q-Learning [91.47743595382758]
Reinforcement learning (RL) offers a flexible alternative by allowing users to plug in arbitrary task metrics as rewards.
We introduce a new RL formulation for text generation from the soft Q-learning perspective.
We apply the approach to a wide range of tasks, including learning from noisy/negative examples, adversarial attacks, and prompt generation.
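A minimal sketch of the soft Q-learning backup at the token level, the generic form such a formulation builds on (the token-MDP framing is standard; the paper's specific parameterization and training scheme are not reproduced here):

```python
import numpy as np

def soft_q_target(reward, next_q_values, gamma=1.0, temperature=1.0, terminal=False):
    """Soft Bellman target for one generation step.

    In a token-level MDP the state is the prompt plus the tokens generated so far, the
    action is the next token, and the reward can be any task metric (often given only
    at the end of the sequence). The soft backup replaces the hard max over next-token
    Q-values with a temperature-scaled log-sum-exp.
    """
    if terminal:
        return float(reward)
    q = np.asarray(next_q_values) / temperature
    m = q.max()
    v_soft = temperature * (m + np.log(np.sum(np.exp(q - m))))   # stable logsumexp
    return float(reward + gamma * v_soft)

# Mid-sequence step with a sparse (sequence-level) reward of 0:
print(soft_q_target(0.0, [1.2, 0.3, -0.5]))
```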
arXiv Detail & Related papers (2021-06-14T18:48:40Z)
- Regret Minimization Experience Replay [14.233842517210437]
Prioritized sampling is a promising technique for improving the performance of RL agents.
In this work, we theoretically analyze the optimal prioritization strategy that minimizes the regret of the RL policy.
We propose two practical algorithms, RM-DisCor and RM-TCE.
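For reference, the prioritized-sampling machinery such an analysis starts from looks roughly like the proportional scheme below; the regret-derived priorities of RM-DisCor and RM-TCE are not reproduced here and would replace how `priority` is computed.

```python
import numpy as np

class PrioritizedReplay:
    """Proportional prioritized sampling with importance-sampling correction.

    Regret-minimizing variants differ mainly in how `priority` is computed for each
    transition (e.g. from estimated error propagation rather than raw TD error).
    """
    def __init__(self, capacity, alpha=0.6, rng=None):
        self.capacity, self.alpha = capacity, alpha
        self.data, self.priorities = [], []
        self.rng = rng or np.random.default_rng(0)

    def add(self, transition, priority):
        if len(self.data) >= self.capacity:
            self.data.pop(0); self.priorities.pop(0)
        self.data.append(transition)
        self.priorities.append(float(priority))

    def sample(self, batch_size, beta=0.4):
        p = np.asarray(self.priorities) ** self.alpha
        p /= p.sum()
        idx = self.rng.choice(len(self.data), size=batch_size, p=p)
        weights = (len(self.data) * p[idx]) ** (-beta)     # importance-sampling weights
        weights /= weights.max()
        return [self.data[i] for i in idx], idx, weights

# Toy usage:
buf = PrioritizedReplay(capacity=1000)
for i in range(10):
    buf.add(transition={"step": i}, priority=1.0 + i)
batch, idx, w = buf.sample(batch_size=4)
```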
arXiv Detail & Related papers (2021-05-15T16:08:45Z)
- Learning Sampling Policy for Faster Derivative Free Optimization [100.27518340593284]
We propose a new reinforcement-learning-based zeroth-order (ZO) algorithm, ZO-RL, which learns a sampling policy for generating the perturbations in ZO optimization instead of using random sampling.
Our results show that ZO-RL can effectively reduce the variance of the ZO gradient estimate by learning a sampling policy, and that it converges faster than existing ZO algorithms in different scenarios.
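For context on what the learned sampling policy replaces, here is a standard zeroth-order gradient estimator with random Gaussian perturbations; ZO-RL, per the summary, would learn where to place these perturbations instead of drawing them at random.

```python
import numpy as np

def zo_gradient(f, x, num_dirs=20, smoothing=1e-2, rng=None):
    """Zeroth-order (derivative-free) gradient estimate of f at x.

    Uses forward differences along random Gaussian directions:
        g ~ (1 / (q * mu)) * sum_i [f(x + mu * u_i) - f(x)] * u_i
    A learned sampling policy would replace the random draw of the directions u_i.
    """
    rng = rng or np.random.default_rng(0)
    fx = f(x)
    g = np.zeros_like(x)
    for _ in range(num_dirs):
        u = rng.standard_normal(x.shape)
        g += (f(x + smoothing * u) - fx) * u
    return g / (num_dirs * smoothing)

# Usage: gradient-free descent on a quadratic.
x = np.array([2.0, -1.0])
for _ in range(100):
    x -= 0.1 * zo_gradient(lambda z: np.sum(z ** 2), x)
print(x)
```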
arXiv Detail & Related papers (2021-04-09T14:50:59Z)
- Active Finite Reward Automaton Inference and Reinforcement Learning Using Queries and Counterexamples [31.31937554018045]
Deep reinforcement learning (RL) methods require extensive data from exploration of the environment to achieve satisfactory performance.
We propose a framework that enables an RL agent to reason over its exploration process and distill high-level knowledge for effectively guiding its future explorations.
Specifically, we propose a novel RL algorithm that learns high-level knowledge in the form of a finite reward automaton by using the L* learning algorithm.
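To make "a finite reward automaton" concrete, below is a generic data structure of that kind (often called a reward machine): automaton states track high-level task progress, transitions fire on observed propositions, and each transition emits a reward. An L*-style learner would infer the transition table from queries and counterexamples; this sketch is an illustration, not the paper's implementation.

```python
class FiniteRewardAutomaton:
    """A minimal finite reward automaton.

    States capture high-level task progress; transitions fire on sets of propositions
    observed in the environment (e.g. {"key"}), and each transition emits a reward.
    """
    def __init__(self, initial_state, transitions):
        # transitions: {(state, frozenset_of_propositions): (next_state, reward)}
        self.state = initial_state
        self.transitions = transitions

    def step(self, propositions):
        key = (self.state, frozenset(propositions))
        if key not in self.transitions:
            return 0.0                        # no progress on unrecognized labels
        self.state, reward = self.transitions[key]
        return reward

# Example: "pick up the key, then open the door" as a two-step automaton.
rm = FiniteRewardAutomaton("u0", {
    ("u0", frozenset({"key"})):  ("u1", 0.0),
    ("u1", frozenset({"door"})): ("u_goal", 1.0),
})
print(rm.step({"key"}), rm.step({"door"}))    # -> 0.0 1.0
```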
arXiv Detail & Related papers (2020-06-28T21:13:08Z)