Improving the Exploration of Deep Reinforcement Learning in Continuous
Domains using Planning for Policy Search
- URL: http://arxiv.org/abs/2010.12974v1
- Date: Sat, 24 Oct 2020 20:19:06 GMT
- Title: Improving the Exploration of Deep Reinforcement Learning in Continuous
Domains using Planning for Policy Search
- Authors: Jakob J. Hollenstein, Erwan Renaudo, Matteo Saveriano, Justus Piater
- Abstract summary: We propose to integrate a kinodynamic planner in the exploration strategy and to learn a control policy in an offline fashion from generated environment interactions.
We compare PPS with state-of-the-art D-RL methods in typical RL settings including underactuated systems.
This generates training data that helps PPS discover better policies.
- Score: 6.088695984060244
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Local policy search is performed by most Deep Reinforcement Learning (D-RL)
methods, which increases the risk of getting trapped in a local minimum.
Furthermore, the availability of a simulation model is not fully exploited in
D-RL even in simulation-based training, which potentially decreases efficiency.
To better exploit simulation models in policy search, we propose to integrate a
kinodynamic planner in the exploration strategy and to learn a control policy
in an offline fashion from the generated environment interactions. We call the
resulting model-based reinforcement learning method PPS (Planning for Policy
Search). We compare PPS with state-of-the-art D-RL methods in typical RL
settings including underactuated systems. The comparison shows that PPS, guided
by the kinodynamic planner, collects data from a wider region of the state
space. This generates training data that helps PPS discover better policies.
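The exploration-then-offline-learning loop described in the abstract can be sketched on a toy 1-D system. Everything below is an illustrative stand-in, not the paper's actual planner or learner: `step` is a made-up dynamics model, the "kinodynamic planner" is reduced to random tree expansion, and offline policy learning is reduced to nearest-neighbour lookup over the planner's data.

```python
import random

def step(s, a):
    """Toy 1-D dynamics standing in for a simulation model."""
    return s + 0.1 * a

def kinodynamic_explore(n_nodes=200, seed=0):
    """Tree-based exploration in the spirit of a kinodynamic planner:
    repeatedly expand a random reached state with a random control.
    Because expansion starts from any reached state, data spreads over
    a wider region of the state space than a single rollout would."""
    rng = random.Random(seed)
    reached = [0.0]                 # states reached so far (root = start)
    data = []                       # (state, action, next_state) tuples
    for _ in range(n_nodes):
        s = rng.choice(reached)     # expand from a random tree node
        a = rng.uniform(-1.0, 1.0)  # random control input
        s2 = step(s, a)
        reached.append(s2)
        data.append((s, a, s2))
    return data

def fit_policy_offline(data):
    """Offline 'policy learning' reduced to nearest-neighbour lookup
    over the planner-generated (state, action) pairs."""
    pairs = [(s, a) for s, a, _ in data]
    def policy(s):
        return min(pairs, key=lambda sa: abs(sa[0] - s))[1]
    return policy

data = kinodynamic_explore()
policy = fit_policy_offline(data)
```

The point of the sketch is the division of labour: the planner alone decides where interactions are generated, and the policy is fit afterwards from that batch, never interacting with the environment itself.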
Related papers
- Theoretically Guaranteed Policy Improvement Distilled from Model-Based
Planning [64.10794426777493]
Model-based reinforcement learning (RL) has demonstrated remarkable successes on a range of continuous control tasks.
Recent practices tend to distill optimized action sequences into an RL policy during the training phase.
We develop an approach to distill from model-based planning to the policy.
arXiv Detail & Related papers (2023-07-24T16:52:31Z)
- Reparameterized Policy Learning for Multimodal Trajectory Optimization [61.13228961771765]
We investigate the challenge of parametrizing policies for reinforcement learning in high-dimensional continuous action spaces.
We propose a principled framework that models the continuous RL policy as a generative model of optimal trajectories.
We present a practical model-based RL method, which leverages the multimodal policy parameterization and learned world model.
arXiv Detail & Related papers (2023-07-20T09:05:46Z)
- Predictive Maneuver Planning with Deep Reinforcement Learning (PMP-DRL) for comfortable and safe autonomous driving [7.3045725197814875]
This paper presents a Predictive Maneuver Planning with Deep Reinforcement Learning (PMP-DRL) model for maneuver planning.
By learning from its experience, a Reinforcement Learning (RL)-based driving agent can adapt to changing driving conditions.
The results clearly show that PMP-DRL handles complex real-world scenarios and makes more comfortable and safer maneuver decisions than rule-based and imitation-based approaches.
arXiv Detail & Related papers (2023-06-15T11:27:30Z)
- Diverse Policy Optimization for Structured Action Space [59.361076277997704]
We propose Diverse Policy Optimization (DPO) to model policies in a structured action space as energy-based models (EBMs).
A novel and powerful generative model, GFlowNet, is introduced as an efficient, diverse EBM-based policy sampler.
Experiments on ATSC and Battle benchmarks demonstrate that DPO can efficiently discover surprisingly diverse policies.
arXiv Detail & Related papers (2023-02-23T10:48:09Z)
- Exploration via Planning for Information about the Optimal Trajectory [67.33886176127578]
We develop a method that allows us to plan for exploration while taking the task and the current knowledge into account.
We demonstrate that our method learns strong policies with 2x fewer samples than strong exploration baselines.
arXiv Detail & Related papers (2022-10-06T20:28:55Z)
- Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
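The two-policy idea from the JSRL summary can be sketched as a single rollout: a guide policy acts for the first h steps, then hands over to the learner's exploration policy. The names and toy dynamics below are hypothetical; JSRL additionally shrinks h over a curriculum as the learner improves, which this sketch does not show.

```python
def jump_start_rollout(env_step, s0, guide, explore, horizon, h):
    """JSRL-style rollout sketch: the guide policy acts for the first
    h steps, the exploration (learner) policy for the rest."""
    s, traj = s0, []
    for t in range(horizon):
        a = guide(s) if t < h else explore(s)
        s_next = env_step(s, a)
        traj.append((s, a, s_next))
        s = s_next
    return traj

# Toy check: the guide always outputs +1, the learner always -1,
# on trivial additive dynamics.
traj = jump_start_rollout(
    env_step=lambda s, a: s + a,
    s0=0.0,
    guide=lambda s: 1.0,
    explore=lambda s: -1.0,
    horizon=10,
    h=4,
)
```

The guide can be any of the initializers the summary mentions: a policy distilled from offline data, demonstrations, or a pre-existing controller.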
arXiv Detail & Related papers (2022-04-05T17:25:22Z)
- Direct Random Search for Fine Tuning of Deep Reinforcement Learning Policies [5.543220407902113]
We show that a direct random search is very effective at fine-tuning DRL policies by directly optimizing them using deterministic rollouts.
Our results show that this method yields more consistent and higher performing agents on the environments we tested.
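The mechanism in this summary is simple enough to sketch directly: perturb the policy parameters, evaluate the perturbed policy with a deterministic rollout, and keep the perturbation only if the return improves. The quadratic `ret_fn` below is a stand-in for a real rollout, and the hyperparameters are illustrative, not the paper's.

```python
import random

def random_search_finetune(theta, rollout_return, n_iters=200, sigma=0.5, seed=0):
    """Direct random search sketch: Gaussian-perturb the parameters,
    evaluate one deterministic rollout, keep the candidate iff better."""
    rng = random.Random(seed)
    best, best_ret = list(theta), rollout_return(theta)
    for _ in range(n_iters):
        cand = [p + rng.gauss(0.0, sigma) for p in best]
        ret = rollout_return(cand)
        if ret > best_ret:
            best, best_ret = cand, ret
    return best, best_ret

# Stand-in for a deterministic rollout: return peaks at theta = 3.
ret_fn = lambda th: -(th[0] - 3.0) ** 2
best, best_ret = random_search_finetune([0.0], ret_fn)
```

Because rollouts are deterministic, each comparison is noise-free, which is what makes this greedy accept/reject rule viable for fine-tuning.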
arXiv Detail & Related papers (2021-09-12T20:12:46Z)
- Learning Off-Policy with Online Planning [18.63424441772675]
We investigate a novel instantiation of H-step lookahead with a learned model and a terminal value function.
We show the flexibility of LOOP to incorporate safety constraints during deployment with a set of navigation environments.
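H-step lookahead with a learned model and a terminal value function can be sketched as model-predictive control: score each candidate action sequence by its H predicted rewards plus the discounted terminal value of the state the model ends in, then execute the first action of the best sequence. All functions below are toy stand-ins, not LOOP's learned components.

```python
def h_step_lookahead(model, reward, terminal_value, s0, action_seqs, gamma=0.99):
    """Score H-step action sequences under a learned dynamics model:
    sum of discounted predicted rewards plus the discounted terminal
    value, then execute the first action of the best sequence."""
    def score(seq):
        s, total, disc = s0, 0.0, 1.0
        for a in seq:
            total += disc * reward(s, a)
            s = model(s, a)
            disc *= gamma
        return total + disc * terminal_value(s)
    best = max(action_seqs, key=score)
    return best[0]  # MPC-style: replan at the next state

# Toy: state is a position, the model is s + a, cost is distance from 0.
a0 = h_step_lookahead(
    model=lambda s, a: s + a,
    reward=lambda s, a: -abs(s),
    terminal_value=lambda s: -abs(s),
    s0=1.0,
    action_seqs=[(-1.0, 0.0), (0.0, 0.0), (1.0, 0.0)],
)
```

The terminal value function is what keeps the horizon short: beyond H steps, the learned value estimate stands in for the rest of the return.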
arXiv Detail & Related papers (2020-08-23T16:18:44Z)
- MOPO: Model-based Offline Policy Optimization [183.6449600580806]
Offline reinforcement learning (RL) refers to the problem of learning policies entirely from a large batch of previously collected data.
We show that an existing model-based RL algorithm already produces significant gains in the offline setting.
We propose to modify the existing model-based RL methods by applying them with rewards artificially penalized by the uncertainty of the dynamics.
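The reward modification in this summary can be sketched as follows. The uncertainty measure below (spread of an ensemble of dynamics models) is a simplification chosen for illustration; the helper names and toy models are hypothetical, not MOPO's actual estimator.

```python
def ensemble_uncertainty(models):
    """Estimate dynamics uncertainty at (s, a) as the disagreement
    (spread) of an ensemble of learned dynamics models."""
    def u(s, a):
        preds = [m(s, a) for m in models]
        return max(preds) - min(preds)
    return u

def penalize(reward, uncertainty, lam=1.0):
    """MOPO-style shaping: subtract an uncertainty penalty so the
    offline policy avoids regions where the model is unreliable."""
    return lambda s, a: reward(s, a) - lam * uncertainty(s, a)

# Two toy models that agree at a = 0 and diverge for larger actions,
# mimicking a model trusted only near the data distribution.
models = [lambda s, a: s + a, lambda s, a: s + 1.2 * a]
r_pen = penalize(lambda s, a: 1.0, ensemble_uncertainty(models), lam=2.0)
```

Training any existing model-based RL algorithm on `r_pen` instead of the raw reward is the whole modification: the policy is pushed back toward state-action regions the batch actually covers.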
arXiv Detail & Related papers (2020-05-27T08:46:41Z)
- Population-Guided Parallel Policy Search for Reinforcement Learning [17.360163137926]
A new population-guided parallel learning scheme is proposed to enhance the performance of off-policy reinforcement learning (RL).
In the proposed scheme, multiple identical learners with their own value-functions and policies share a common experience replay buffer, and search a good policy in collaboration with the guidance of the best policy information.
arXiv Detail & Related papers (2020-01-09T10:13:57Z)
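The scheme in the last entry has three moving parts: identical learners, one shared replay buffer, and guidance from the current best policy. The toy `Learner` below (a single scalar parameter, a made-up return function, a soft pull toward the best) is purely illustrative and not the paper's actual learner or update rule.

```python
import random

def score(theta):
    return -abs(theta - 2.0)          # stand-in for an episode return

class Learner:
    """Toy 'learner': one scalar policy parameter, improved via a soft
    pull toward the population's current best parameter plus noise."""
    def __init__(self, rng):
        self.rng = rng
        self.theta = rng.uniform(-5.0, 5.0)
    def collect(self):
        # One (parameter, return) interaction for the shared buffer.
        return [(self.theta, score(self.theta))]
    def guide_toward(self, best, beta=0.5):
        self.theta += beta * (best.theta - self.theta)  # best-policy guidance
        self.theta += self.rng.gauss(0.0, 0.1)          # keep exploring

def population_search(n_learners=4, n_rounds=5, seed=0):
    shared_buffer = []                # common experience replay buffer
    rng = random.Random(seed)
    learners = [Learner(rng) for _ in range(n_learners)]
    for _ in range(n_rounds):
        for p in learners:
            shared_buffer.extend(p.collect())         # shared experience
        best = max(learners, key=lambda p: score(p.theta))
        for p in learners:
            p.guide_toward(best)                      # collaborative guidance
    return max(learners, key=lambda p: score(p.theta)), shared_buffer

best, buffer = population_search()
```

The shared buffer is what distinguishes this from plain population-based search: every learner trains on the union of all learners' experience, not just its own.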
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences arising from its use.