PBCS : Efficient Exploration and Exploitation Using a Synergy between
Reinforcement Learning and Motion Planning
- URL: http://arxiv.org/abs/2004.11667v1
- Date: Fri, 24 Apr 2020 11:37:09 GMT
- Title: PBCS : Efficient Exploration and Exploitation Using a Synergy between
Reinforcement Learning and Motion Planning
- Authors: Guillaume Matheron, Nicolas Perrin, Olivier Sigaud
- Abstract summary: "Plan, Backplay, Chain Skills" combines motion planning and reinforcement learning to solve hard exploration environments.
We show that this method outperforms state-of-the-art RL algorithms in 2D maze environments of various sizes.
- Score: 8.176152440971897
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The exploration-exploitation trade-off is at the heart of reinforcement
learning (RL). However, most continuous control benchmarks used in recent RL
research only require local exploration. This led to the development of
algorithms that have basic exploration capabilities, and behave poorly in
benchmarks that require more versatile exploration. For instance, as
demonstrated in our empirical study, state-of-the-art RL algorithms such as
DDPG and TD3 are unable to steer a point mass in even small 2D mazes. In this
paper, we propose a new algorithm called "Plan, Backplay, Chain Skills" (PBCS)
that combines motion planning and reinforcement learning to solve hard
exploration environments. In a first phase, a motion planning algorithm is used
to find a single good trajectory, then an RL algorithm is trained using a
curriculum derived from the trajectory, by combining a variant of the Backplay
algorithm and skill chaining. We show that this method outperforms
state-of-the-art RL algorithms in 2D maze environments of various sizes, and is
able to improve on the trajectory obtained by the motion planning phase.
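To make the two-phase structure described in the abstract concrete, here is a minimal Python sketch of the PBCS pipeline. The helpers `plan_trajectory` and `train_rl_from_state`, the `env.goal` attribute, and the simplified reading of the Backplay-plus-skill-chaining curriculum are all hypothetical placeholders for illustration; this is not the authors' implementation.

```python
# Illustrative sketch only: `plan_trajectory`, `train_rl_from_state` and
# `env.goal` are hypothetical placeholders, not the authors' code.

def pbcs(env, plan_trajectory, train_rl_from_state):
    """Two-phase structure described in the abstract.

    Phase 1: a motion planner finds one successful state trajectory.
    Phase 2: a Backplay-style curriculum walks the start state backward
    along that trajectory, and skill chaining trains one policy per
    curriculum step, each required to reach the start of the next skill.
    """
    # Phase 1: any motion planner returning a list of states from start to goal.
    trajectory = plan_trajectory(env)

    skills = []          # policies, collected from latest start to earliest
    target = env.goal    # the last skill must reach the environment goal
    for state in reversed(trajectory[:-1]):
        # Train a policy that reaches `target` when reset to `state`.
        policy = train_rl_from_state(env, start_state=state, target=target)
        skills.append(policy)
        target = state   # the next (earlier) skill only needs to reach here
    skills.reverse()     # execute skills in order from the true start state
    return skills
```

At execution time the chained skills are run in order from the true start state, and, as the abstract notes, the RL phase can improve on the trajectory found by the motion planner.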
Related papers
- Active search and coverage using point-cloud reinforcement learning [50.741409008225766]
This paper presents an end-to-end deep reinforcement learning solution for target search and coverage.
We show that deep hierarchical feature learning works for RL and that farthest point sampling (FPS) can be used to reduce the number of points (a generic FPS sketch follows this entry).
We also show that multi-head attention for point clouds helps the agent learn faster but converges to the same outcome.
arXiv Detail & Related papers (2023-12-18T18:16:30Z)
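The farthest point sampling step mentioned in the entry above is a standard point-cloud subsampling routine; the following NumPy sketch is generic and not taken from the paper.

```python
import numpy as np

def farthest_point_sampling(points: np.ndarray, k: int) -> np.ndarray:
    """Generic FPS: greedily pick k points from an (N, 3) cloud so that each
    new point is as far as possible from the points already selected.
    Returns the indices of the selected points."""
    n = points.shape[0]
    selected = np.zeros(k, dtype=int)
    dist = np.full(n, np.inf)   # distance to the nearest selected point so far
    selected[0] = 0             # start from an arbitrary point
    for i in range(1, k):
        diff = points - points[selected[i - 1]]
        dist = np.minimum(dist, np.einsum("ij,ij->i", diff, diff))
        selected[i] = int(np.argmax(dist))
    return selected

# Example: reduce a 2048-point cloud to 256 points before feeding it to a policy.
cloud = np.random.rand(2048, 3)
reduced_cloud = cloud[farthest_point_sampling(cloud, 256)]
```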
- Action and Trajectory Planning for Urban Autonomous Driving with Hierarchical Reinforcement Learning [1.3397650653650457]
We propose an action and trajectory planner using a Hierarchical Reinforcement Learning (atHRL) method.
We empirically verify the efficacy of atHRL through extensive experiments in complex urban driving scenarios.
arXiv Detail & Related papers (2023-06-28T07:11:02Z)
- CACTO: Continuous Actor-Critic with Trajectory Optimization -- Towards global optimality [5.0915256711576475]
This paper presents a novel algorithm for the continuous control of dynamical systems that combines Trajectory Optimization (TO) and Reinforcement Learning (RL) in a single framework.
arXiv Detail & Related papers (2022-11-12T10:16:35Z)
- Pretraining in Deep Reinforcement Learning: A Survey [17.38360092869849]
Pretraining has been shown to be effective in acquiring transferable knowledge.
Due to the nature of reinforcement learning, pretraining in this field is faced with unique challenges.
arXiv Detail & Related papers (2022-11-08T02:17:54Z)
- Deep Black-Box Reinforcement Learning with Movement Primitives [15.184283143878488]
We present a new algorithm for deep reinforcement learning (RL).
It is based on differentiable trust region layers, a successful on-policy deep RL algorithm.
We compare our episodic RL (ERL) algorithm to state-of-the-art step-based algorithms on many complex simulated robotic control tasks.
arXiv Detail & Related papers (2022-10-18T06:34:52Z)
- Abstract Demonstrations and Adaptive Exploration for Efficient and Stable Multi-step Sparse Reward Reinforcement Learning [44.968170318777105]
This paper proposes a DRL exploration technique, termed A2, which integrates two components inspired by human experiences: Abstract demonstrations and Adaptive exploration.
A2 starts by decomposing a complex task into subtasks, and then provides the correct order in which to learn them.
We demonstrate that A2 can aid popular DRL algorithms to learn more efficiently and stably in these environments.
arXiv Detail & Related papers (2022-07-19T12:56:41Z)
- Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
arXiv Detail & Related papers (2022-04-05T17:25:22Z)
- Scalable Deep Reinforcement Learning Algorithms for Mean Field Games [60.550128966505625]
Mean Field Games (MFGs) have been introduced to efficiently approximate games with very large populations of strategic agents.
Recently, the question of learning equilibria in MFGs has gained momentum, particularly using model-free reinforcement learning (RL) methods.
Existing algorithms to solve MFGs require the mixing of approximated quantities such as strategies or $q$-values.
We propose two methods to address this shortcoming. The first one learns a mixed strategy from distillation of historical data into a neural network and is applied to the Fictitious Play algorithm.
The second one is an online mixing method based on
arXiv Detail & Related papers (2022-03-22T18:10:32Z)
- Accelerating Robotic Reinforcement Learning via Parameterized Action Primitives [92.0321404272942]
Reinforcement learning can be used to build general-purpose robotic systems.
However, training RL agents to solve robotics tasks still remains challenging.
In this work, we manually specify a library of robot action primitives (RAPS), parameterized with arguments that are learned by an RL policy.
We find that our simple change to the action interface substantially improves both learning efficiency and task performance (a minimal sketch of such a parameterized action interface follows this entry).
arXiv Detail & Related papers (2021-10-28T17:59:30Z)
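The parameterized-primitive action interface described in the RAPS entry above can be sketched as follows. The primitive names, argument counts, and action encoding are hypothetical, chosen only to illustrate a policy that selects a primitive plus its continuous arguments; this is not the authors' library.

```python
import numpy as np

# Hypothetical primitive library: each entry is a hand-coded controller that
# consumes a small vector of continuous arguments chosen by the RL policy.
PRIMITIVES = [
    ("reach_to_offset", 3),   # (name, number of continuous arguments)
    ("close_gripper",   1),
    ("lift",            1),
]
MAX_ARGS = max(n for _, n in PRIMITIVES)

def decode_action(action: np.ndarray):
    """Decode one flat policy output into (primitive name, argument vector).
    The first len(PRIMITIVES) entries are scores over primitives; the rest
    are the shared continuous-argument slots."""
    scores, args = action[:len(PRIMITIVES)], action[len(PRIMITIVES):]
    index = int(np.argmax(scores))
    name, n_args = PRIMITIVES[index]
    return name, args[:n_args]

# Example: the policy emits one flat vector per decision step, instead of
# raw low-level commands, and the chosen primitive is executed by its controller.
policy_output = np.random.uniform(-1, 1, size=len(PRIMITIVES) + MAX_ARGS)
primitive, arguments = decode_action(policy_output)
print(primitive, arguments)
```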
- Evolving Reinforcement Learning Algorithms [186.62294652057062]
We propose a method for meta-learning reinforcement learning algorithms.
The learned algorithms are domain-agnostic and can generalize to new environments not seen during training.
We highlight two learned algorithms which obtain good generalization performance over other classical control tasks, gridworld type tasks, and Atari games.
arXiv Detail & Related papers (2021-01-08T18:55:07Z)
- AutoML-Zero: Evolving Machine Learning Algorithms From Scratch [76.83052807776276]
We show that it is possible to automatically discover complete machine learning algorithms just using basic mathematical operations as building blocks.
We demonstrate this by introducing a novel framework that significantly reduces human bias through a generic search space.
We believe these preliminary successes in discovering machine learning algorithms from scratch indicate a promising new direction in the field.
arXiv Detail & Related papers (2020-03-06T19:00:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.