Learning from Demonstration without Demonstrations
- URL: http://arxiv.org/abs/2106.09203v1
- Date: Thu, 17 Jun 2021 01:57:08 GMT
- Title: Learning from Demonstration without Demonstrations
- Authors: Tom Blau, Gilad Francis, Philippe Morere
- Abstract summary: We propose Probabilistic Planning for Demonstration Discovery (P2D2), a technique for automatically discovering demonstrations without access to an expert.
We formulate discovering demonstrations as a search problem and leverage widely-used planning algorithms such as Rapidly-exploring Random Tree to find demonstration trajectories.
We experimentally demonstrate that the method outperforms classic and intrinsic exploration RL techniques in a range of classic control and robotics tasks.
- Score: 5.027571997864707
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: State-of-the-art reinforcement learning (RL) algorithms suffer from high
sample complexity, particularly in the sparse reward case. A popular strategy
for mitigating this problem is to learn control policies by imitating a set of
expert demonstrations. The drawback of such approaches is that an expert needs
to produce demonstrations, which may be costly in practice. To address this
shortcoming, we propose Probabilistic Planning for Demonstration Discovery
(P2D2), a technique for automatically discovering demonstrations without access
to an expert. We formulate discovering demonstrations as a search problem and
leverage widely-used planning algorithms such as Rapidly-exploring Random Tree
to find demonstration trajectories. These demonstrations are used to initialize
a policy, which is then refined by a generic RL algorithm. We provide theoretical
guarantees that P2D2 finds successful trajectories, as well as bounds on its
sampling complexity. We experimentally demonstrate that the method outperforms
classic and intrinsic exploration RL techniques in a range of classic control
and robotics tasks, requiring only a fraction of exploration samples and
achieving better asymptotic performance.
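As a rough illustration of the pipeline described in the abstract, the sketch below plans with a goal-biased RRT in a toy 2D point-mass environment and then fits a linear policy to the recovered trajectory by behaviour cloning. The environment, the goal bias, and all names (point_mass_step, rrt_search, behaviour_clone) are illustrative assumptions rather than the paper's actual tasks or implementation; the final refinement by a generic RL algorithm is only noted in a comment.

```python
# Minimal sketch of a plan-then-imitate pipeline, NOT the paper's implementation.
import numpy as np

GOAL = np.array([0.9, 0.9])
GOAL_RADIUS = 0.05
MAX_STEP = 0.05  # largest displacement the toy dynamics allow per action


def point_mass_step(state, action):
    """Toy dynamics: clip the displacement and keep the point mass inside [0, 1]^2."""
    action = np.clip(action, -MAX_STEP, MAX_STEP)
    return np.clip(state + action, 0.0, 1.0)


def rrt_search(start, max_nodes=2000, goal_bias=0.1, rng=None):
    """Grow a goal-biased Rapidly-exploring Random Tree until a node reaches the goal.

    Returns the (state, action) pairs along the path from start to goal, or None.
    """
    rng = rng or np.random.default_rng(0)
    nodes, parents, actions = [start], [-1], [None]
    for _ in range(max_nodes):
        target = GOAL if rng.random() < goal_bias else rng.uniform(0.0, 1.0, size=2)
        dists = np.linalg.norm(np.array(nodes) - target, axis=1)
        nearest = int(np.argmin(dists))          # extend the node closest to the sample
        direction = target - nodes[nearest]
        action = direction / (np.linalg.norm(direction) + 1e-8) * MAX_STEP
        new_state = point_mass_step(nodes[nearest], action)
        nodes.append(new_state)
        parents.append(nearest)
        actions.append(action)
        if np.linalg.norm(new_state - GOAL) < GOAL_RADIUS:
            demo, i = [], len(nodes) - 1         # backtrack to recover the demonstration
            while parents[i] != -1:
                demo.append((nodes[parents[i]], actions[i]))
                i = parents[i]
            return demo[::-1]
    return None


def behaviour_clone(demo):
    """Fit a linear policy a = W^T [s; 1] to the demonstration pairs by least squares."""
    states = np.array([s for s, _ in demo])
    acts = np.array([a for _, a in demo])
    feats = np.hstack([states, np.ones((len(states), 1))])
    W, *_ = np.linalg.lstsq(feats, acts, rcond=None)
    return lambda s: np.clip(np.hstack([s, 1.0]) @ W, -MAX_STEP, MAX_STEP)


if __name__ == "__main__":
    demo = rrt_search(start=np.array([0.1, 0.1]))
    assert demo is not None, "planner did not reach the goal within the node budget"
    policy = behaviour_clone(demo)
    # The initialised policy would then be handed to a generic RL algorithm for
    # refinement; here we only roll it out to check it moves toward the goal.
    s = np.array([0.1, 0.1])
    for _ in range(100):
        s = point_mass_step(s, policy(s))
    print("final distance to goal:", np.linalg.norm(s - GOAL))
```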
Related papers
- Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta-algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks (a minimal sketch of this two-policy rollout appears after the list below).
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
arXiv Detail & Related papers (2022-04-05T17:25:22Z) - Self-Imitation Learning from Demonstrations [4.907551775445731]
Self-Imitation Learning from Demonstrations (SILfD) exploits the agent's past good experience to learn from suboptimal demonstrations.
We show that SILfD can learn from demonstrations that are noisy or far from optimal.
We also find SILfD superior to existing state-of-the-art LfD algorithms in sparse environments.
arXiv Detail & Related papers (2022-03-21T11:56:56Z) - Learning Dexterous Manipulation from Suboptimal Experts [69.8017067648129]
Relative Entropy Q-Learning (REQ) is a simple policy iteration algorithm that combines ideas from successful offline and conventional RL algorithms.
We show how REQ is also effective for general off-policy RL, offline RL, and RL from demonstrations.
arXiv Detail & Related papers (2020-10-16T18:48:49Z) - Reannealing of Decaying Exploration Based On Heuristic Measure in Deep
Q-Network [82.20059754270302]
We propose an algorithm based on the idea of reannealing, which aims to encourage exploration only when it is needed.
We perform an illustrative case study showing that it has the potential to both accelerate training and obtain a better policy.
arXiv Detail & Related papers (2020-09-29T20:40:00Z) - Forgetful Experience Replay in Hierarchical Reinforcement Learning from
Demonstrations [55.41644538483948]
In this paper, we propose a combination of approaches that allow the agent to use low-quality demonstrations in complex vision-based environments.
Our proposed goal-oriented structuring of the replay buffer allows the agent to automatically highlight sub-goals for solving complex hierarchical tasks in demonstrations.
A solution based on our algorithm beats all the solutions from the well-known MineRL competition and allows the agent to mine a diamond in the Minecraft environment.
arXiv Detail & Related papers (2020-06-17T15:38:40Z) - Efficient Model-Based Reinforcement Learning through Optimistic Policy
Search and Planning [93.1435980666675]
We show how optimistic exploration can be easily combined with state-of-the-art reinforcement learning algorithms.
Our experiments demonstrate that optimistic exploration significantly speeds up learning when there are penalties on actions.
arXiv Detail & Related papers (2020-06-15T18:37:38Z) - Provably Efficient Exploration for Reinforcement Learning Using
Unsupervised Learning [96.78504087416654]
Motivated by the prevailing paradigm of using unsupervised learning for efficient exploration in reinforcement learning (RL) problems, we investigate when this paradigm is provably efficient.
We present a general algorithmic framework that is built upon two components: an unsupervised learning algorithm and a no-regret tabular RL algorithm.
arXiv Detail & Related papers (2020-03-15T19:23:59Z) - Reinforcement Learning with Probabilistically Complete Exploration [27.785017885906313]
We propose Rapidly Randomly-exploring Reinforcement Learning (R3L).
We formulate exploration as a search problem and leverage widely-used planning algorithms to find initial solutions.
We experimentally demonstrate that the method requires only a fraction of exploration samples while achieving better performance.
arXiv Detail & Related papers (2020-01-20T02:11:24Z)
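The Jump-Start Reinforcement Learning entry above describes an algorithm that employs two policies; the common description of the method is that a pre-existing guide policy controls the first steps of each episode and a learning exploration policy takes over, with the guide's roll-in horizon shrinking as training progresses. The toy sketch below illustrates only that rollout scheme under loud assumptions: a made-up 1-D chain environment, a hand-coded guide, and an exploration policy that stays random instead of being updated by RL. The names and the curriculum schedule are illustrative, not from the JSRL paper.

```python
# Toy illustration of a guide/exploration roll-in scheme; not the JSRL implementation.
import random


class ChainEnv:
    """1-D chain: start at state 0; reward 1 only for reaching state 10 within 20 steps."""
    def __init__(self):
        self.n, self.horizon = 10, 20

    def reset(self):
        self.s, self.t = 0, 0
        return self.s

    def step(self, action):                       # action is -1 or +1
        self.s = max(0, min(self.n, self.s + action))
        self.t += 1
        done = self.s == self.n or self.t >= self.horizon
        return self.s, float(self.s == self.n), done


def guide_policy(state):                          # stands in for the pre-existing policy
    return +1                                     # always walks toward the goal


def exploration_policy(state):                    # the policy RL would actually train
    return random.choice([-1, +1])                # kept random here for illustration


def jump_start_episode(env, rollin_steps):
    """The guide controls the first `rollin_steps` steps, then the learner takes over."""
    state, done, total, t = env.reset(), False, 0.0, 0
    while not done:
        actor = guide_policy if t < rollin_steps else exploration_policy
        state, reward, done = env.step(actor(state))
        total += reward
        t += 1
    return total


if __name__ == "__main__":
    random.seed(0)
    env = ChainEnv()
    # Curriculum: the guide's roll-in horizon shrinks as the learner (would) improve.
    for rollin in (10, 7, 4, 0):
        successes = [jump_start_episode(env, rollin) for _ in range(500)]
        print(f"roll-in {rollin:2d} steps -> success rate {sum(successes) / len(successes):.2f}")
```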
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.