Policy Gradient from Demonstration and Curiosity
- URL: http://arxiv.org/abs/2004.10430v2
- Date: Tue, 9 Jun 2020 10:57:48 GMT
- Title: Policy Gradient from Demonstration and Curiosity
- Authors: Jie Chen, Wenjun Xu
- Abstract summary: In this work, an integrated policy gradient algorithm was proposed to boost exploration and facilitate intrinsic reward learning.
The presented algorithm was evaluated on a range of simulated tasks with sparse extrinsic reward signals.
It was found that the agent could imitate the expert's behavior while sustaining a high return.
- Score: 9.69620214666782
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With reinforcement learning, an agent could learn complex behaviors from
high-level abstractions of the task. However, exploration and reward shaping
remained challenging for existing methods, especially in scenarios where the
extrinsic feedback was sparse. Expert demonstrations have been investigated to
solve these difficulties, but a tremendous number of high-quality
demonstrations were usually required. In this work, an integrated policy
gradient algorithm was proposed to boost exploration and facilitate intrinsic
reward learning from only a limited number of demonstrations. We achieved this by
reformulating the original reward function with two additional terms, where the
first term measured the Jensen-Shannon divergence between the current policy and
the expert, and the second term estimated the agent's uncertainty about the
environment. The presented algorithm was evaluated on a range of simulated
tasks with sparse extrinsic reward signals where only one single demonstrated
trajectory was provided to each task, superior exploration efficiency and high
average return were demonstrated in all tasks. Furthermore, it was found that
the agent could imitate the expert's behavior and meanwhile sustain high
return.
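As a rough illustration of the reshaped reward described above, the sketch below combines the extrinsic reward with a demonstration term (the negative Jensen-Shannon divergence between the current policy's action distribution and the expert's) and a curiosity term (a forward-model prediction error standing in for the agent's uncertainty about the environment). The weighting coefficients, the discrete action distributions, and the squared-error uncertainty estimate are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def js_divergence(p, q, eps=1e-8):
    """Jensen-Shannon divergence between two discrete action distributions."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log((a + eps) / (b + eps)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def shaped_reward(r_ext, policy_probs, expert_probs, predicted_next_obs, next_obs,
                  lam_demo=0.1, lam_curiosity=0.1):
    """Reshaped reward = extrinsic + demonstration bonus + curiosity bonus.
    The demonstration bonus grows as the policy's action distribution approaches
    the expert's; the curiosity bonus is a forward-model error used here as a
    simple proxy for the agent's uncertainty."""
    r_demo = -js_divergence(policy_probs, expert_probs)
    r_curiosity = float(np.mean((np.asarray(predicted_next_obs) - np.asarray(next_obs)) ** 2))
    return r_ext + lam_demo * r_demo + lam_curiosity * r_curiosity

# e.g. shaped_reward(0.0, [0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.1, 0.0], [0.2, 0.1])
```

In practice the divergence term would typically be estimated from sampled trajectories (for example via a discriminator, as in adversarial imitation), but the closed form above keeps the sketch self-contained.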
Related papers
- RILe: Reinforced Imitation Learning [60.63173816209543]
RILe is a novel trainer-student system that learns a dynamic reward function based on the student's performance and alignment with expert demonstrations.
RILe enables better performance in complex settings where traditional methods falter, outperforming existing methods by 2x in complex simulated robot-locomotion tasks.
arXiv Detail & Related papers (2024-06-12T17:56:31Z)
- Never Explore Repeatedly in Multi-Agent Reinforcement Learning [40.35950679063337]
We propose a dynamic reward scaling approach to combat "revisitation".
We show enhanced performance in demanding environments like Google Research Football and StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2023-08-19T05:27:48Z)
- Skill Disentanglement for Imitation Learning from Suboptimal Demonstrations [60.241144377865716]
We consider the imitation of sub-optimal demonstrations, with both a small clean demonstration set and a large noisy set.
We propose a method that evaluates and imitates at the sub-demonstration level, encoding action primitives of varying quality into different skills.
arXiv Detail & Related papers (2023-06-13T17:24:37Z)
- Self-supervised network distillation: an effective approach to exploration in sparse reward environments [0.0]
Reinforcement learning can train an agent to behave in an environment according to a predesigned reward function.
When that reward signal is sparse, a solution may be to equip the agent with an intrinsic motivation that provides informed exploration.
We present Self-supervised Network Distillation (SND), a class of intrinsic motivation algorithms based on the distillation error as a novelty indicator (a minimal sketch of this idea appears after this list).
arXiv Detail & Related papers (2023-02-22T18:58:09Z)
- Guarantees for Epsilon-Greedy Reinforcement Learning with Function Approximation [69.1524391595912]
Myopic exploration policies such as epsilon-greedy, softmax, or Gaussian noise fail to explore efficiently in some reinforcement learning tasks.
This paper presents a theoretical analysis of such policies and provides the first regret and sample-complexity bounds for reinforcement learning with myopic exploration.
arXiv Detail & Related papers (2022-06-19T14:44:40Z)
- Dealing with Sparse Rewards in Continuous Control Robotics via Heavy-Tailed Policies [64.2210390071609]
We present a novel Heavy-Tailed Policy Gradient (HT-PSG) algorithm to deal with the challenges of sparse rewards in continuous control problems.
We show consistent performance improvement across all tasks in terms of high average cumulative reward.
arXiv Detail & Related papers (2022-06-12T04:09:39Z)
- A State-Distribution Matching Approach to Non-Episodic Reinforcement Learning [61.406020873047794]
A major hurdle to real-world application arises from the development of algorithms in an episodic setting.
We propose a new method, MEDAL, that trains the backward policy to match the state distribution in the provided demonstrations.
Our experiments show that MEDAL matches or outperforms prior methods on three sparse-reward continuous control tasks.
arXiv Detail & Related papers (2022-05-11T00:06:29Z)
- Planning to Explore via Self-Supervised World Models [120.31359262226758]
Plan2Explore is a self-supervised reinforcement learning agent.
We present a new approach to self-supervised exploration and fast adaptation to new tasks.
Without any training supervision or task-specific interaction, Plan2Explore outperforms prior self-supervised exploration methods.
arXiv Detail & Related papers (2020-05-12T17:59:45Z)
- RIDE: Rewarding Impact-Driven Exploration for Procedurally-Generated Environments [15.736899098702972]
We propose a novel type of intrinsic reward which encourages the agent to take actions that lead to significant changes in its learned state representation.
We evaluate our method on multiple challenging procedurally-generated tasks in MiniGrid.
arXiv Detail & Related papers (2020-02-27T18:03:16Z)
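The distillation-error novelty idea behind SND (referenced above) can be sketched roughly as follows. SND itself trains the target network with self-supervised objectives; for brevity the sketch keeps the target fixed, which reduces it to an RND-style bonus. The network sizes, optimizer, and squared-error form are assumptions for illustration.

```python
import torch
import torch.nn as nn

def make_net(obs_dim, feat_dim=64):
    # Small MLP mapping observations to feature vectors.
    return nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, feat_dim))

obs_dim = 8
target = make_net(obs_dim)                    # held fixed (random features)
predictor = make_net(obs_dim)                 # distilled towards the target
for p in target.parameters():
    p.requires_grad_(False)
optimizer = torch.optim.Adam(predictor.parameters(), lr=1e-3)

def intrinsic_reward(obs_batch):
    """Distillation error as a novelty signal: large on rarely visited states,
    shrinking as the predictor catches up on familiar ones."""
    error = (predictor(obs_batch) - target(obs_batch)).pow(2).mean(dim=-1)
    optimizer.zero_grad()
    error.mean().backward()                   # one distillation step on the visited states
    optimizer.step()
    return error.detach()

# e.g. bonus = intrinsic_reward(torch.randn(32, obs_dim))
```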