Learning Sparse Rewarded Tasks from Sub-Optimal Demonstrations
- URL: http://arxiv.org/abs/2004.00530v1
- Date: Wed, 1 Apr 2020 15:57:15 GMT
- Title: Learning Sparse Rewarded Tasks from Sub-Optimal Demonstrations
- Authors: Zhuangdi Zhu, Kaixiang Lin, Bo Dai, and Jiayu Zhou
- Abstract summary: Imitation learning learns effectively in sparse-rewarded tasks by leveraging the existing expert demonstrations.
In practice, collecting a sufficient amount of expert demonstrations can be prohibitively expensive.
We propose Self-Adaptive Imitation Learning (SAIL) that can achieve (near) optimal performance given only a limited number of sub-optimal demonstrations.
- Score: 78.94386823185724
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Model-free deep reinforcement learning (RL) has demonstrated its superiority
on many complex sequential decision-making problems. However, heavy dependence
on dense rewards and high sample-complexity impedes the wide adoption of these
methods in real-world scenarios. On the other hand, imitation learning (IL)
learns effectively in sparse-rewarded tasks by leveraging the existing expert
demonstrations. In practice, collecting a sufficient amount of expert
demonstrations can be prohibitively expensive, and the quality of
demonstrations typically limits the performance of the learning policy. In this
work, we propose Self-Adaptive Imitation Learning (SAIL) that can achieve
(near) optimal performance given only a limited number of sub-optimal
demonstrations for highly challenging sparse reward tasks. SAIL bridges the
advantages of IL and RL to reduce the sample complexity substantially, by
effectively exploiting sub-optimal demonstrations and efficiently exploring the
environment to surpass the demonstrated performance. Extensive empirical
results show that SAIL not only significantly improves sample efficiency but
also achieves much better final performance across different continuous control
tasks, compared to the state of the art.
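The "self-adaptive" idea of exploiting sub-optimal demonstrations while surpassing them can be illustrated with a small sketch: a demonstration buffer that starts from the sub-optimal expert trajectories and replaces them whenever the agent's own rollouts achieve a higher return, so the imitation target improves over training. The class name, the top-k replacement rule, and the data layout below are illustrative assumptions, not the paper's exact algorithm.

```python
import heapq


class SelfAdaptiveDemoBuffer:
    """Keep the top-k highest-return trajectories seen so far.

    Initialized from sub-optimal demonstrations; whenever a self-generated
    rollout beats the worst stored trajectory, it replaces it, so the
    imitation target gradually improves.
    """

    def __init__(self, demos, capacity=10):
        # demos: list of (return, trajectory) pairs from the sub-optimal expert
        self.capacity = capacity
        self.heap = list(demos)  # min-heap keyed on episode return
        heapq.heapify(self.heap)

    def maybe_add(self, ret, trajectory):
        # Insert if there is room, or if this rollout beats the worst entry.
        if len(self.heap) < self.capacity:
            heapq.heappush(self.heap, (ret, trajectory))
        elif ret > self.heap[0][0]:
            heapq.heapreplace(self.heap, (ret, trajectory))

    def best_return(self):
        return max(r for r, _ in self.heap)


# Two sub-optimal demos; an agent rollout with return 8.0 displaces the worst.
buf = SelfAdaptiveDemoBuffer([(3.0, "demo_a"), (5.0, "demo_b")], capacity=2)
buf.maybe_add(8.0, "self_rollout")
```

After the update the buffer holds the return-5.0 demo and the return-8.0 self-generated rollout, so subsequent imitation targets a better-than-expert trajectory.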
Related papers
- Efficient Diversity-based Experience Replay for Deep Reinforcement Learning [14.96744975805832]
This paper proposes a novel approach, diversity-based experience replay (DBER), which leverages determinantal point processes to prioritize diverse samples in state realizations.
We conducted extensive experiments on Robotic Manipulation tasks in MuJoCo, Atari games, and realistic in-door environments in Habitat.
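The diversity-based prioritization described above can be approximated with a simple greedy rule: repeatedly pick the state whose minimum distance to the already-selected states is largest. This is a crude stand-in for determinantal point process sampling, and the function name and distance metric below are illustrative assumptions, not the DBER implementation.

```python
def greedy_diverse_subset(states, k, dist):
    """Greedily select k states, each maximizing its minimum distance
    to the states already chosen -- a simple proxy for DPP sampling."""
    chosen = [states[0]]
    while len(chosen) < k:
        best = max(
            (s for s in states if s not in chosen),
            key=lambda s: min(dist(s, c) for c in chosen),
        )
        chosen.append(best)
    return chosen


# Toy 2-D "states": the greedy rule spreads picks across the three clusters.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
euclid = lambda a, b: ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5
subset = greedy_diverse_subset(points, 3, euclid)
```

A true DPP would sample subsets with probability proportional to the determinant of a similarity kernel, trading off quality and diversity jointly; the max-min heuristic captures only the diversity side.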
arXiv Detail & Related papers (2024-10-27T15:51:27Z) - "Give Me an Example Like This": Episodic Active Reinforcement Learning from Demonstrations [3.637365301757111]
Methods like Reinforcement Learning from Expert Demonstrations (RLED) introduce external expert demonstrations to facilitate agent exploration during the learning process.
Selecting the set of human demonstrations most beneficial for learning becomes a major concern.
This paper presents EARLY, an algorithm that enables a learning agent to generate optimized queries of expert demonstrations in a trajectory-based feature space.
arXiv Detail & Related papers (2024-06-05T08:52:21Z) - Enhancing Q-Learning with Large Language Model Heuristics [0.0]
Large language models (LLMs) can achieve zero-shot learning for simpler tasks, but they suffer from low inference speeds and occasional hallucinations.
We propose LLM-guided Q-learning, a framework that leverages LLMs as heuristics to aid in learning the Q-function for reinforcement learning.
arXiv Detail & Related papers (2024-05-06T10:42:28Z) - Skill Disentanglement for Imitation Learning from Suboptimal Demonstrations [60.241144377865716]
We consider the imitation of sub-optimal demonstrations, with both a small clean demonstration set and a large noisy set.
We propose a method that evaluates and imitates at the sub-demonstration level, encoding action primitives of varying quality into different skills.
arXiv Detail & Related papers (2023-06-13T17:24:37Z) - D-Shape: Demonstration-Shaped Reinforcement Learning via Goal Conditioning [48.57484755946714]
This paper introduces D-Shape, a new method for combining imitation learning (IL) and reinforcement learning (RL) that uses ideas from reward shaping and goal-conditioned RL to resolve the conflict between imitating demonstrations and maximizing environment reward.
We experimentally validate D-Shape in sparse-reward gridworld domains, showing that it both improves over RL in terms of sample efficiency and converges consistently to the optimal policy.
arXiv Detail & Related papers (2022-10-26T02:28:32Z) - Planning for Sample Efficient Imitation Learning [52.44953015011569]
Current imitation algorithms struggle to achieve high performance and high in-environment sample efficiency simultaneously.
We propose EfficientImitate, a planning-based imitation learning method that can achieve high in-environment sample efficiency and performance simultaneously.
Experimental results show that EfficientImitate achieves state-of-the-art results in both performance and sample efficiency.
arXiv Detail & Related papers (2022-10-18T05:19:26Z) - GAN-Based Interactive Reinforcement Learning from Demonstration and Human Evaluative Feedback [6.367592686247906]
We propose GAN-Based Interactive Reinforcement Learning (GAIRL) from demonstration and human evaluative feedback.
We tested our proposed method in six physics-based control tasks.
arXiv Detail & Related papers (2021-04-14T02:58:51Z) - Demonstration-efficient Inverse Reinforcement Learning in Procedurally Generated Environments [137.86426963572214]
Inverse Reinforcement Learning can extrapolate reward functions from expert demonstrations.
We show that our approach, DE-AIRL, is demonstration-efficient and still able to extrapolate reward functions which generalize to the fully procedural domain.
arXiv Detail & Related papers (2020-12-04T11:18:02Z) - Reinforcement Learning with Supervision from Noisy Demonstrations [38.00968774243178]
We propose a novel framework to adaptively learn the policy by jointly interacting with the environment and exploiting the expert demonstrations.
Experimental results in various environments with multiple popular reinforcement learning algorithms show that the proposed approach can learn robustly with noisy demonstrations.
arXiv Detail & Related papers (2020-06-14T06:03:06Z) - Soft Hindsight Experience Replay [77.99182201815763]
Soft Hindsight Experience Replay (SHER) is a novel approach based on HER and Maximum Entropy Reinforcement Learning (MERL).
We evaluate SHER on OpenAI robotic manipulation tasks with sparse rewards.
arXiv Detail & Related papers (2020-02-06T03:57:04Z)
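The hindsight relabeling at the core of HER (which SHER builds on) is easy to sketch: a failed episode is relabeled with the final achieved state as the goal, turning a zero-reward trajectory into a successful one under the substituted goal. The function name and transition layout below are illustrative assumptions, not SHER's implementation.

```python
def her_relabel(trajectory, reward_fn):
    """Relabel a trajectory of (state, next_state, action) steps with the
    final achieved state as the goal, recomputing the sparse reward."""
    achieved_goal = trajectory[-1][1]  # next_state of the last step
    relabeled = []
    for state, next_state, action in trajectory:
        reward = reward_fn(next_state, achieved_goal)
        relabeled.append((state, action, achieved_goal, reward))
    return relabeled


# Sparse reward: 1 only when the achieved state equals the goal.
reward_fn = lambda s, g: 1.0 if s == g else 0.0
traj = [("s0", "s1", "a0"), ("s1", "s2", "a1")]
out = her_relabel(traj, reward_fn)
# The last transition now receives reward 1 under the substituted goal "s2".
```

SHER's contribution, per the summary above, is to combine this relabeling with a maximum-entropy RL objective rather than plain DDPG-style learning.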
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information and is not responsible for any consequences.