Improving Learning from Demonstrations by Learning from Experience
- URL: http://arxiv.org/abs/2111.08156v1
- Date: Tue, 16 Nov 2021 00:40:31 GMT
- Title: Improving Learning from Demonstrations by Learning from Experience
- Authors: Haofeng Liu, Yiwen Chen, Jiayi Tan, Marcelo H Ang Jr
- Abstract summary: We propose a new algorithm named TD3fG that can smoothly transition from learning from experts to learning from experience.
Our algorithm achieves good performance in the MuJoCo environment with limited and sub-optimal demonstrations.
- Score: 4.605233477425785
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: How to make imitation learning more general when demonstrations are
relatively limited has been a persistent problem in reinforcement learning
(RL). Poor demonstrations lead to a narrow and biased data distribution,
non-Markovian human expert demonstrations make it difficult for the agent to
learn, and over-reliance on sub-optimal trajectories can make it hard for the
agent to improve its performance. To solve these problems, we propose a new
algorithm named TD3fG that can smoothly transition from learning from experts
to learning from experience. Our algorithm achieves good performance in the
MuJoCo environment with limited and sub-optimal demonstrations. We use behavior
cloning to train a network as a reference action generator and use it both in
the loss function and to shape the exploration noise. This innovation helps
agents extract a priori knowledge from demonstrations while reducing the
detrimental effects of the demonstrations' poor Markovian properties. It
performs better than the BC + fine-tuning and DDPGfD approaches, especially
when the demonstrations are relatively limited. We call our method TD3fG,
meaning TD3 from a generator.
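The mechanism described above is concrete enough to sketch. Below is a minimal PyTorch-style illustration of how a behavior-cloned generator could enter both the actor loss and the exploration noise; the function names and the linear decay schedule are assumptions for illustration, not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def actor_loss(actor, critic, generator, states, step, total_steps):
    """TD3-style actor update with a BC 'generator' regularizer (sketch).

    `generator` is a policy behavior-cloned from the demonstrations. Its
    influence decays over training, giving the smooth transition from
    learning-from-demonstrations to learning-from-experience described above.
    """
    actions = actor(states)
    q_loss = -critic(states, actions).mean()       # deterministic policy gradient term
    with torch.no_grad():
        ref_actions = generator(states)            # reference actions from the BC generator
    bc_loss = F.mse_loss(actions, ref_actions)     # stay near the generator early on
    w = max(0.0, 1.0 - step / total_steps)         # assumed linear decay schedule
    return q_loss + w * bc_loss

def explore(actor, generator, state, step, total_steps, sigma=0.1):
    """Exploration mixing generator guidance with the usual Gaussian noise (sketch)."""
    w = max(0.0, 1.0 - step / total_steps)
    a = (1 - w) * actor(state) + w * generator(state)  # bias exploration toward the reference
    return a + sigma * torch.randn_like(a)
```

Annealing `w` to zero is what would let the agent outgrow sub-optimal demonstrations instead of staying anchored to them.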
Related papers
- "Give Me an Example Like This": Episodic Active Reinforcement Learning from Demonstrations [3.637365301757111]
Methods like Reinforcement Learning from Expert Demonstrations (RLED) introduce external expert demonstrations to facilitate agent exploration during the learning process.
How to select the set of human demonstrations most beneficial for learning becomes a major concern.
This paper presents EARLY, an algorithm that enables a learning agent to generate optimized queries of expert demonstrations in a trajectory-based feature space.
arXiv Detail & Related papers (2024-06-05T08:52:21Z)
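As a hypothetical illustration of trajectory-level query selection for the EARLY entry above, the sketch below picks the most novel candidate demonstration in a feature space; EARLY's actual acquisition criterion is more elaborate, so the novelty heuristic and all names here are assumptions.

```python
import numpy as np

def select_demo_query(candidate_features, seen_features):
    """Pick the candidate demonstration whose trajectory-level features are
    most novel relative to demonstrations already queried (assumed criterion).

    candidate_features: (N, d) array, one feature vector per candidate trajectory
    seen_features:      (M, d) array for already-queried trajectories
    """
    if len(seen_features) == 0:
        return 0
    # Distance from each candidate to its nearest already-seen trajectory.
    d = np.linalg.norm(
        candidate_features[:, None, :] - seen_features[None, :, :], axis=-1
    ).min(axis=1)
    return int(np.argmax(d))  # query the most novel trajectory
```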
- Skill Disentanglement for Imitation Learning from Suboptimal Demonstrations [60.241144377865716]
We consider the imitation of sub-optimal demonstrations, with both a small clean demonstration set and a large noisy set.
We propose a method that evaluates and imitates at the sub-demonstration level, encoding action primitives of varying quality into different skills.
arXiv Detail & Related papers (2023-06-13T17:24:37Z)
- Learning Complicated Manipulation Skills via Deterministic Policy with Limited Demonstrations [9.640594614636049]
Deep reinforcement learning can efficiently develop policies for manipulators.
It takes time to collect sufficient high-quality demonstrations in practice.
Human demonstrations may be unsuitable for robots.
arXiv Detail & Related papers (2023-03-29T05:56:44Z)
- Leveraging Demonstrations to Improve Online Learning: Quality Matters [54.98983862640944]
We show that the degree of improvement must depend on the quality of the demonstration data.
We propose an informed TS algorithm that utilizes the demonstration data in a coherent way through Bayes' rule.
arXiv Detail & Related papers (2023-02-07T08:49:12Z)
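The informed Thompson-sampling idea above lends itself to a compact illustration. The toy Bernoulli-bandit sketch below conditions Beta posteriors on demonstration data via Bayes' rule before sampling as usual; the paper's setting is more general, so this is an illustrative reduction rather than its algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def informed_thompson_sampling(demo_data, pull, n_arms, horizon):
    """Thompson sampling whose Beta posteriors are first conditioned on
    demonstration data via Bayes' rule (toy Bernoulli-bandit reduction).

    demo_data: iterable of (arm, reward) pairs observed in the demonstrations
    pull:      callable arm -> reward in {0, 1}
    """
    alpha, beta = np.ones(n_arms), np.ones(n_arms)  # Beta(1, 1) uniform priors
    for arm, r in demo_data:                        # fold demonstrations into the prior
        alpha[arm] += r
        beta[arm] += 1 - r
    rewards = []
    for _ in range(horizon):
        theta = rng.beta(alpha, beta)   # sample one model per arm from the posterior
        arm = int(np.argmax(theta))     # act greedily w.r.t. the sample
        r = pull(arm)
        alpha[arm] += r                 # standard Bayesian posterior update
        beta[arm] += 1 - r
        rewards.append(r)
    return rewards
```

Seeding the prior this way makes high-quality demonstrations accelerate learning, while uninformative ones simply leave the posterior close to uniform.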
- Robustness of Demonstration-based Learning Under Limited Data Scenario [54.912936555876826]
Demonstration-based learning has shown great potential in stimulating pretrained language models' ability under limited data scenarios.
Why such demonstrations are beneficial for the learning process remains unclear since there is no explicit alignment between the demonstrations and the predictions.
In this paper, we design pathological demonstrations by gradually removing intuitively useful information from the standard ones to take a deep dive into the robustness of demonstration-based sequence labeling.
arXiv Detail & Related papers (2022-10-19T16:15:04Z)
- SSMTL++: Revisiting Self-Supervised Multi-Task Learning for Video Anomaly Detection [108.57862846523858]
We revisit the self-supervised multi-task learning framework, proposing several updates to the original method.
We modernize the 3D convolutional backbone by introducing multi-head self-attention modules.
In our attempt to further improve the model, we study additional self-supervised learning tasks, such as predicting segmentation maps.
arXiv Detail & Related papers (2022-07-16T19:25:41Z)
- Self-Imitation Learning from Demonstrations [4.907551775445731]
Self-Imitation Learning exploits the agent's past good experience to learn from suboptimal demonstrations.
We show that SILfD can learn from demonstrations that are noisy or far from optimal.
We also find SILfD superior to the existing state-of-the-art LfD algorithms in sparse environments.
arXiv Detail & Related papers (2022-03-21T11:56:56Z)
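One way to read the SILfD summary above is as an advantage-filtered imitation loss: transitions, whether from demonstrations or the agent's own past, are imitated only when their observed return beats the current value estimate. The sketch below captures that filter; `policy.log_prob` is an assumed interface, and the exact SILfD objective may differ.

```python
import torch

def self_imitation_loss(policy, value_fn, states, actions, returns):
    """Advantage-filtered imitation (sketch): imitate a stored transition only
    when its observed return beats the current value estimate, so noisy or
    sub-optimal segments contribute nothing to the loss.
    """
    with torch.no_grad():
        advantage = (returns - value_fn(states)).clamp(min=0.0)  # keep only "good" data
    log_prob = policy.log_prob(states, actions)  # assumed (hypothetical) policy interface
    return -(advantage * log_prob).mean()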
- Forgetful Experience Replay in Hierarchical Reinforcement Learning from Demonstrations [55.41644538483948]
In this paper, we propose a combination of approaches that allow the agent to use low-quality demonstrations in complex vision-based environments.
Our proposed goal-oriented structuring of replay buffer allows the agent to automatically highlight sub-goals for solving complex hierarchical tasks in demonstrations.
The solution based on our algorithm beats all solutions from the well-known MineRL competition and allows the agent to mine a diamond in the Minecraft environment.
arXiv Detail & Related papers (2020-06-17T15:38:40Z)
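As a rough illustration of the "forgetful" replay idea above, the sketch below mixes demonstrations into each batch and anneals their share over training. The goal-oriented sub-goal labeling the summary highlights is not reproduced; the class and schedule are assumptions.

```python
import random
from collections import deque

class ForgetfulDemoBuffer:
    """Replay buffer mixing demonstrations with agent experience, with the
    demonstration share annealed ("forgotten") as training progresses (sketch).
    """
    def __init__(self, capacity, demo_transitions):
        self.agent = deque(maxlen=capacity)
        self.demo = list(demo_transitions)

    def add(self, transition):
        self.agent.append(transition)

    def sample(self, batch_size, progress):
        """progress in [0, 1]: fraction of training completed."""
        demo_share = max(0.0, 0.5 * (1.0 - progress))   # assumed forgetting schedule
        n_demo = min(int(batch_size * demo_share), len(self.demo))
        n_agent = min(batch_size - n_demo, len(self.agent))
        return random.sample(self.demo, n_demo) + random.sample(list(self.agent), n_agent)
```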
- Reinforcement Learning with Supervision from Noisy Demonstrations [38.00968774243178]
We propose a novel framework to adaptively learn the policy by jointly interacting with the environment and exploiting the expert demonstrations.
Experimental results in various environments with multiple popular reinforcement learning algorithms show that the proposed approach can learn robustly with noisy demonstrations.
arXiv Detail & Related papers (2020-06-14T06:03:06Z)
- State-Only Imitation Learning for Dexterous Manipulation [63.03621861920732]
In this paper, we explore state-only imitation learning.
We train an inverse dynamics model and use it to predict actions for state-only demonstrations.
Our method performs on par with state-action approaches and considerably outperforms RL alone.
arXiv Detail & Related papers (2020-04-07T17:57:20Z)
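The state-only recipe above is concrete enough to sketch: learn an inverse dynamics model on the agent's own transitions, then use it to fill in actions for state-only demonstrations. The network shape and names below are illustrative, not the paper's architecture.

```python
import torch
import torch.nn as nn

def make_inverse_dynamics_model(state_dim, action_dim, hidden=256):
    """Small MLP predicting a_t from (s_t, s_{t+1}); trained on the agent's
    own (s, a, s') transitions, where action labels are available."""
    return nn.Sequential(
        nn.Linear(2 * state_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, action_dim),
    )

def label_state_only_demos(inv_model, demo_states):
    """Recover actions for a state-only demonstration, a (T, state_dim) tensor,
    by applying the learned inverse dynamics model to consecutive state pairs."""
    s_t, s_next = demo_states[:-1], demo_states[1:]
    with torch.no_grad():
        return inv_model(torch.cat([s_t, s_next], dim=-1))
```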
- Efficiently Guiding Imitation Learning Agents with Human Gaze [28.7222865388462]
We use gaze cues from human demonstrators to enhance the performance of agents trained via three popular imitation learning methods.
Based on similarities between the attention of reinforcement learning agents and human gaze, we propose a novel approach for utilizing gaze data in a computationally efficient manner.
Our proposed approach improves the performance by 95% for BC, 343% for BCO, and 390% for T-REX, averaged over 20 different Atari games.
arXiv Detail & Related papers (2020-02-28T00:55:30Z)
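Finally, one plausible way to fold gaze into imitation learning, an assumption here rather than the gaze paper's exact loss, is an auxiliary term pulling the network's spatial attention toward the human gaze heatmap alongside the usual behavior-cloning objective:

```python
import torch
import torch.nn.functional as F

def gaze_weighted_bc_loss(policy_logits, expert_actions, attention_map, gaze_map,
                          aux_weight=0.1):
    """Behavior cloning plus an auxiliary attention-to-gaze term (sketch).

    attention_map and gaze_map are (batch, H, W) heatmaps; the 0.1 auxiliary
    weight is illustrative, not taken from the paper.
    """
    bc = F.cross_entropy(policy_logits, expert_actions)
    att = attention_map.flatten(1).log_softmax(dim=-1)   # predicted attention, log-probs
    gaze = gaze_map.flatten(1).softmax(dim=-1)           # human gaze, probs
    aux = F.kl_div(att, gaze, reduction="batchmean")     # KL(gaze || attention)
    return bc + aux_weight * aux
```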