Self-Imitation Learning from Demonstrations
- URL: http://arxiv.org/abs/2203.10905v1
- Date: Mon, 21 Mar 2022 11:56:56 GMT
- Title: Self-Imitation Learning from Demonstrations
- Authors: Georgiy Pshikhachev, Dmitry Ivanov, Vladimir Egorov, Aleksei Shpilman
- Abstract summary: SILfD extends Self-Imitation Learning (SIL), which exploits the agent's past good experience, to learn from suboptimal demonstrations.
We show that SILfD can learn from demonstrations that are noisy or far from optimal.
We also find SILfD superior to existing state-of-the-art LfD algorithms in sparse-reward environments.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the numerous breakthroughs achieved with Reinforcement Learning (RL),
solving environments with sparse rewards remains a challenging task that
requires sophisticated exploration. Learning from Demonstrations (LfD) remedies
this issue by guiding the agent's exploration towards states experienced by an
expert. Naturally, the benefits of this approach hinge on the quality of
demonstrations, which are rarely optimal in realistic scenarios. Modern LfD
algorithms require meticulous tuning of hyperparameters that control the
influence of demonstrations and, as we show in the paper, struggle with
learning from suboptimal demonstrations. To address these issues, we extend
Self-Imitation Learning (SIL), a recent RL algorithm that exploits the agent's
past good experience, to the LfD setup by initializing its replay buffer with
demonstrations. We denote our algorithm as SIL from Demonstrations (SILfD). We
empirically show that SILfD can learn from demonstrations that are noisy or far
from optimal and can automatically adjust the influence of demonstrations
throughout the training without additional hyperparameters or handcrafted
schedules. We also find SILfD superior to the existing state-of-the-art LfD
algorithms in sparse environments, especially when demonstrations are highly
suboptimal.
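The mechanism the abstract describes is compact enough to sketch. Below is a minimal, illustrative Python sketch (not the authors' code) of a SIL-style prioritized replay buffer that stores (state, action, return) transitions, together with the SILfD step of seeding it with demonstration episodes before training; the names SILReplayBuffer and seed_buffer_with_demonstrations are hypothetical.

```python
import numpy as np


class SILReplayBuffer:
    """Prioritized replay of (state, action, discounted return) transitions,
    in the spirit of Self-Imitation Learning (hypothetical class, not the authors' code)."""

    def __init__(self, capacity=100_000):
        self.capacity = capacity
        self.storage = []     # list of (state, action, return_to_go)
        self.priorities = []  # sampling priorities; the learner refreshes them to (R - V)_+

    def add_episode(self, states, actions, rewards, gamma=0.99):
        """Compute Monte Carlo returns for one episode and store every transition."""
        ret = 0.0
        returns = []
        for r in reversed(rewards):
            ret = r + gamma * ret
            returns.append(ret)
        returns.reverse()
        for s, a, g in zip(states, actions, returns):
            self.storage.append((s, a, g))
            self.priorities.append(1.0)  # optimistic initial priority
        # keep only the most recent `capacity` transitions
        overflow = len(self.storage) - self.capacity
        if overflow > 0:
            self.storage = self.storage[overflow:]
            self.priorities = self.priorities[overflow:]

    def sample(self, batch_size):
        """Sample transitions proportionally to their priorities."""
        p = np.asarray(self.priorities, dtype=np.float64) + 1e-6
        idx = np.random.choice(len(self.storage), size=batch_size, p=p / p.sum())
        return idx, [self.storage[i] for i in idx]


def seed_buffer_with_demonstrations(buffer, demonstrations, gamma=0.99):
    """SILfD's key step as described in the abstract: before any agent interaction,
    fill the SIL buffer with (possibly noisy or suboptimal) demonstration episodes."""
    for demo in demonstrations:  # each demo: {"states": [...], "actions": [...], "rewards": [...]}
        buffer.add_episode(demo["states"], demo["actions"], demo["rewards"], gamma)
```

During training, a SIL-style update imitates a sampled transition only to the extent that its stored return exceeds the critic's current value estimate, so demonstrations the agent has already surpassed naturally stop contributing, which is consistent with the abstract's claim that the influence of demonstrations is adjusted automatically without additional hyperparameters.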
Related papers
- Inverse-RLignment: Inverse Reinforcement Learning from Demonstrations for LLM Alignment [62.05713042908654]
We introduce Alignment from Demonstrations (AfD), a novel approach that leverages high-quality demonstration data to overcome the limitations of preference-based alignment.
We formalize AfD within a sequential decision-making framework, highlighting its unique challenge of missing reward signals.
Practically, we propose a computationally efficient algorithm that extrapolates over a tailored reward model for AfD.
arXiv Detail & Related papers (2024-05-24T15:13:53Z)
- Skill Disentanglement for Imitation Learning from Suboptimal Demonstrations [60.241144377865716]
We consider the imitation of sub-optimal demonstrations, with both a small clean demonstration set and a large noisy set.
We propose a method that evaluates and imitates at the sub-demonstration level, encoding action primitives of varying quality into different skills.
arXiv Detail & Related papers (2023-06-13T17:24:37Z)
- Dr.ICL: Demonstration-Retrieved In-context Learning [29.142262267850704]
In-context learning (ICL), in which a large language model is taught to perform a task with few-shot demonstrations, has emerged as a strong paradigm for using LLMs.
Recent research suggests that retrieving demonstrations semantically similar to the input from a pool of available demonstrations results in better performance.
This work expands the applicability of retrieval-based ICL by demonstrating that retrieving demonstrations with even simple word-overlap similarity measures such as BM25 outperforms random selection.
arXiv Detail & Related papers (2023-05-23T14:55:25Z)
- Learning Complicated Manipulation Skills via Deterministic Policy with Limited Demonstrations [9.640594614636049]
Deep reinforcement learning can efficiently develop policies for manipulators.
However, it takes time to collect sufficient high-quality demonstrations in practice.
Moreover, human demonstrations may be unsuitable for robots.
arXiv Detail & Related papers (2023-03-29T05:56:44Z)
- Robustness of Demonstration-based Learning Under Limited Data Scenario [54.912936555876826]
Demonstration-based learning has shown great potential in stimulating pretrained language models' ability in limited-data scenarios.
Why such demonstrations are beneficial for the learning process remains unclear since there is no explicit alignment between the demonstrations and the predictions.
In this paper, we design pathological demonstrations by gradually removing intuitively useful information from the standard ones to take a deep dive into the robustness of demonstration-based sequence labeling.
arXiv Detail & Related papers (2022-10-19T16:15:04Z)
- Improving Learning from Demonstrations by Learning from Experience [4.605233477425785]
We propose a new algorithm named TD3fG that can smoothly transition from learning from experts to learning from experience.
Our algorithm achieves good performance in the MuJoCo environment with limited and sub-optimal demonstrations.
arXiv Detail & Related papers (2021-11-16T00:40:31Z)
- Learning from Imperfect Demonstrations from Agents with Varying Dynamics [29.94164262533282]
We develop a metric composed of a feasibility score and an optimality score to measure how useful a demonstration is for imitation learning.
Our experiments on four environments in simulation and on a real robot show improved learned policies with higher expected return.
arXiv Detail & Related papers (2021-03-10T07:39:38Z)
- Demonstration-efficient Inverse Reinforcement Learning in Procedurally Generated Environments [137.86426963572214]
Inverse Reinforcement Learning can extrapolate reward functions from expert demonstrations.
We show that our approach, DE-AIRL, is demonstration-efficient and still able to extrapolate reward functions which generalize to the fully procedural domain.
arXiv Detail & Related papers (2020-12-04T11:18:02Z)
- Forgetful Experience Replay in Hierarchical Reinforcement Learning from Demonstrations [55.41644538483948]
In this paper, we propose a combination of approaches that allows the agent to use low-quality demonstrations in complex vision-based environments.
Our proposed goal-oriented structuring of the replay buffer allows the agent to automatically highlight sub-goals for solving complex hierarchical tasks in demonstrations.
The solution based on our algorithm beats all the solutions from the famous MineRL competition and allows the agent to mine a diamond in the Minecraft environment.
arXiv Detail & Related papers (2020-06-17T15:38:40Z)
- Reinforcement Learning with Supervision from Noisy Demonstrations [38.00968774243178]
We propose a novel framework to adaptively learn the policy by jointly interacting with the environment and exploiting the expert demonstrations.
Experimental results in various environments with multiple popular reinforcement learning algorithms show that the proposed approach can learn robustly with noisy demonstrations.
arXiv Detail & Related papers (2020-06-14T06:03:06Z)
- Learning Sparse Rewarded Tasks from Sub-Optimal Demonstrations [78.94386823185724]
Imitation learning performs effectively in sparse-reward tasks by leveraging existing expert demonstrations.
In practice, collecting a sufficient amount of expert demonstrations can be prohibitively expensive.
We propose Self-Adaptive Imitation Learning (SAIL), which can achieve (near) optimal performance given only a limited number of sub-optimal demonstrations.
arXiv Detail & Related papers (2020-04-01T15:57:15Z)