Reinforcement Learning with Supervision from Noisy Demonstrations
- URL: http://arxiv.org/abs/2006.07808v1
- Date: Sun, 14 Jun 2020 06:03:06 GMT
- Title: Reinforcement Learning with Supervision from Noisy Demonstrations
- Authors: Kun-Peng Ning and Sheng-Jun Huang
- Abstract summary: We propose a novel framework to adaptively learn the policy by jointly interacting with the environment and exploiting the expert demonstrations.
Experimental results in various environments with multiple popular reinforcement learning algorithms show that the proposed approach can learn robustly with noisy demonstrations.
- Score: 38.00968774243178
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning has achieved great success in various applications. To
learn an effective policy for the agent, it usually requires a huge amount of
data collected by interacting with the environment, which can be computationally
costly and time-consuming. To overcome this challenge, the framework called
Reinforcement Learning with Expert Demonstrations (RLED) was proposed to
exploit the supervision from expert demonstrations. Although the RLED methods
can reduce the number of learning iterations, they usually assume the
demonstrations are perfect, and thus may be seriously misled by the noisy
demonstrations in real applications. In this paper, we propose a novel
framework to adaptively learn the policy by jointly interacting with the
environment and exploiting the expert demonstrations. Specifically, for each
step of the demonstration trajectory, we form an instance, and define a joint
loss function to simultaneously maximize the expected reward and minimize the
difference between agent behaviors and demonstrations. Most importantly, by
calculating the expected gain of the value function, we assign each instance
with a weight that estimates its potential utility, and thus can emphasize the
more helpful demonstrations while filtering out noisy ones. Experimental results
in various environments with multiple popular reinforcement learning algorithms
show that the proposed approach can learn robustly with noisy demonstrations,
and achieve higher performance in fewer iterations.
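The abstract describes a joint objective that combines the RL reward term with a per-instance weighted imitation term, where each demonstration step is weighted by the expected gain of the value function. The sketch below is illustrative only: the function names, the sigmoid squashing of the gain, and the hyperparameters `lam` and `beta` are assumptions for exposition, not the paper's actual formulation.

```python
import numpy as np

def utility_weight(v_demo, v_agent, beta=1.0):
    """Weight a demonstration step by the expected gain of the value function.

    Steps whose demonstrated action leads to a higher estimated value than the
    agent's own action receive a larger weight; steps with negative gain
    (likely noisy demonstrations) are pushed toward zero.
    beta is an assumed temperature hyperparameter.
    """
    gain = v_demo - v_agent
    return 1.0 / (1.0 + np.exp(-beta * gain))  # sigmoid of the value gain

def joint_loss(rl_loss, bc_losses, weights, lam=0.1):
    """Combine the RL objective with a weighted behavior-cloning term.

    rl_loss   : scalar loss from interacting with the environment
    bc_losses : per-instance losses measuring the difference between agent
                behavior and the demonstrated action at each trajectory step
    weights   : per-instance utility weights, e.g. from utility_weight above
    lam       : assumed trade-off coefficient between the two terms
    """
    bc_losses = np.asarray(bc_losses, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return float(rl_loss + lam * np.sum(weights * bc_losses))
```

In this sketch, a demonstration step judged harmful (negative value gain) contributes almost nothing to the imitation term, which is one plausible way to realize the "emphasize helpful, filter out noisy" behavior the abstract claims.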
Related papers
- "Give Me an Example Like This": Episodic Active Reinforcement Learning from Demonstrations [3.637365301757111]
Methods like Reinforcement Learning from Expert Demonstrations (RLED) introduce external expert demonstrations to facilitate agent exploration during the learning process.
Selecting the set of human demonstrations that is most beneficial for learning thus becomes a major concern.
This paper presents EARLY, an algorithm that enables a learning agent to generate optimized queries of expert demonstrations in a trajectory-based feature space.
arXiv Detail & Related papers (2024-06-05T08:52:21Z) - Inverse-RLignment: Inverse Reinforcement Learning from Demonstrations for LLM Alignment [62.05713042908654]
We introduce Alignment from Demonstrations (AfD), a novel approach leveraging high-quality demonstration data to overcome these challenges.
We formalize AfD within a sequential decision-making framework, highlighting its unique challenge of missing reward signals.
Practically, we propose a computationally efficient algorithm that extrapolates over a tailored reward model for AfD.
arXiv Detail & Related papers (2024-05-24T15:13:53Z) - Skill Disentanglement for Imitation Learning from Suboptimal Demonstrations [60.241144377865716]
We consider the imitation of sub-optimal demonstrations, with both a small clean demonstration set and a large noisy set.
We propose a method that evaluates and imitates at the sub-demonstration level, encoding action primitives of varying quality into different skills.
arXiv Detail & Related papers (2023-06-13T17:24:37Z) - A Survey of Demonstration Learning [0.0]
Demonstration Learning is a paradigm in which an agent learns to perform a task by imitating the behavior of an expert shown in demonstrations.
It is gaining significant traction due to its tremendous potential for learning complex behaviors from demonstrations.
Because it learns without interacting with the environment, demonstration learning could enable the automation of a wide range of real-world applications such as robotics and healthcare.
arXiv Detail & Related papers (2023-03-20T15:22:10Z) - Out-of-Dynamics Imitation Learning from Multimodal Demonstrations [68.46458026983409]
We study out-of-dynamics imitation learning (OOD-IL), which relaxes the assumption to only require that the demonstrator and the imitator share the same state space.
OOD-IL enables imitation learning to utilize demonstrations from a wide range of demonstrators but introduces a new challenge.
We develop a better transferability measurement to tackle this newly-emerged challenge.
arXiv Detail & Related papers (2022-11-13T07:45:06Z) - Sample Efficient Imitation Learning via Reward Function Trained in Advance [2.66512000865131]
Imitation learning (IL) is a framework that learns to imitate expert behavior from demonstrations.
In this article, we make an effort to improve sample efficiency by introducing a novel scheme of inverse reinforcement learning.
arXiv Detail & Related papers (2021-11-23T08:06:09Z) - Visual Adversarial Imitation Learning using Variational Models [60.69745540036375]
Reward function specification remains a major impediment for learning behaviors through deep reinforcement learning.
Visual demonstrations of desired behaviors often present an easier and more natural way to teach agents.
We develop a variational model-based adversarial imitation learning algorithm.
arXiv Detail & Related papers (2021-07-16T00:15:18Z) - Learning from Imperfect Demonstrations from Agents with Varying Dynamics [29.94164262533282]
We develop a metric composed of a feasibility score and an optimality score to measure how useful a demonstration is for imitation learning.
Our experiments on four environments in simulation and on a real robot show improved learned policies with higher expected return.
arXiv Detail & Related papers (2021-03-10T07:39:38Z) - Robust Maximum Entropy Behavior Cloning [15.713997170792842]
Imitation learning (IL) algorithms use expert demonstrations to learn a specific task.
Most existing approaches assume that all expert demonstrations are reliable and trustworthy, but what if there exist adversarial demonstrations among the given dataset?
We propose a novel general framework that directly generates a policy from demonstrations, autonomously detecting the adversarial demonstrations and excluding them from the dataset.
arXiv Detail & Related papers (2021-01-04T22:08:46Z) - Forgetful Experience Replay in Hierarchical Reinforcement Learning from Demonstrations [55.41644538483948]
In this paper, we propose a combination of approaches that allow the agent to use low-quality demonstrations in complex vision-based environments.
Our proposed goal-oriented structuring of replay buffer allows the agent to automatically highlight sub-goals for solving complex hierarchical tasks in demonstrations.
The solution based on our algorithm beats all the solutions for the famous MineRL competition and allows the agent to mine a diamond in the Minecraft environment.
arXiv Detail & Related papers (2020-06-17T15:38:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.