Learning Complicated Manipulation Skills via Deterministic Policy with Limited Demonstrations
- URL: http://arxiv.org/abs/2303.16469v1
- Date: Wed, 29 Mar 2023 05:56:44 GMT
- Title: Learning Complicated Manipulation Skills via Deterministic Policy with Limited Demonstrations
- Authors: Liu Haofeng, Chen Yiwen, Tan Jiayi, Marcelo H Ang
- Abstract summary: Combined with demonstrations, deep reinforcement learning can efficiently develop policies for manipulators.
However, collecting sufficient high-quality demonstrations takes time in practice,
and demonstrations from humans may be unsuitable for robots.
- Score: 9.640594614636049
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Combined with demonstrations, deep reinforcement learning can efficiently
develop policies for manipulators. In practice, however, it takes time to
collect sufficient high-quality demonstrations, and demonstrations from humans
may be unsuitable for robots. The non-Markovian nature of the demonstrations
and over-reliance on them pose further challenges. For example, we found that
RL agents are sensitive to demonstration quality in manipulation tasks and
struggle to adapt to demonstrations collected directly from humans. It is
therefore challenging to leverage low-quality and insufficient demonstrations
to help reinforcement learning train better policies, and sometimes limited
demonstrations even lead to worse performance.
We propose a new algorithm named TD3fG (TD3 learning from a generator) to
address these problems. It forms a smooth transition from learning from
experts to learning from experience, which helps the agent extract prior
knowledge from the demonstrations while reducing their detrimental effects.
Our algorithm performs well on Adroit manipulator and MuJoCo tasks with
limited demonstrations.
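The abstract does not spell out the transition mechanism, so the following is
only a minimal sketch of the general idea in PyTorch: a behavior-cloning term
on demonstration data whose weight is annealed to zero, leaving a standard TD3
actor update. The linear schedule and all names are illustrative assumptions,
not the authors' implementation.

import torch.nn.functional as F

def actor_loss(actor, critic, batch_state, demo_state, demo_action,
               step, decay_steps=100_000):
    # Standard TD3 deterministic policy-gradient term (learning from experience).
    rl_loss = -critic(batch_state, actor(batch_state)).mean()

    # Behavior-cloning term toward the (possibly imperfect) demonstrations
    # (learning from experts / from a pretrained generator network).
    bc_loss = F.mse_loss(actor(demo_state), demo_action)

    # Linearly anneal the imitation weight so training starts expert-guided
    # and ends as pure RL, limiting damage from low-quality demonstrations.
    bc_weight = max(0.0, 1.0 - step / decay_steps)
    return rl_loss + bc_weight * bc_loss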
Related papers
- RoboCLIP: One Demonstration is Enough to Learn Robot Policies [72.24495908759967]
RoboCLIP is an online imitation learning method that uses a single demonstration, in the form of either a video or a textual description of the task, to generate rewards.
RoboCLIP can also utilize out-of-domain demonstrations, such as videos of humans solving the task, for reward generation, circumventing the need for matching demonstration and deployment domains.
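As a rough sketch of how similarity-based reward generation can work, the
snippet below scores a rollout by its embedding similarity to the single
reference demonstration; encode_video and encode_text are hypothetical
stand-ins for a pretrained video-language model, not RoboCLIP's actual
components.

import torch.nn.functional as F

def episode_reward(rollout_frames, reference, encode_video, encode_text):
    # Embed the agent's behavior and the reference in a shared space.
    z_rollout = encode_video(rollout_frames)
    if isinstance(reference, str):
        z_ref = encode_text(reference)   # task given as a textual description
    else:
        z_ref = encode_video(reference)  # task given as a (possibly human) video
    # Cosine similarity serves as a sparse, end-of-episode reward.
    return F.cosine_similarity(z_rollout, z_ref, dim=0).item()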
arXiv Detail & Related papers (2023-10-11T21:10:21Z)
- Skill Disentanglement for Imitation Learning from Suboptimal Demonstrations [60.241144377865716]
We consider the imitation of sub-optimal demonstrations, with both a small clean demonstration set and a large noisy set.
We propose a method that evaluates and imitates at the sub-demonstration level, encoding action primitives of varying quality into different skills.
arXiv Detail & Related papers (2023-06-13T17:24:37Z)
- A Survey of Demonstration Learning [0.0]
Demonstration Learning is a paradigm in which an agent learns to perform a task by imitating the behavior of an expert shown in demonstrations.
It is gaining significant traction due to its tremendous potential for learning complex behaviors from demonstrations.
Because it learns without interacting with the environment, demonstration learning could enable the automation of a wide range of real-world applications such as robotics and healthcare.
arXiv Detail & Related papers (2023-03-20T15:22:10Z)
- Cross-Domain Transfer via Semantic Skill Imitation [49.83150463391275]
We propose an approach for semantic imitation, which uses demonstrations from a source domain, e.g. human videos, to accelerate reinforcement learning (RL).
Instead of imitating low-level actions like joint velocities, our approach imitates the sequence of demonstrated semantic skills like "opening the microwave" or "turning on the stove".
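A toy sketch of this idea: reward the agent for completing the next skill in
the demonstrated sequence rather than for matching low-level actions. The
skill_detector classifier and the sparse reward below are hypothetical
illustrations, not the paper's method.

# Skill labels assumed to be parsed from a source-domain (e.g. human) demo.
demo_skills = ["open_microwave", "turn_on_stove"]

def semantic_reward(state, progress, skill_detector):
    """One-step reward: +1 when the next demonstrated skill is completed."""
    if progress < len(demo_skills) and skill_detector(state) == demo_skills[progress]:
        return 1.0, progress + 1  # advance to the next skill in the sequence
    return 0.0, progress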
arXiv Detail & Related papers (2022-12-14T18:46:14Z)
- Learning Agile Skills via Adversarial Imitation of Rough Partial Demonstrations [19.257876507104868]
Learning agile skills is one of the main challenges in robotics.
We propose a generative adversarial method for inferring reward functions from partial and potentially physically incompatible demonstrations.
We show that a Wasserstein GAN formulation, taking transitions from rough and partial demonstrations as input, can extract robust policies capable of imitating the demonstrated behaviors.
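A minimal sketch of this kind of adversarial reward inference, assuming a
standard Wasserstein critic with gradient penalty over batches of transition
features; shapes and the penalty weight are illustrative choices, not the
paper's exact setup.

import torch

def critic_loss(critic, demo_x, policy_x, gp_weight=10.0):
    # WGAN objective: demonstration transitions score high, policy ones low.
    loss = critic(policy_x).mean() - critic(demo_x).mean()

    # Gradient penalty on interpolated samples keeps the critic 1-Lipschitz.
    eps = torch.rand(demo_x.size(0), 1, device=demo_x.device)
    mixed = (eps * demo_x + (1 - eps) * policy_x).requires_grad_(True)
    grad = torch.autograd.grad(critic(mixed).sum(), mixed, create_graph=True)[0]
    penalty = ((grad.norm(2, dim=1) - 1) ** 2).mean()
    return loss + gp_weight * penalty

def learned_reward(critic, x):
    # Higher critic score = more demonstration-like; used as the RL reward.
    return critic(x).detach()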
arXiv Detail & Related papers (2022-06-23T13:34:11Z)
- Self-Imitation Learning from Demonstrations [4.907551775445731]
Self-Imitation Learning from Demonstrations (SILfD) exploits the agent's past good experience to learn from suboptimal demonstrations.
We show that SILfD can learn from demonstrations that are noisy or far from optimal.
We also find SILfD superior to the existing state-of-the-art LfD algorithms in sparse environments.
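One common way to realize self-imitation, shown here only as an illustrative
sketch, is to keep a small buffer of the agent's highest-return episodes and
treat them as extra demonstrations, so that good self-collected experience
gradually displaces the noisy external demos. The buffer design is an
assumption, not the paper's exact algorithm.

import heapq
from itertools import count

class GoodEpisodeBuffer:
    def __init__(self, capacity=32):
        self.capacity = capacity
        self._tie = count()  # tiebreaker so episodes are never compared directly
        self.heap = []       # min-heap of (return, tiebreak, episode)

    def add(self, episode_return, episode):
        item = (episode_return, next(self._tie), episode)
        if len(self.heap) < self.capacity:
            heapq.heappush(self.heap, item)
        elif episode_return > self.heap[0][0]:
            heapq.heapreplace(self.heap, item)  # evict the worst stored episode

    def demonstrations(self):
        # Imitate these alongside (or instead of) the external demonstrations.
        return [ep for _, _, ep in self.heap]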
arXiv Detail & Related papers (2022-03-21T11:56:56Z)
- Improving Learning from Demonstrations by Learning from Experience [4.605233477425785]
We propose a new algorithm named TD3fG that can smoothly transition from learning from experts to learning from experience.
Our algorithm achieves good performance in the MuJoCo environment with limited and sub-optimal demonstrations.
arXiv Detail & Related papers (2021-11-16T00:40:31Z)
- Bottom-Up Skill Discovery from Unsegmented Demonstrations for Long-Horizon Robot Manipulation [55.31301153979621]
We tackle real-world long-horizon robot manipulation tasks through skill discovery.
We present a bottom-up approach to learning a library of reusable skills from unsegmented demonstrations.
Our method has shown superior performance over state-of-the-art imitation learning methods in multi-stage manipulation tasks.
arXiv Detail & Related papers (2021-09-28T16:18:54Z)
- Learning from Imperfect Demonstrations from Agents with Varying Dynamics [29.94164262533282]
We develop a metric composed of a feasibility score and an optimality score to measure how useful a demonstration is for imitation learning.
Our experiments on four environments in simulation and on a real robot show improved learned policies with higher expected return.
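The abstract does not define the two scores, so the sketch below simply
assumes both are normalized to [0, 1] and combined multiplicatively to weight
each demonstration's imitation loss; the combination rule is an illustrative
assumption.

import torch
import torch.nn.functional as F

def weighted_bc_loss(policy, demos):
    # Each demo carries a feasibility score (can this robot reproduce it?)
    # and an optimality score (how good is it?); poor demos contribute little.
    losses = []
    for demo in demos:
        w = demo["feasibility"] * demo["optimality"]
        pred = policy(demo["states"])
        losses.append(w * F.mse_loss(pred, demo["actions"]))
    return torch.stack(losses).mean()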
arXiv Detail & Related papers (2021-03-10T07:39:38Z)
- Visual Imitation Made Easy [102.36509665008732]
We present an alternate interface for imitation that simplifies the data collection process while allowing for easy transfer to robots.
We use commercially available reacher-grabber assistive tools both as a data collection device and as the robot's end-effector.
We experimentally evaluate on two challenging tasks: non-prehensile pushing and prehensile stacking, with 1000 diverse demonstrations for each task.
arXiv Detail & Related papers (2020-08-11T17:58:50Z)
- State-Only Imitation Learning for Dexterous Manipulation [63.03621861920732]
In this paper, we explore state-only imitation learning.
We train an inverse dynamics model and use it to predict actions for state-only demonstrations.
Our method performs on par with state-action approaches and considerably outperforms RL alone.
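A minimal sketch of this recipe in PyTorch: fit an inverse dynamics model on
the agent's own transitions, then use it to label state-only demonstrations
with pseudo-actions for behavior cloning. Layer sizes and names are
placeholders, not the paper's architecture.

import torch
import torch.nn as nn

class InverseDynamics(nn.Module):
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, s, s_next):
        # Predict the action that moved the system from s to s_next.
        return self.net(torch.cat([s, s_next], dim=-1))

def label_demo(model, demo_states):
    # Infer the missing actions between consecutive demonstrated states.
    s, s_next = demo_states[:-1], demo_states[1:]
    with torch.no_grad():
        return model(s, s_next)  # pseudo-actions for behavior cloning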
arXiv Detail & Related papers (2020-04-07T17:57:20Z)