Towards Learning to Imitate from a Single Video Demonstration
- URL: http://arxiv.org/abs/1901.07186v4
- Date: Wed, 12 Jul 2023 19:04:18 GMT
- Title: Towards Learning to Imitate from a Single Video Demonstration
- Authors: Glen Berseth, Florian Golemo, Christopher Pal
- Abstract summary: We develop a reinforcement learning agent that can learn to imitate given only video observations.
We use a Siamese recurrent neural network architecture to learn rewards in space and time between motion clips.
We demonstrate our approach on simulated humanoid, dog, and raptor agents in 2D and a quadruped and a humanoid in 3D.
- Score: 11.15358253586118
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Agents that can learn to imitate given video observation, without
direct access to state or action information, are more applicable to learning
in the natural world. However, formulating a reinforcement learning (RL) agent
that facilitates this goal remains a significant challenge. We approach this
challenge using contrastive training to learn a reward function comparing an
agent's behaviour with a single demonstration. We use a Siamese recurrent
neural network architecture to learn rewards in space and time between motion
clips while training an RL policy to minimize this distance. Through
experimentation, we also find that the inclusion of multi-task data and
additional image encoding losses improves the temporal consistency of the
learned rewards and, as a result, significantly improves policy learning. We
demonstrate our approach on simulated humanoid, dog, and raptor agents in 2D
and a quadruped and a humanoid in 3D. We show that our method outperforms
current state-of-the-art techniques in these environments and can learn to
imitate from a single video demonstration.
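To make the reward mechanism concrete, here is a minimal sketch, assuming PyTorch, of a Siamese recurrent encoder scoring an agent clip against the demonstration; the class name, layer sizes, and the plain L2 distance are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class SiameseRNNEncoder(nn.Module):
    """Shared clip encoder: per-frame CNN features summarized by a GRU.

    Hypothetical sketch of the Siamese recurrent idea; layer sizes are
    illustrative, not the paper's architecture.
    """

    def __init__(self, feat_dim=128, hidden_dim=256):
        super().__init__()
        self.cnn = nn.Sequential(              # per-frame image encoder
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(64 * 4 * 4, feat_dim),
        )
        self.rnn = nn.GRU(feat_dim, hidden_dim, batch_first=True)

    def forward(self, clip):                   # clip: (B, T, 3, H, W)
        b, t = clip.shape[:2]
        feats = self.cnn(clip.flatten(0, 1)).view(b, t, -1)
        _, h = self.rnn(feats)                 # final hidden state per clip
        return h[-1]                           # (B, hidden_dim)

def imitation_reward(encoder, agent_clip, demo_clip):
    """RL reward: negative embedding distance between agent and demo clips."""
    with torch.no_grad():
        z_agent = encoder(agent_clip)
        z_demo = encoder(demo_clip)
    return -torch.norm(z_agent - z_demo, dim=-1)  # larger when behaviours match
```

In the full method, the shared encoder would be trained contrastively, pulling matching motion clips together and pushing mismatched ones apart, while the RL policy is trained to maximize this reward.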
Related papers
- MultiPly: Reconstruction of Multiple People from Monocular Video in the Wild [32.6521941706907]
We present MultiPly, a novel framework to reconstruct multiple people in 3D from monocular in-the-wild videos.
We first define a layered neural representation for the entire scene, composited by individual human and background models.
We learn the layered neural representation from videos via our layer-wise differentiable volume rendering.
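For context on the rendering step, layer-wise differentiable volume rendering plausibly builds on the standard volume rendering quadrature; a sketch in our notation (the per-layer compositing details are the paper's contribution and are not reproduced here):

```latex
\hat{C}(\mathbf{r}) = \sum_{i=1}^{N} T_i \, \alpha_i \, \mathbf{c}_i,
\qquad
\alpha_i = 1 - \exp(-\sigma_i \delta_i),
\qquad
T_i = \prod_{j<i} (1 - \alpha_j)
```

Here \sigma_i and \mathbf{c}_i are the density and colour of sample i along ray \mathbf{r}, and \delta_i is the spacing between samples; because every term is differentiable, image-reconstruction gradients can flow into each human layer and the background model.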
arXiv Detail & Related papers (2024-06-03T17:59:57Z)
- Improving Multimodal Interactive Agents with Reinforcement Learning from Human Feedback [16.268581985382433]
An important goal in artificial intelligence is to create agents that can both interact naturally with humans and learn from their feedback.
Here we demonstrate how to use reinforcement learning from human feedback to improve upon simulated, embodied agents.
arXiv Detail & Related papers (2022-11-21T16:00:31Z)
- Minimizing Human Assistance: Augmenting a Single Demonstration for Deep Reinforcement Learning [0.0]
We use a single human example collected through a simple-to-use virtual reality simulation to assist with RL training.
Our method augments a single demonstration to generate numerous human-like demonstrations.
Despite learning from a human example, the agent is not constrained to human-level performance.
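A minimal sketch of the augmentation idea, assuming NumPy and Gaussian jitter on a state-action trajectory; the function name, noise model, and magnitudes are hypothetical, not the paper's exact procedure.

```python
import numpy as np

def augment_demonstration(demo, n_copies=100, noise_std=0.01, seed=0):
    """Generate many human-like demonstrations from a single VR demo.

    Hypothetical sketch: Gaussian jitter on states and actions is one
    plausible instantiation of the augmentation, not the paper's spec.
    demo: list of (state, action) pairs of np.ndarrays.
    """
    rng = np.random.default_rng(seed)
    augmented = []
    for _ in range(n_copies):
        traj = [(s + rng.normal(0.0, noise_std, s.shape),
                 a + rng.normal(0.0, noise_std, a.shape))
                for s, a in demo]
        augmented.append(traj)
    return augmented
```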
arXiv Detail & Related papers (2022-09-22T19:04:43Z)
- Video2Skill: Adapting Events in Demonstration Videos to Skills in an Environment using Cyclic MDP Homomorphisms [16.939129935919325]
Video2Skill (V2S) attempts to extend the human ability to learn from watching demonstrations to artificial agents by allowing a robot arm to learn from human cooking videos.
We first use sequence-to-sequence Auto-Encoder style architectures to learn a temporal latent space for events in long-horizon demonstrations.
We then transfer these representations to the robotic target domain, using a small amount of offline and unrelated interaction data.
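A minimal sketch, assuming PyTorch, of a sequence-to-sequence autoencoder for event representations; the GRU choice and the dimensions are illustrative assumptions, not the V2S architecture.

```python
import torch.nn as nn

class Seq2SeqEventAutoencoder(nn.Module):
    """Encode a long-horizon demonstration into a latent event code,
    then reconstruct the sequence from that code (illustrative sizes)."""

    def __init__(self, obs_dim=64, latent_dim=32):
        super().__init__()
        self.encoder = nn.GRU(obs_dim, latent_dim, batch_first=True)
        self.decoder = nn.GRU(latent_dim, obs_dim, batch_first=True)

    def forward(self, seq):                    # seq: (B, T, obs_dim)
        _, z = self.encoder(seq)               # z: (1, B, latent_dim)
        code = z[-1]                           # per-sequence event code
        z_rep = code.unsqueeze(1).repeat(1, seq.shape[1], 1)
        recon, _ = self.decoder(z_rep)         # (B, T, obs_dim)
        return recon, code
```

Training with a reconstruction loss on `recon` would yield the temporal latent space that is then transferred to the robotic domain.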
arXiv Detail & Related papers (2021-09-08T17:59:01Z)
- PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
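For context, feedback-driven methods in this family typically learn the reward from pairwise human preferences; a minimal sketch of that Bradley-Terry-style objective, where `reward_net` mapping (B, T, obs_dim) segments to (B, T, 1) per-step rewards is our assumption, not the paper's API:

```python
import torch.nn.functional as F

def preference_loss(reward_net, seg_a, seg_b, pref):
    """Cross-entropy between predicted and human preferences over two
    behaviour segments; pref is 1.0 where the human preferred segment A."""
    ret_a = reward_net(seg_a).squeeze(-1).sum(dim=1)  # predicted returns
    ret_b = reward_net(seg_b).squeeze(-1).sum(dim=1)
    logits = ret_a - ret_b      # P(A preferred) = sigmoid(ret_a - ret_b)
    return F.binary_cross_entropy_with_logits(logits, pref)
```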
arXiv Detail & Related papers (2021-06-09T14:10:50Z)
- PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning [102.36450942613091]
We propose an inverse reinforcement learning algorithm, called inverse temporal difference learning (ITD).
We show how to seamlessly integrate ITD with learning from online environment interactions, arriving at a novel algorithm for reinforcement learning with demonstrations, called ΨΦ-learning.
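For context, ITD builds on successor features; in standard notation (ours, not necessarily the paper's), with cumulants \phi(s_t, a_t):

```latex
\psi^{\pi}(s,a) = \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} \phi(s_t, a_t) \,\middle|\, s_0 = s,\ a_0 = a \right],
\qquad
Q^{\pi}_{w}(s,a) = \psi^{\pi}(s,a)^{\top} w
```

Under this decomposition, inferring a per-demonstrator preference vector w from trajectories alone gives an inverse-RL reading of demonstrations without reward labels.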
arXiv Detail & Related papers (2021-02-24T21:12:09Z)
- Learning to Run with Potential-Based Reward Shaping and Demonstrations from Video Data [70.540936204654]
"Learning to run" competition was to train a two-legged model of a humanoid body to run in a simulated race course with maximum speed.
All submissions took a tabula rasa approach to reinforcement learning (RL) and were able to produce relatively fast, but not optimal running behaviour.
We demonstrate how data from videos of human running can be used to shape the reward of the humanoid learning agent.
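The shaping term in potential-based reward shaping has a standard form (Ng et al., 1999) that provably preserves the optimal policy; here the potential \Phi over states would presumably be derived from the human running videos:

```latex
r'(s, a, s') = r(s, a, s') + F(s, s'),
\qquad
F(s, s') = \gamma \, \Phi(s') - \Phi(s)
```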
arXiv Detail & Related papers (2020-12-16T09:46:58Z)
- Learning Object Manipulation Skills via Approximate State Estimation from Real Videos [47.958512470724926]
Humans are adept at learning new tasks by watching a few instructional videos.
On the other hand, robots that learn new actions either require a lot of effort through trial and error, or use expert demonstrations that are challenging to obtain.
In this paper, we explore a method that facilitates learning object manipulation skills directly from videos.
arXiv Detail & Related papers (2020-11-13T08:53:47Z)
- Forgetful Experience Replay in Hierarchical Reinforcement Learning from Demonstrations [55.41644538483948]
In this paper, we propose a combination of approaches that allow the agent to use low-quality demonstrations in complex vision-based environments.
Our proposed goal-oriented structuring of the replay buffer allows the agent to automatically highlight sub-goals for solving complex hierarchical tasks in demonstrations.
The solution based on our algorithm beats all submissions to the famous MineRL competition and allows the agent to mine a diamond in the Minecraft environment.
arXiv Detail & Related papers (2020-06-17T15:38:40Z)
- Self-Supervised Human Depth Estimation from Monocular Videos [99.39414134919117]
Previous methods for estimating detailed human depth often require supervised training with ground-truth depth data.
This paper presents a self-supervised method that can be trained on YouTube videos without known depth.
Experiments demonstrate that our method enjoys better generalization and performs much better on data in the wild.
arXiv Detail & Related papers (2020-05-07T09:45:11Z)