Video2Skill: Adapting Events in Demonstration Videos to Skills in an
Environment using Cyclic MDP Homomorphisms
- URL: http://arxiv.org/abs/2109.03813v2
- Date: Thu, 9 Sep 2021 18:55:43 GMT
- Title: Video2Skill: Adapting Events in Demonstration Videos to Skills in an
Environment using Cyclic MDP Homomorphisms
- Authors: Sumedh A Sontakke, Sumegh Roychowdhury, Mausoom Sarkar, Nikaash Puri,
Balaji Krishnamurthy, Laurent Itti
- Abstract summary: Video2Skill (V2S) attempts to extend this capability to artificial agents by allowing a robot arm to learn from human cooking videos.
We first use sequence-to-sequence Auto-Encoder style architectures to learn a temporal latent space for events in long-horizon demonstrations.
We then transfer these representations to the robotic target domain, using a small amount of offline and unrelated interaction data.
- Score: 16.939129935919325
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Humans excel at learning long-horizon tasks from demonstrations augmented
with textual commentary, as evidenced by the burgeoning popularity of tutorial
videos online. Intuitively, this capability can be separated into 2 distinct
subtasks - first, dividing a long-horizon demonstration sequence into
semantically meaningful events; second, adapting such events into meaningful
behaviors in one's own environment. Here, we present Video2Skill (V2S), which
attempts to extend this capability to artificial agents by allowing a robot arm
to learn from human cooking videos. We first use sequence-to-sequence
Auto-Encoder style architectures to learn a temporal latent space for events in
long-horizon demonstrations. We then transfer these representations to the
robotic target domain, using a small amount of offline and unrelated
interaction data (sequences of state-action pairs of the robot arm controlled
by an expert) to adapt these events into actionable representations, i.e.,
skills. Through experiments, we demonstrate that our approach results in
self-supervised analogy learning, where the agent learns to draw analogies
between motions in human demonstration data and behaviors in the robotic
environment. We also demonstrate the efficacy of our approach on model learning,
showing how Video2Skill uses prior knowledge from human demonstrations to
outperform traditional model learning of long-horizon dynamics. Finally, we
demonstrate the utility of our approach for non-tabula rasa decision-making,
i.e., utilizing video demonstrations for zero-shot skill generation.
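As a rough illustration of the two-stage structure the abstract describes (a sequence-to-sequence autoencoder that embeds events from long-horizon videos, followed by an adapter that grounds those event embeddings as skills using a small offline dataset of robot trajectories), here is a minimal PyTorch sketch. All module names, dimensions, and the simple reconstruction/behavior-cloning losses are illustrative assumptions; the paper's actual architecture and its cyclic MDP homomorphism objective are not reproduced here.

```python
# Minimal sketch of the two-stage idea described in the abstract (illustrative only).
import torch
import torch.nn as nn

class EventAutoEncoder(nn.Module):
    """Stage 1: seq2seq autoencoder over pre-extracted video features -> temporal event latent."""
    def __init__(self, feat_dim=512, latent_dim=128):
        super().__init__()
        self.encoder = nn.GRU(feat_dim, latent_dim, batch_first=True)
        self.decoder = nn.GRU(latent_dim, feat_dim, batch_first=True)

    def forward(self, video_feats):                      # (B, T, feat_dim)
        _, h = self.encoder(video_feats)                  # h: (1, B, latent_dim)
        z = h[-1]                                         # event embedding, (B, latent_dim)
        steps = video_feats.size(1)
        recon, _ = self.decoder(z.unsqueeze(1).expand(-1, steps, -1))
        return z, recon

class SkillAdapter(nn.Module):
    """Stage 2: ground a (detached) event embedding as a short robot action sequence."""
    def __init__(self, latent_dim=128, state_dim=9, action_dim=8, horizon=16):
        super().__init__()
        self.horizon, self.action_dim = horizon, action_dim
        self.policy = nn.Sequential(
            nn.Linear(latent_dim + state_dim, 256), nn.ReLU(),
            nn.Linear(256, horizon * action_dim))

    def forward(self, z, robot_state):                    # (B, latent_dim), (B, state_dim)
        out = self.policy(torch.cat([z, robot_state], dim=-1))
        return out.view(-1, self.horizon, self.action_dim)

# Stage 1: reconstruct video features; stage 2: fit expert actions from offline robot data.
autoenc, adapter = EventAutoEncoder(), SkillAdapter()
video_feats = torch.randn(4, 64, 512)                     # placeholder video features
z, recon = autoenc(video_feats)
recon_loss = nn.functional.mse_loss(recon, video_feats)

robot_state = torch.randn(4, 9)                           # placeholder offline robot data
expert_actions = torch.randn(4, 16, 8)
bc_loss = nn.functional.mse_loss(adapter(z.detach(), robot_state), expert_actions)
```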
Related papers
- Learning an Actionable Discrete Diffusion Policy via Large-Scale Actionless Video Pre-Training [69.54948297520612]
Learning a generalist embodied agent poses challenges, primarily stemming from the scarcity of action-labeled robotic datasets.
We introduce a novel framework to tackle these challenges, which leverages a unified discrete diffusion to combine generative pre-training on human videos and policy fine-tuning on a small number of action-labeled robot videos.
Our method generates high-fidelity future videos for planning and enhances the fine-tuned policies compared to previous state-of-the-art approaches.
arXiv Detail & Related papers (2024-02-22T09:48:47Z) - XSkill: Cross Embodiment Skill Discovery [41.624343257852146]
XSkill is an imitation learning framework that discovers a cross-embodiment representation called skill prototypes purely from unlabeled human and robot manipulation videos.
Our experiments in simulation and real-world environments show that the discovered skill prototypes facilitate skill transfer and composition for unseen tasks.
arXiv Detail & Related papers (2023-07-19T12:51:28Z) - Learning Video-Conditioned Policies for Unseen Manipulation Tasks [83.2240629060453]
Video-conditioned policy learning maps human demonstrations of previously unseen tasks to robot manipulation skills.
We train our policy to generate appropriate actions given the current scene observation and a video of the target task.
We validate our approach on a set of challenging multi-task robot manipulation environments and outperform the state of the art.
arXiv Detail & Related papers (2023-05-10T16:25:42Z) - Cross-Domain Transfer via Semantic Skill Imitation [49.83150463391275]
We propose an approach for semantic imitation, which uses demonstrations from a source domain, e.g., human videos, to accelerate reinforcement learning (RL).
Instead of imitating low-level actions like joint velocities, our approach imitates the sequence of demonstrated semantic skills, such as "opening the microwave" or "turning on the stove".
arXiv Detail & Related papers (2022-12-14T18:46:14Z) - Learning Reward Functions for Robotic Manipulation by Observing Humans [92.30657414416527]
We use unlabeled videos of humans solving a wide range of manipulation tasks to learn a task-agnostic reward function for robotic manipulation policies.
The learned rewards are based on distances to a goal in an embedding space learned using a time-contrastive objective (see the sketch after this list).
arXiv Detail & Related papers (2022-11-16T16:26:48Z) - Continual Learning from Demonstration of Robotics Skills [5.573543601558405]
Methods for teaching motion skills to robots typically focus on training a single skill at a time.
We propose an approach for continual learning from demonstration using hypernetworks and neural ordinary differential equation solvers.
arXiv Detail & Related papers (2022-02-14T16:26:52Z) - Bottom-Up Skill Discovery from Unsegmented Demonstrations for
Long-Horizon Robot Manipulation [55.31301153979621]
We tackle real-world long-horizon robot manipulation tasks through skill discovery.
We present a bottom-up approach to learning a library of reusable skills from unsegmented demonstrations.
Our method has shown superior performance over state-of-the-art imitation learning methods in multi-stage manipulation tasks.
arXiv Detail & Related papers (2021-09-28T16:18:54Z) - Visual Imitation Made Easy [102.36509665008732]
We present an alternate interface for imitation that simplifies the data collection process while allowing for easy transfer to robots.
We use commercially available reacher-grabber assistive tools both as a data collection device and as the robot's end-effector.
We experimentally evaluate on two challenging tasks: non-prehensile pushing and prehensile stacking, with 1000 diverse demonstrations for each task.
arXiv Detail & Related papers (2020-08-11T17:58:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information shown and is not responsible for any consequences arising from its use.