Generalizable task representation learning from human demonstration
videos: a geometric approach
- URL: http://arxiv.org/abs/2202.13604v1
- Date: Mon, 28 Feb 2022 08:25:57 GMT
- Title: Generalizable task representation learning from human demonstration
videos: a geometric approach
- Authors: Jun Jin, Martin Jagersand
- Abstract summary: We study the problem of generalizable task learning from human demonstration videos without extra training on the robot or pre-recorded robot motions.
We propose CoVGS-IL, which uses a graph-structured task function to learn task representations under structural constraints.
- Score: 4.640835690336654
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problem of generalizable task learning from human demonstration
videos without extra training on the robot or pre-recorded robot motions. Given
a set of human demonstration videos showing a task with different objects/tools
(categorical objects), we aim to learn a representation of visual observation
that generalizes to categorical objects and enables efficient controller
design. We propose to introduce a geometric task structure to the
representation learning problem that geometrically encodes the task
specification from human demonstration videos, and that enables generalization
by building task specification correspondence between categorical objects.
Specifically, we propose CoVGS-IL, which uses a graph-structured task function
to learn task representations under structural constraints. Our method enables
task generalization by selecting geometric features from different objects
whose inner connection relationships define the same task in geometric
constraints. The learned task representation is then transferred to a robot
controller using uncalibrated visual servoing (UVS); thus, the need for extra
robot training or pre-recorded robot motions is removed.
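For intuition only, the sketch below illustrates the two ingredients the abstract describes: a geometric task error assembled from selected feature correspondences, and a classical uncalibrated visual servoing loop that drives that error to zero with a Broyden-updated image Jacobian. The `robot` and `tracker` interfaces, the point-to-point constraint choice, and the gains are assumptions made for this example, not the authors' implementation.

```python
# Minimal illustrative sketch (not the authors' code) of (1) a geometric task
# error built from selected feature correspondences and (2) an uncalibrated
# visual servoing (UVS) loop that drives that error to zero using a
# Broyden-updated estimate of the image Jacobian. The robot/tracker interfaces,
# the point-to-point constraint choice, and the gains are example assumptions.
import numpy as np

def task_error(features, targets):
    """Stack point-to-point geometric constraints into one error vector.

    features, targets: (k, 2) arrays of tracked image points; the task is
    considered done when this error is approximately zero.
    """
    return (features - targets).reshape(-1)

def uvs_control(robot, tracker, targets, q_dim, gain=0.3, steps=200, tol=2.0):
    """Drive the geometric task error to zero without camera calibration."""
    k = targets.shape[0]
    J = np.eye(2 * k, q_dim)                 # crude initial image-Jacobian guess
    e = task_error(tracker.read(), targets)
    for _ in range(steps):
        if np.linalg.norm(e) < tol:          # tolerance in pixels
            break
        dq = -gain * np.linalg.pinv(J) @ e   # small joint-space step
        robot.move_joints(dq)
        e_new = task_error(tracker.read(), targets)
        # Broyden rank-1 update: refine J from the observed change in error.
        J += np.outer(e_new - e - J @ dq, dq) / (dq @ dq + 1e-9)
        e = e_new
    return e
```

A graph-structured task function would replace the simple point-to-point error above with constraints selected over a feature graph; that selection step is not sketched here.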
Related papers
- ShapeGrasp: Zero-Shot Task-Oriented Grasping with Large Language Models through Geometric Decomposition [8.654140442734354]
Task-oriented grasping of unfamiliar objects is a necessary skill for robots in dynamic in-home environments.
We present a novel zero-shot task-oriented grasping method leveraging a geometric decomposition of the target object into simple convex shapes.
Our approach employs minimal essential information - the object's name and the intended task - to facilitate zero-shot task-oriented grasping.
arXiv Detail & Related papers (2024-03-26T19:26:53Z)
- Few-Shot In-Context Imitation Learning via Implicit Graph Alignment [15.215659641228655]
We formulate imitation learning as a conditional alignment problem between graph representations of objects.
We show that this conditioning allows for in-context learning, where a robot can perform a task on a set of new objects immediately after the demonstrations.
arXiv Detail & Related papers (2023-10-18T18:26:01Z)
- InstructDiffusion: A Generalist Modeling Interface for Vision Tasks [52.981128371910266]
We present InstructDiffusion, a framework for aligning computer vision tasks with human instructions.
InstructDiffusion can handle a variety of vision tasks, including understanding tasks and generative tasks.
It even exhibits the ability to handle unseen tasks and outperforms prior methods on novel datasets.
arXiv Detail & Related papers (2023-09-07T17:56:57Z)
- Learning Video-Conditioned Policies for Unseen Manipulation Tasks [83.2240629060453]
Video-conditioned policy learning maps human demonstrations of previously unseen tasks to robot manipulation skills.
We train the policy to generate appropriate actions given current scene observations and a video of the target task.
We validate our approach on a set of challenging multi-task robot manipulation environments and outperform the state of the art.
arXiv Detail & Related papers (2023-05-10T16:25:42Z)
- Learning Reward Functions for Robotic Manipulation by Observing Humans [92.30657414416527]
We use unlabeled videos of humans solving a wide range of manipulation tasks to learn a task-agnostic reward function for robotic manipulation policies.
The learned rewards are based on distances to a goal in an embedding space learned using a time-contrastive objective.
arXiv Detail & Related papers (2022-11-16T16:26:48Z)
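To make the reward construction in the "Learning Reward Functions for Robotic Manipulation by Observing Humans" entry above concrete: embed the current observation and a goal image with the learned encoder and use the negative embedding distance as the reward. The sketch below is only an illustration; the encoder is a random placeholder, not a trained time-contrastive network.

```python
# Hedged sketch of a distance-in-embedding-space reward. The encoder below is
# an arbitrary placeholder standing in for a network trained with a
# time-contrastive objective; shapes and sizes are assumptions for the example.
import torch
import torch.nn as nn

encoder = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 64 * 64, 256),
    nn.ReLU(),
    nn.Linear(256, 32),
)

def embedding_reward(obs_img, goal_img):
    """Reward = negative distance between observation and goal embeddings."""
    with torch.no_grad():
        z_obs = encoder(obs_img.unsqueeze(0))
        z_goal = encoder(goal_img.unsqueeze(0))
    return -torch.norm(z_obs - z_goal, dim=-1).squeeze()

# Example with random 64x64 RGB frames standing in for camera images.
reward = embedding_reward(torch.rand(3, 64, 64), torch.rand(3, 64, 64))
```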
- Can Foundation Models Perform Zero-Shot Task Specification For Robot Manipulation? [54.442692221567796]
Task specification is critical for engagement of non-expert end-users and adoption of personalized robots.
A widely studied approach to task specification is through goals, using either compact state vectors or goal images from the same robot scene.
In this work, we explore alternate and more general forms of goal specification that are expected to be easier for humans to specify and use.
arXiv Detail & Related papers (2022-04-23T19:39:49Z)
- Learning Generalizable Robotic Reward Functions from "In-The-Wild" Human Videos [59.58105314783289]
Domain-agnostic Video Discriminator (DVD) learns multitask reward functions by training a discriminator to classify whether two videos are performing the same task.
DVD can generalize by virtue of learning from a small amount of robot data with a broad dataset of human videos.
DVD can be combined with visual model predictive control to solve robotic manipulation tasks on a real WidowX200 robot in an unseen environment from a single human demo.
arXiv Detail & Related papers (2021-03-31T05:25:05Z)
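As a hedged illustration of the DVD entry above, the sketch below shows a discriminator that scores whether two video clips show the same task; a planner can use such a score to rank candidate robot rollouts against a single human demo. The frame encoder and classification head are placeholder assumptions, not the published architecture.

```python
# Hedged sketch of a same-task video discriminator in the spirit of the DVD
# summary above. The frame encoder and head are placeholder assumptions.
import torch
import torch.nn as nn

class SameTaskDiscriminator(nn.Module):
    def __init__(self, feat_dim=128):
        super().__init__()
        # Placeholder per-frame encoder; a video embedding is the mean over frames.
        self.frame_enc = nn.Sequential(
            nn.Flatten(), nn.Linear(3 * 64 * 64, feat_dim), nn.ReLU()
        )
        self.head = nn.Linear(2 * feat_dim, 1)    # logit: same task or not

    def encode(self, video):                      # video: (T, 3, 64, 64)
        return self.frame_enc(video).mean(dim=0)

    def forward(self, video_a, video_b):
        z = torch.cat([self.encode(video_a), self.encode(video_b)], dim=-1)
        return self.head(z)                       # train with BCEWithLogitsLoss

# Usage idea: score candidate rollouts against a single human demo and pick the
# clip the discriminator judges most likely to show the same task.
model = SameTaskDiscriminator()
logit = model(torch.rand(8, 3, 64, 64), torch.rand(8, 3, 64, 64))
```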
- Modeling Long-horizon Tasks as Sequential Interaction Landscapes [75.5824586200507]
We present a deep learning network that learns dependencies and transitions across subtasks solely from a set of demonstration videos.
We show that these symbols can be learned and predicted directly from image observations.
We evaluate our framework on two long-horizon tasks: (1) block stacking of puzzle pieces executed by humans, and (2) a robot manipulation task involving pick-and-place of objects and sliding a cabinet door with a 7-DoF robot arm.
arXiv Detail & Related papers (2020-06-08T18:07:18Z)
- Learning Rope Manipulation Policies Using Dense Object Descriptors Trained on Synthetic Depth Data [32.936908766549344]
We present an approach that learns point-pair correspondences between initial and goal rope configurations.
In 50 trials of a knot-tying task with the ABB YuMi Robot, the system achieves a 66% knot-tying success rate from previously unseen configurations.
arXiv Detail & Related papers (2020-03-03T23:43:05Z)
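For the rope manipulation entry above, the point-pair correspondence step can be pictured as nearest-neighbour matching in a learned per-pixel descriptor space. In the sketch below, random arrays stand in for the outputs of a dense-descriptor network; only the matching step is illustrated.

```python
# Hedged sketch of descriptor-based point correspondence: pixels in an initial
# image are matched to a goal image by nearest neighbour in per-pixel
# descriptor space. Random arrays stand in for a dense-descriptor network.
import numpy as np

def match_points(desc_init, desc_goal, pixels):
    """Map (row, col) pixels in the initial image to best-matching goal pixels.

    desc_init, desc_goal: (H, W, D) per-pixel descriptor maps.
    pixels: (N, 2) integer pixel coordinates in the initial image.
    """
    H, W, D = desc_goal.shape
    goal_flat = desc_goal.reshape(-1, D)
    matches = []
    for r, c in pixels:
        d = desc_init[r, c]
        idx = int(np.argmin(np.linalg.norm(goal_flat - d, axis=1)))
        matches.append((idx // W, idx % W))
    return np.array(matches)

# Example with random descriptor maps.
rng = np.random.default_rng(0)
corr = match_points(rng.normal(size=(64, 64, 16)),
                    rng.normal(size=(64, 64, 16)),
                    np.array([[10, 20], [30, 40]]))
```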
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.