You Only Demonstrate Once: Category-Level Manipulation from Single Visual Demonstration
- URL: http://arxiv.org/abs/2201.12716v1
- Date: Sun, 30 Jan 2022 03:59:14 GMT
- Title: You Only Demonstrate Once: Category-Level Manipulation from Single Visual Demonstration
- Authors: Bowen Wen, Wenzhao Lian, Kostas Bekris, Stefan Schaal
- Abstract summary: This work proposes a novel, category-level manipulation framework.
It uses an object-centric, category-level representation and model-free 6 DoF motion tracking.
Experiments demonstrate its efficacy in a range of challenging industrial tasks in high-precision assembly.
- Score: 9.245605426105922
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Promising results have been achieved recently in category-level manipulation
that generalizes across object instances. Nevertheless, it often requires
expensive real-world data collection and manual specification of semantic
keypoints for each object category and task. Additionally, coarse keypoint
predictions and ignoring intermediate action sequences hinder adoption in
complex manipulation tasks beyond pick-and-place. This work proposes a novel,
category-level manipulation framework that leverages an object-centric,
category-level representation and model-free 6 DoF motion tracking. The
canonical object representation is learned solely in simulation and then used
to parse a category-level, task trajectory from a single demonstration video.
The demonstration is reprojected to a target trajectory tailored to a novel
object via the canonical representation. During execution, the manipulation
horizon is decomposed into long-range, collision-free motion and last-inch
manipulation. For the latter part, a category-level behavior cloning (CatBC)
method leverages motion tracking to perform closed-loop control. CatBC follows
the target trajectory, projected from the demonstration and anchored to a
dynamically selected category-level coordinate frame. The frame is
automatically selected along the manipulation horizon by a local attention
mechanism. This framework makes it possible to teach different manipulation
strategies by providing only a single demonstration, without complicated
manual programming. Extensive experiments demonstrate its efficacy in a range of
challenging industrial tasks in high-precision assembly, which involve learning
complex, long-horizon policies. The process exhibits robustness against
uncertainty due to dynamics as well as generalization across object instances
and scene configurations.
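As a rough illustration of two ideas from the abstract, the minimal Python/NumPy sketch below reprojects a demonstrated end-effector trajectory into the frame of a novel object via an assumed canonical-to-novel alignment, and uses a toy proximity-based attention rule to pick which category-level frame anchors the last-inch motion. All function names, the alignment transform, and the simplified attention score are illustrative assumptions, not the authors' implementation.

import numpy as np

def to_homogeneous(R, t):
    # Build a 4x4 homogeneous transform from rotation R (3x3) and translation t (3,).
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def reproject_trajectory(demo_poses_in_canonical, canonical_to_novel):
    # Express a demonstrated trajectory, parsed in the canonical object frame,
    # in the frame of a novel object instance. canonical_to_novel is assumed to
    # come from aligning the learned canonical representation to the novel object.
    return [canonical_to_novel @ T for T in demo_poses_in_canonical]

def select_anchor_frame(candidate_frames, gripper_pos, temperature=0.05):
    # Toy stand-in for the local attention: weight candidate category-level
    # frames by proximity of their origins to the current gripper position
    # and return the highest-weighted frame together with the weights.
    origins = np.stack([T[:3, 3] for T in candidate_frames])
    scores = -np.linalg.norm(origins - gripper_pos, axis=1) / temperature
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return candidate_frames[int(np.argmax(weights))], weights

if __name__ == "__main__":
    # Two demonstration waypoints in the canonical object frame (hypothetical values).
    demo = [to_homogeneous(np.eye(3), np.array([0.0, 0.0, 0.10])),
            to_homogeneous(np.eye(3), np.array([0.0, 0.0, 0.02]))]
    # Hypothetical alignment of the canonical model to a novel instance.
    canonical_to_novel = to_homogeneous(np.eye(3), np.array([0.30, -0.10, 0.0]))
    target = reproject_trajectory(demo, canonical_to_novel)
    frame, w = select_anchor_frame(target, gripper_pos=np.array([0.30, -0.10, 0.05]))
    print("last-inch target:", target[-1][:3, 3], "attention weights:", w)

In the paper, the anchoring frame is selected along the manipulation horizon by a learned local attention mechanism and the closed-loop control is provided by category-level behavior cloning (CatBC) with 6 DoF motion tracking; the sketch above only mimics the frame-anchoring step with a distance heuristic.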
Related papers
- Appearance-Based Refinement for Object-Centric Motion Segmentation [85.2426540999329]
We introduce an appearance-based refinement method that leverages temporal consistency in video streams to correct inaccurate flow-based proposals.
Our approach involves a sequence-level selection mechanism that identifies accurate flow-predicted masks as exemplars.
Its performance is evaluated on multiple video segmentation benchmarks, including DAVIS, YouTube, SegTrackv2, and FBMS-59.
arXiv Detail & Related papers (2023-12-18T18:59:51Z) - Kinematic-aware Prompting for Generalizable Articulated Object Manipulation with LLMs [53.66070434419739]
Generalizable articulated object manipulation is essential for home-assistant robots.
We propose a kinematic-aware prompting framework that prompts Large Language Models with kinematic knowledge of objects to generate low-level motion waypoints.
Our framework outperforms traditional methods on 8 seen categories and shows powerful zero-shot capability on 8 unseen articulated object categories.
arXiv Detail & Related papers (2023-11-06T03:26:41Z) - Contrastive Lift: 3D Object Instance Segmentation by Slow-Fast Contrastive Fusion [110.84357383258818]
We propose a novel approach to lift 2D segments to 3D and fuse them by means of a neural field representation.
The core of our approach is a slow-fast clustering objective function, which is scalable and well-suited for scenes with a large number of objects.
Our approach outperforms the state-of-the-art on challenging scenes from the ScanNet, Hypersim, and Replica datasets.
arXiv Detail & Related papers (2023-06-07T17:57:45Z) - USEEK: Unsupervised SE(3)-Equivariant 3D Keypoints for Generalizable Manipulation [19.423310410631085]
USEEK is an unsupervised SE(3)-equivariant keypoint method that enjoys alignment across instances in a category.
With USEEK in hand, the robot can infer the category-level task-relevant object frames in an efficient and explainable manner.
arXiv Detail & Related papers (2022-09-28T06:42:29Z) - Segmenting Moving Objects via an Object-Centric Layered Representation [100.26138772664811]
We introduce an object-centric segmentation model with a depth-ordered layer representation.
We introduce a scalable pipeline for generating synthetic training data with multiple objects.
We evaluate the model on standard video segmentation benchmarks.
arXiv Detail & Related papers (2022-07-05T17:59:43Z) - Discovering Objects that Can Move [55.743225595012966]
We study the problem of object discovery -- separating objects from the background without manual labels.
Existing approaches utilize appearance cues, such as color, texture, and location, to group pixels into object-like regions.
We choose to focus on dynamic objects -- entities that can move independently in the world.
arXiv Detail & Related papers (2022-03-18T21:13:56Z) - "What's This?" -- Learning to Segment Unknown Objects from Manipulation Sequences [27.915309216800125]
We present a novel framework for self-supervised grasped object segmentation with a robotic manipulator.
We propose a single, end-to-end trainable architecture which jointly incorporates motion cues and semantic knowledge.
Our method neither depends on any visual registration of a kinematic robot or 3D object models, nor on precise hand-eye calibration or any additional sensor data.
arXiv Detail & Related papers (2020-11-06T10:55:28Z) - DyStaB: Unsupervised Object Segmentation via Dynamic-Static Bootstrapping [72.84991726271024]
We describe an unsupervised method to detect and segment portions of images of live scenes that are seen moving as a coherent whole.
Our method first partitions the motion field by minimizing the mutual information between segments.
It uses the segments to learn object models that can be used for detection in a static image.
arXiv Detail & Related papers (2020-08-16T22:05:13Z)