Self-Supervised Instance Segmentation by Grasping
- URL: http://arxiv.org/abs/2305.06305v1
- Date: Wed, 10 May 2023 16:51:36 GMT
- Title: Self-Supervised Instance Segmentation by Grasping
- Authors: YuXuan Liu, Xi Chen, Pieter Abbeel
- Abstract summary: We learn a grasp segmentation model to segment the grasped object from before and after grasp images.
Using the segmented objects, we can "cut" objects from their original scenes and "paste" them into new scenes to generate instance supervision.
We show that our grasp segmentation model provides a 5x error reduction when segmenting grasped objects compared with traditional image subtraction approaches.
- Score: 84.2469669256257
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Instance segmentation is a fundamental skill for many robotic applications.
We propose a self-supervised method that uses grasp interactions to collect
segmentation supervision for an instance segmentation model. When a robot
grasps an item, the mask of that grasped item can be inferred from the images
of the scene before and after the grasp. Leveraging this insight, we learn a
grasp segmentation model to segment the grasped object from before and after
grasp images. Such a model can segment grasped objects from thousands of grasp
interactions without costly human annotation. Using the segmented grasped
objects, we can "cut" objects from their original scenes and "paste" them into
new scenes to generate instance supervision. We show that our grasp
segmentation model provides a 5x error reduction when segmenting grasped
objects compared with traditional image subtraction approaches. Combined with
our "cut-and-paste" generation method, instance segmentation models trained
with our method achieve better performance than a model trained with 10x the
amount of labeled data. On a real robotic grasping system, our instance
segmentation model reduces the rate of grasp errors by over 3x compared to an
image subtraction baseline.
Related papers
- Appearance-based Refinement for Object-Centric Motion Segmentation [95.80420062679104]
We introduce an appearance-based refinement method that leverages temporal consistency in video streams to correct inaccurate flow-based proposals.
Our approach involves a simple selection mechanism that identifies accurate flow-predicted masks as exemplars.
Its performance is evaluated on multiple video segmentation benchmarks, including DAVIS, YouTubeVOS, SegTrackv2, and FBMS-59.
arXiv Detail & Related papers (2023-12-18T18:59:51Z) - Synthetic Instance Segmentation from Semantic Image Segmentation Masks [37.54211062233899]
We propose a novel paradigm called synthetic instance segmentation (SISeg)
SISeg does not require training a semantic or/and instance segmentation model and avoids the need for instance-level image annotations.
It can achieve competitive results compared to the state-of-the-art fully-supervised instance segmentation methods.
arXiv Detail & Related papers (2023-08-02T05:13:02Z) - Towards Open-World Segmentation of Parts [16.056921233445784]
We propose to explore a class-agnostic part segmentation task.
We argue that models trained without part classes can better localize parts and segment them on objects unseen in training.
We show notable and consistent gains by our approach, essentially a critical step towards open-world part segmentation.
arXiv Detail & Related papers (2023-05-26T10:34:58Z) - Foreground-Background Separation through Concept Distillation from
Generative Image Foundation Models [6.408114351192012]
We present a novel method that enables the generation of general foreground-background segmentation models from simple textual descriptions.
We show results on the task of segmenting four different objects (humans, dogs, cars, birds) and a use case scenario in medical image analysis.
arXiv Detail & Related papers (2022-12-29T13:51:54Z) - A Generalist Framework for Panoptic Segmentation of Images and Videos [61.61453194912186]
We formulate panoptic segmentation as a discrete data generation problem, without relying on inductive bias of the task.
A diffusion model is proposed to model panoptic masks, with a simple architecture and generic loss function.
Our method is capable of modeling video (in a streaming setting) and thereby learns to track object instances automatically.
arXiv Detail & Related papers (2022-10-12T16:18:25Z) - Learning with Free Object Segments for Long-Tailed Instance Segmentation [15.563842274862314]
We find that an abundance of instance segments can potentially be obtained freely from object-centric im-ages.
Motivated by these insights, we propose FreeSeg for extracting and leveraging these "free" object segments.
FreeSeg achieves state-of-the-art accuracy for segmenting rare object categories.
arXiv Detail & Related papers (2022-02-22T19:06:16Z) - The Emergence of Objectness: Learning Zero-Shot Segmentation from Videos [59.12750806239545]
We show that a video has different views of the same scene related by moving components, and the right region segmentation and region flow would allow mutual view synthesis.
Our model starts with two separate pathways: an appearance pathway that outputs feature-based region segmentation for a single image, and a motion pathway that outputs motion features for a pair of images.
By training the model to minimize view synthesis errors based on segment flow, our appearance and motion pathways learn region segmentation and flow estimation automatically without building them up from low-level edges or optical flows respectively.
arXiv Detail & Related papers (2021-11-11T18:59:11Z) - SOLO: A Simple Framework for Instance Segmentation [84.00519148562606]
"instance categories" assigns categories to each pixel within an instance according to the instance's location.
"SOLO" is a simple, direct, and fast framework for instance segmentation with strong performance.
Our approach achieves state-of-the-art results for instance segmentation in terms of both speed and accuracy.
arXiv Detail & Related papers (2021-06-30T09:56:54Z) - DyStaB: Unsupervised Object Segmentation via Dynamic-Static
Bootstrapping [72.84991726271024]
We describe an unsupervised method to detect and segment portions of images of live scenes that are seen moving as a coherent whole.
Our method first partitions the motion field by minimizing the mutual information between segments.
It uses the segments to learn object models that can be used for detection in a static image.
arXiv Detail & Related papers (2020-08-16T22:05:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.