Self-Supervised Unseen Object Instance Segmentation via Long-Term Robot
Interaction
- URL: http://arxiv.org/abs/2302.03793v1
- Date: Tue, 7 Feb 2023 23:11:29 GMT
- Title: Self-Supervised Unseen Object Instance Segmentation via Long-Term Robot
Interaction
- Authors: Yangxiao Lu, Ninad Khargonkar, Zesheng Xu, Charles Averill, Kamalesh
Palanisamy, Kaiyu Hang, Yunhui Guo, Nicholas Ruozzi, Yu Xiang
- Abstract summary: We introduce a novel robotic system for improving unseen object instance segmentation in the real world by leveraging long-term robot interaction with objects.
Our system defers the decision on segmenting objects until after a sequence of robot pushing actions.
We demonstrate the usefulness of our system by fine-tuning segmentation networks trained on synthetic data with real-world data collected by our system.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce a novel robotic system for improving unseen object instance
segmentation in the real world by leveraging long-term robot interaction with
objects. Previous approaches either grasp or push an object and then obtain the
segmentation mask of the grasped or pushed object after one action. Instead,
our system defers the decision on segmenting objects until after a sequence of robot
pushing actions. By applying multi-object tracking and video object
segmentation on the images collected via robot pushing, our system can generate
segmentation masks of all the objects in these images in a self-supervised way.
These include images where objects are very close to each other, and
segmentation errors usually occur on these images for existing object
segmentation networks. We demonstrate the usefulness of our system by
fine-tuning segmentation networks trained on synthetic data with real-world
data collected by our system. We show that, after fine-tuning, the segmentation
accuracy of the networks is significantly improved both in the same domain and
across different domains. In addition, we verify that the fine-tuned networks
improve top-down robotic grasping of unseen objects in the real world.
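As a rough illustration of the deferred-decision idea in the abstract, the following minimal sketch propagates initial object masks through an entire pushing sequence and only then emits per-frame labels. All names here are hypothetical; the actual system uses a real multi-object tracker and a video object segmentation network rather than the toy `track` callback assumed below.

```python
import numpy as np

def propagate_masks(frames, init_masks, track):
    """Propagate each object's mask through the whole pushing sequence.

    `track` is a hypothetical stand-in for multi-object tracking plus video
    object segmentation: given two consecutive frames and an object id, it
    returns the (dy, dx) pixel offset of that object.  The decision on the
    final per-frame masks is deferred until the sequence ends.
    """
    per_frame = [list(init_masks)]
    for t in range(1, len(frames)):
        moved = []
        for obj_id, mask in enumerate(per_frame[-1]):
            dy, dx = track(frames[t - 1], frames[t], obj_id)
            # shift the previous mask by the tracked offset
            moved.append(np.roll(mask, (dy, dx), axis=(0, 1)))
        per_frame.append(moved)
    return per_frame  # self-supervised labels for every frame
```

Because every frame in the sequence receives masks, the procedure also labels frames where objects sit close together, which is where single-action approaches tend to fail.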
Related papers
- LAC-Net: Linear-Fusion Attention-Guided Convolutional Network for Accurate Robotic Grasping Under the Occlusion [79.22197702626542]
This paper introduces a framework that explores amodal segmentation for robotic grasping in cluttered scenes.
We propose a Linear-fusion Attention-guided Convolutional Network (LAC-Net).
The results on different datasets show that our method achieves state-of-the-art performance.
arXiv Detail & Related papers (2024-08-06T14:50:48Z)
- Training-Free Robust Interactive Video Object Segmentation [82.05906654403684]
We propose a training-free prompt tracking framework for interactive video object segmentation (I-PT).
We jointly adopt sparse points and boxes tracking, filtering out unstable points and capturing object-wise information.
Our framework has demonstrated robust zero-shot video segmentation results on popular VOS datasets.
arXiv Detail & Related papers (2024-06-08T14:25:57Z)
- RISeg: Robot Interactive Object Segmentation via Body Frame-Invariant Features [6.358423536732677]
We introduce a novel approach to correct inaccurate segmentation by using robot interaction and a designed body frame-invariant feature.
We demonstrate the effectiveness of our proposed interactive perception pipeline in accurately segmenting cluttered scenes by achieving an average object segmentation accuracy rate of 80.7%.
arXiv Detail & Related papers (2024-03-04T05:03:24Z)
- Self-Supervised Instance Segmentation by Grasping [84.2469669256257]
We learn a grasp segmentation model to segment the grasped object from before and after grasp images.
Using the segmented objects, we can "cut" objects from their original scenes and "paste" them into new scenes to generate instance supervision.
We show that our grasp segmentation model provides a 5x error reduction when segmenting grasped objects compared with traditional image subtraction approaches.
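The cut-and-paste idea summarized above lends itself to a short sketch: lift a segmented object out of its source image and composite it into a new scene, so the paste location itself becomes a free instance label. This is a toy illustration under assumed array shapes, not the paper's actual augmentation code.

```python
import numpy as np

def cut_and_paste(src_img, src_mask, dst_img, top, left):
    """'Cut' the masked object out of src_img and 'paste' it into dst_img
    with its top-left corner at (top, left).  Returns the composite image
    and the object's new binary mask, usable as instance supervision."""
    out = dst_img.copy()
    new_mask = np.zeros(dst_img.shape[:2], dtype=bool)
    ys, xs = np.nonzero(src_mask)
    y0, x0 = ys.min(), xs.min()  # object's bounding-box origin
    for y, x in zip(ys, xs):
        ny, nx = top + (y - y0), left + (x - x0)
        if 0 <= ny < out.shape[0] and 0 <= nx < out.shape[1]:
            out[ny, nx] = src_img[y, x]
            new_mask[ny, nx] = True
    return out, new_mask
```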
arXiv Detail & Related papers (2023-05-10T16:51:36Z)
- DynaMITe: Dynamic Query Bootstrapping for Multi-object Interactive Segmentation Transformer [58.95404214273222]
Most state-of-the-art instance segmentation methods rely on large amounts of pixel-precise ground-truth for training.
We introduce a more efficient approach, called DynaMITe, in which we represent user interactions as spatio-temporal queries.
Our architecture also alleviates any need to re-compute image features during refinement, and requires fewer interactions for segmenting multiple instances in a single image.
arXiv Detail & Related papers (2023-04-13T16:57:02Z)
- Semantically Grounded Object Matching for Robust Robotic Scene Rearrangement [21.736603698556042]
We present a novel approach to object matching that uses a large pre-trained vision-language model to match objects in a cross-instance setting.
We demonstrate that this provides considerably improved matching performance in cross-instance settings.
arXiv Detail & Related papers (2021-11-15T18:39:43Z)
- RICE: Refining Instance Masks in Cluttered Environments with Graph Neural Networks [53.15260967235835]
We propose a novel framework that refines the output of such methods by utilizing a graph-based representation of instance masks.
We train deep networks capable of sampling smart perturbations to the segmentations, and a graph neural network, which can encode relations between objects, to evaluate the segmentations.
We demonstrate an application that uses uncertainty estimates generated by our method to guide a manipulator, leading to efficient understanding of cluttered scenes.
arXiv Detail & Related papers (2021-06-29T20:29:29Z)
- DyStaB: Unsupervised Object Segmentation via Dynamic-Static Bootstrapping [72.84991726271024]
We describe an unsupervised method to detect and segment portions of images of live scenes that are seen moving as a coherent whole.
Our method first partitions the motion field by minimizing the mutual information between segments.
It uses the segments to learn object models that can be used for detection in a static image.
arXiv Detail & Related papers (2020-08-16T22:05:13Z)
- Learning Object Depth from Camera Motion and Video Object Segmentation [43.81711115175958]
This paper addresses the problem of learning to estimate the depth of segmented objects given some measurement of camera motion.
We create artificial object segmentations that are scaled for changes in distance between the camera and object, and our network learns to estimate object depth even with segmentation errors.
We demonstrate our approach across domains using a robot camera to locate objects from the YCB dataset and a vehicle camera to locate obstacles while driving.
arXiv Detail & Related papers (2020-07-11T03:50:57Z)
- Self-supervised Transfer Learning for Instance Segmentation through Physical Interaction [25.956451840257916]
We present a transfer learning approach for robots that learn to segment objects by interacting with their environment in a self-supervised manner.
Our robot pushes unknown objects on a table and uses information from optical flow to create training labels in the form of object masks.
We evaluate our trained network (SelfDeepMask) on a set of real images showing challenging and cluttered scenes with novel objects.
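The optical-flow labeling step summarized above can be sketched in a few lines: pixels that moved under the push become the object label. This is a minimal sketch of the idea, assuming a dense flow field of shape (H, W, 2); the name `flow_to_label` and the threshold are hypothetical, not from the paper.

```python
import numpy as np

def flow_to_label(flow, thresh=1.0):
    """Turn a dense optical-flow field of shape (H, W, 2) into a binary
    training label: pixels displaced by the push (flow magnitude above
    `thresh`) are marked as object."""
    magnitude = np.linalg.norm(flow, axis=-1)
    return magnitude > thresh
```

In practice such labels are noisy at object boundaries, which is why the papers in this list typically combine them with tracking, refinement, or robust training.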
arXiv Detail & Related papers (2020-05-19T14:31:24Z)
- Instance Segmentation of Visible and Occluded Regions for Finding and Picking Target from a Pile of Objects [25.836334764387498]
We present a robotic system for picking a target from a pile of objects, capable of finding and grasping the target object.
We extend an existing instance segmentation model with a novel 'relook' architecture, in which the model explicitly learns the inter-instance relationship.
Also, by using image synthesis, we make the system capable of handling new objects without human annotations.
arXiv Detail & Related papers (2020-01-21T12:28:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.