Self-Supervised Learning of Object Segmentation from Unlabeled RGB-D
Videos
- URL: http://arxiv.org/abs/2304.04325v1
- Date: Sun, 9 Apr 2023 23:13:39 GMT
- Title: Self-Supervised Learning of Object Segmentation from Unlabeled RGB-D
Videos
- Authors: Shiyang Lu, Yunfu Deng, Abdeslam Boularias, Kostas Bekris
- Abstract summary: This work proposes a self-supervised learning system for segmenting rigid objects in RGB images.
The proposed pipeline is trained on unlabeled RGB-D videos of static objects, which can be captured with a camera carried by a mobile robot.
- Score: 11.40098981859033
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This work proposes a self-supervised learning system for segmenting rigid
objects in RGB images. The proposed pipeline is trained on unlabeled RGB-D
videos of static objects, which can be captured with a camera carried by a
mobile robot. A key feature of the self-supervised training process is a
graph-matching algorithm that operates on the over-segmentation output of the
point cloud that is reconstructed from each video. The graph matching, along
with point cloud registration, is able to find reoccurring object patterns
across videos and combine them into 3D object pseudo labels, even under
occlusions or different viewing angles. Projected 2D object masks from 3D
pseudo labels are used to train a pixel-wise feature extractor through
contrastive learning. During online inference, a clustering method uses the
learned features to cluster foreground pixels into object segments. Experiments
highlight the method's effectiveness on both real and synthetic video datasets,
which include cluttered scenes of tabletop objects. The proposed method
outperforms existing unsupervised methods for object segmentation by a large
margin.
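The training and inference steps in the abstract (projected pseudo-label masks supervising pixel-wise contrastive learning, then clustering of learned features) can be sketched as follows. This is an illustrative stand-in, not the paper's implementation: the exact loss and clustering method are not specified in the abstract, so the supervised InfoNCE-style loss and the greedy cosine-similarity clustering below, as well as all function names, are assumptions.

```python
import numpy as np

def contrastive_loss(features, labels, temperature=0.1):
    """Supervised InfoNCE-style loss over sampled pixel features.

    features: (N, D) L2-normalized pixel embeddings.
    labels:   (N,) pseudo-label ids from projected 3D object masks.
    Pixels sharing a pseudo label are pulled together in feature space;
    pixels from different pseudo labels are pushed apart.
    """
    sim = features @ features.T / temperature          # pairwise similarities
    np.fill_diagonal(sim, -np.inf)                     # exclude self-pairs
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    pos = labels[:, None] == labels[None, :]           # positive-pair mask
    np.fill_diagonal(pos, False)
    return -(log_prob[pos]).mean()                     # negated positive log-likelihood

def cluster_pixels(features, threshold=0.8):
    """Greedy cosine-similarity clustering: a simple stand-in for the
    inference-time step that groups foreground pixels into object segments."""
    assignments = -np.ones(len(features), dtype=int)
    centers = []
    for i, f in enumerate(features):
        for c, center in enumerate(centers):
            if f @ center > threshold:                 # join an existing segment
                assignments[i] = c
                break
        if assignments[i] == -1:                       # start a new segment
            centers.append(f)
            assignments[i] = len(centers) - 1
    return assignments
```

In practice the loss would drive a convolutional feature extractor over many frames; the sketch only shows the objective and grouping logic on already-extracted features.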
Related papers
- 3D Instance Segmentation Using Deep Learning on RGB-D Indoor Data [0.0]
A 2D region-based convolutional neural network (Mask R-CNN) with a point-based rendering module is adapted to integrate depth information and to recognize and segment 3D instances of objects.
To generate 3D point cloud coordinates, the segmented 2D pixels of recognized object regions in the RGB image are merged with the corresponding (u, v) points of the depth image.
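The depth-to-point-cloud merging described above is, at its core, pinhole back-projection of depth pixels into camera-frame 3D coordinates. A minimal sketch (not the paper's code; the function name and the intrinsics fx, fy, cx, cy are illustrative):

```python
import numpy as np

def backproject(depth, fx, fy, cx, cy):
    """Back-project a depth image (meters) into camera-frame 3D points
    with the pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy, Z = depth."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinate grids
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1)             # (H, W, 3) point map
```

Masking this (H, W, 3) point map with the segmented 2D object pixels yields the per-object 3D points.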
arXiv Detail & Related papers (2024-06-19T08:00:35Z)
- AutoDecoding Latent 3D Diffusion Models [95.7279510847827]
We present a novel approach to the generation of static and articulated 3D assets that has a 3D autodecoder at its core.
The 3D autodecoder framework embeds properties learned from the target dataset in the latent space.
We then identify the appropriate intermediate volumetric latent space, and introduce robust normalization and de-normalization operations.
arXiv Detail & Related papers (2023-07-07T17:59:14Z)
- Unified Mask Embedding and Correspondence Learning for Self-Supervised Video Segmentation [76.40565872257709]
We develop a unified framework which simultaneously models cross-frame dense correspondence for locally discriminative feature learning.
It is able to directly learn to perform mask-guided sequential segmentation from unlabeled videos.
Our algorithm sets the state of the art on two standard benchmarks (DAVIS17 and YouTube-VOS).
arXiv Detail & Related papers (2023-03-17T16:23:36Z)
- SupeRGB-D: Zero-shot Instance Segmentation in Cluttered Indoor Environments [67.34330257205525]
In this work, we explore zero-shot instance segmentation (ZSIS) from RGB-D data to identify unseen objects in a semantic category-agnostic manner.
We present a method that uses annotated objects to learn the "objectness" of pixels and generalize to unseen object categories in cluttered indoor environments.
arXiv Detail & Related papers (2022-12-22T17:59:48Z)
- ONeRF: Unsupervised 3D Object Segmentation from Multiple Views [59.445957699136564]
ONeRF is a method that automatically segments and reconstructs object instances in 3D from multi-view RGB images without any additional manual annotations.
The segmented 3D objects are represented using separate Neural Radiance Fields (NeRFs) which allow for various 3D scene editing and novel view rendering.
arXiv Detail & Related papers (2022-11-22T06:19:37Z)
- Topologically Persistent Features-based Object Recognition in Cluttered Indoor Environments [1.2691047660244335]
Recognition of occluded objects in unseen indoor environments is a challenging problem for mobile robots.
This work proposes a new slicing-based topological descriptor that captures the 3D shape of object point clouds.
The descriptor yields similar representations for occluded objects and their corresponding unoccluded views, enabling recognition based on object unity.
arXiv Detail & Related papers (2022-05-16T07:01:16Z)
- A Self-supervised Learning System for Object Detection in Videos Using Random Walks on Graphs [20.369646864364547]
This paper presents a new self-supervised system for learning to detect novel and previously unseen categories of objects in images.
The proposed system receives as input several unlabeled videos of scenes containing various objects.
The frames of the videos are segmented into objects using depth information, and the segments are tracked along each video.
arXiv Detail & Related papers (2020-11-10T23:37:40Z)
- DyStaB: Unsupervised Object Segmentation via Dynamic-Static Bootstrapping [72.84991726271024]
We describe an unsupervised method to detect and segment portions of images of live scenes that are seen moving as a coherent whole.
Our method first partitions the motion field by minimizing the mutual information between segments.
It uses the segments to learn object models that can be used for detection in a static image.
arXiv Detail & Related papers (2020-08-16T22:05:13Z)
- Learning RGB-D Feature Embeddings for Unseen Object Instance Segmentation [67.88276573341734]
We propose a new method for unseen object instance segmentation by learning RGB-D feature embeddings from synthetic data.
A metric learning loss function is utilized to learn to produce pixel-wise feature embeddings.
We further improve the segmentation accuracy with a new two-stage clustering algorithm.
arXiv Detail & Related papers (2020-07-30T00:23:07Z)
- Self-Supervised Object-in-Gripper Segmentation from Robotic Motions [27.915309216800125]
We propose a robust solution for learning to segment unknown objects grasped by a robot.
We exploit motion and temporal cues in RGB video sequences.
Our approach is fully self-supervised and independent of precise camera calibration, 3D models or potentially imperfect depth data.
arXiv Detail & Related papers (2020-02-11T15:44:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.