Self-Supervised Object-in-Gripper Segmentation from Robotic Motions
- URL: http://arxiv.org/abs/2002.04487v3
- Date: Fri, 6 Nov 2020 10:31:16 GMT
- Title: Self-Supervised Object-in-Gripper Segmentation from Robotic Motions
- Authors: Wout Boerdijk, Martin Sundermeyer, Maximilian Durner and Rudolph
Triebel
- Abstract summary: We propose a robust solution for learning to segment unknown objects grasped by a robot.
We exploit motion and temporal cues in RGB video sequences.
Our approach is fully self-supervised and independent of precise camera calibration, 3D models or potentially imperfect depth data.
- Score: 27.915309216800125
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Accurate object segmentation is a crucial task in the context of robotic
manipulation. However, creating sufficient annotated training data for neural
networks is particularly time-consuming and often requires manual labeling. To
this end, we propose a simple, yet robust solution for learning to segment
unknown objects grasped by a robot. Specifically, we exploit motion and
temporal cues in RGB video sequences. Using optical flow estimation we first
learn to predict segmentation masks of our given manipulator. Then, these
annotations are used in combination with motion cues to automatically
distinguish between background, manipulator, and the unknown grasped object. In
contrast to existing systems, our approach is fully self-supervised and
independent of precise camera calibration, 3D models, or potentially imperfect
depth data. We perform a thorough comparison with alternative baselines and
approaches from the literature. The object masks and views are shown to be suitable
training data for segmentation networks that generalize to novel environments
and also allow for watertight 3D reconstruction.
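The two-stage labeling described in the abstract can be pictured with a short sketch. Below is a minimal NumPy version of the final label-assignment step, assuming dense optical flow and a manipulator mask from the first-stage segmenter are already available; the function name, threshold, and label convention are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of self-supervised three-way labeling (assumed interface):
# `flow` is dense optical flow (H, W, 2); `robot_mask` is the manipulator
# mask predicted by a segmenter trained on robot-only sequences.
import numpy as np

def label_frame(flow: np.ndarray, robot_mask: np.ndarray,
                motion_thresh: float = 1.0) -> np.ndarray:
    """Per-pixel labels: 0 = background, 1 = manipulator, 2 = grasped object."""
    moving = np.linalg.norm(flow, axis=-1) > motion_thresh  # motion cue
    labels = np.zeros(flow.shape[:2], dtype=np.uint8)       # static -> background
    labels[moving] = 2                   # moving pixels -> candidate object
    labels[robot_mask.astype(bool)] = 1  # manipulator overrides the motion cue
    return labels
```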
Related papers
- Articulated Object Manipulation using Online Axis Estimation with SAM2-Based Tracking [59.87033229815062]
Articulated object manipulation requires precise object interaction, where the object's axis must be carefully considered.
Previous research employed interactive perception for manipulating articulated objects, but such open-loop approaches often overlook the interaction dynamics.
We present a closed-loop pipeline integrating interactive perception with online axis estimation from segmented 3D point clouds.
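As a rough illustration of axis estimation from segmented point clouds (a generic Kabsch-plus-eigenvector recipe, not necessarily this paper's exact method), a revolute axis can be recovered from two registered observations of the moving part:

```python
# Hedged sketch: estimate a rotation axis from corresponding part points
# before (p0) and after (p1) a small articulated motion. Correspondences
# are assumed known and ordered; real pipelines track them online.
import numpy as np

def estimate_axis(p0: np.ndarray, p1: np.ndarray) -> np.ndarray:
    """p0, p1: (N, 3) corresponding points of the segmented mobile part."""
    c0, c1 = p0.mean(axis=0), p1.mean(axis=0)
    H = (p0 - c0).T @ (p1 - c1)                  # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T      # Kabsch rotation p0 -> p1
    w, v = np.linalg.eig(R)                      # axis: eigenvector for eigenvalue 1
    axis = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    return axis / np.linalg.norm(axis)
```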
arXiv Detail & Related papers (2024-09-24T17:59:56Z)
- LAC-Net: Linear-Fusion Attention-Guided Convolutional Network for Accurate Robotic Grasping Under the Occlusion [79.22197702626542]
This paper introduces a framework that explores amodal segmentation for robotic grasping in cluttered scenes.
We propose a Linear-Fusion Attention-Guided Convolutional Network (LAC-Net).
The results on different datasets show that our method achieves state-of-the-art performance.
arXiv Detail & Related papers (2024-08-06T14:50:48Z)
- Self-Supervised Learning of Object Segmentation from Unlabeled RGB-D Videos [11.40098981859033]
This work proposes a self-supervised learning system for segmenting rigid objects in RGB images.
The proposed pipeline is trained on unlabeled RGB-D videos of static objects, which can be captured with a camera carried by a mobile robot.
arXiv Detail & Related papers (2023-04-09T23:13:39Z)
- Semi-Weakly Supervised Object Kinematic Motion Prediction [56.282759127180306]
Given a 3D object, kinematic motion prediction aims to identify the mobile parts as well as the corresponding motion parameters.
We propose a graph neural network to learn the map between hierarchical part-level segmentation and mobile part parameters.
The network predictions yield a large-scale set of 3D objects with pseudo-labeled mobility information.
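A single message-passing layer over a part graph, written in plain NumPy, illustrates the kind of computation such a network performs; the feature dimensions, adjacency format, and mean aggregation are assumptions, not this paper's architecture.

```python
# Generic message-passing layer over a part graph (illustrative sketch).
import numpy as np

def gnn_layer(X: np.ndarray, A: np.ndarray,
              W_self: np.ndarray, W_nbr: np.ndarray) -> np.ndarray:
    """X: (P, F) per-part features; A: (P, P) adjacency of the part graph."""
    deg = np.maximum(A.sum(axis=1, keepdims=True), 1.0)  # avoid division by zero
    msg = (A @ X) / deg                                  # mean over neighbors
    return np.maximum(X @ W_self + msg @ W_nbr, 0.0)     # linear update + ReLU
```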
arXiv Detail & Related papers (2023-03-31T02:37:36Z)
- ALSO: Automotive Lidar Self-supervision by Occupancy estimation [70.70557577874155]
We propose a new self-supervised method for pre-training the backbone of deep perception models operating on point clouds.
The core idea is to train the model on a pretext task which is the reconstruction of the surface on which the 3D points are sampled.
The intuition is that if the network is able to reconstruct the scene surface, given only sparse input points, then it probably also captures some fragments of semantic information.
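One way such a pretext target could be constructed: sample query points around the scan and label them by proximity to the observed surface. The sampling scheme and thresholds below are illustrative assumptions, not the paper's actual recipe.

```python
# Hedged sketch of a surface-occupancy pretext target from a sparse scan.
import numpy as np

def occupancy_targets(points: np.ndarray, n_queries: int = 4096,
                      noise: float = 0.3, surf_thresh: float = 0.1):
    """points: (N, 3) lidar scan. Returns query points and binary surface labels."""
    idx = np.random.randint(len(points), size=n_queries)
    queries = points[idx] + np.random.normal(0.0, noise, (n_queries, 3))
    # Brute-force nearest-neighbor distance; use a KD-tree for real scans.
    d = np.linalg.norm(queries[:, None, :] - points[None, :, :], axis=-1).min(axis=1)
    return queries, (d < surf_thresh).astype(np.float32)
```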
arXiv Detail & Related papers (2022-12-12T13:10:19Z)
- Supervised Training of Dense Object Nets using Optimal Descriptors for Industrial Robotic Applications [57.87136703404356]
Dense Object Nets (DONs) by Florence, Manuelli and Tedrake introduced dense object descriptors as a novel visual object representation for the robotics community.
In this paper we show that given a 3D model of an object, we can generate its descriptor space image, which allows for supervised training of DONs.
We compare the training methods on generating 6D grasps for industrial objects and show that our novel supervised training approach improves the pick-and-place performance in industry-relevant tasks.
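One way to picture a descriptor space image: assign each model vertex a fixed descriptor and project it into the image with a known camera. The per-vertex descriptor choice, camera interface, and nearest-vertex splatting below are illustrative assumptions, not the paper's construction.

```python
# Hedged sketch: render a descriptor-space target image from a 3D model.
import numpy as np

def descriptor_image(verts, K, R, t, hw=(480, 640)):
    """verts: (N, 3) model vertices; K, R, t: assumed camera intrinsics and pose."""
    desc = (verts - verts.min(axis=0)) / (np.ptp(verts, axis=0) + 1e-9)
    cam = verts @ R.T + t                          # model frame -> camera frame
    uv = cam @ K.T
    uv = (uv[:, :2] / uv[:, 2:3]).astype(int)      # perspective projection
    img = np.zeros((*hw, 3), dtype=np.float32)
    ok = ((uv[:, 0] >= 0) & (uv[:, 0] < hw[1]) &
          (uv[:, 1] >= 0) & (uv[:, 1] < hw[0]) & (cam[:, 2] > 0))
    img[uv[ok, 1], uv[ok, 0]] = desc[ok]           # nearest-vertex splat, no z-buffer
    return img
```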
arXiv Detail & Related papers (2021-02-16T11:40:12Z)
- Rapid Pose Label Generation through Sparse Representation of Unknown Objects [7.32172860877574]
This work presents an approach for rapidly generating real-world, pose-annotated RGB-D data for unknown objects.
We first source minimalistic labelings of an ordered set of arbitrarily chosen keypoints over a set of RGB-D videos.
By solving an optimization problem, we combine these labels under a world frame to recover a sparse, keypoint-based representation of the object.
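The flavor of this label fusion can be sketched as back-projecting each 2D keypoint label with its depth, mapping it into the world frame with the frame's camera pose, and aggregating per keypoint. The paper solves a joint optimization; the closed-form per-keypoint mean below is the simplest stand-in, with an assumed interface.

```python
# Hedged sketch of fusing sparse keypoint labels across RGB-D frames.
import numpy as np

def fuse_keypoints(obs, K):
    """obs: list of (T_wc (4,4) camera-to-world pose, {kp_id: (u, v, depth)})."""
    Kinv = np.linalg.inv(K)
    acc = {}
    for T_wc, labels in obs:
        for kp, (u, v, z) in labels.items():
            p_cam = Kinv @ np.array([u * z, v * z, z])   # back-project with depth
            p_w = T_wc[:3, :3] @ p_cam + T_wc[:3, 3]     # camera -> world frame
            acc.setdefault(kp, []).append(p_w)
    return {kp: np.mean(ps, axis=0) for kp, ps in acc.items()}
```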
arXiv Detail & Related papers (2020-11-07T15:14:03Z)
- "What's This?" -- Learning to Segment Unknown Objects from Manipulation Sequences [27.915309216800125]
We present a novel framework for self-supervised grasped object segmentation with a robotic manipulator.
We propose a single, end-to-end trainable architecture which jointly incorporates motion cues and semantic knowledge.
Our method neither depends on any visual registration of a kinematic robot or 3D object models, nor on precise hand-eye calibration or any additional sensor data.
arXiv Detail & Related papers (2020-11-06T10:55:28Z)
- DyStaB: Unsupervised Object Segmentation via Dynamic-Static Bootstrapping [72.84991726271024]
We describe an unsupervised method to detect and segment portions of images of live scenes that are seen moving as a coherent whole.
Our method first partitions the motion field by minimizing the mutual information between segments.
It uses the segments to learn object models that can be used for detection in a static image.
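The information-theoretic objective rests on estimating mutual information between motion signals. A generic histogram estimator for two discretized 1-D signals looks like the following; the binning and the signals fed to it are illustrative, and the paper's actual loss differs.

```python
# Generic histogram-based mutual information estimator (in nats).
import numpy as np

def mutual_information(x: np.ndarray, y: np.ndarray, bins: int = 16) -> float:
    """MI estimate between two 1-D signals via a joint histogram."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y
    nz = pxy > 0                          # avoid log(0)
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())
```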
arXiv Detail & Related papers (2020-08-16T22:05:13Z)
- Self-supervised Transfer Learning for Instance Segmentation through Physical Interaction [25.956451840257916]
We present a transfer learning approach for robots that learn to segment objects by interacting with their environment in a self-supervised manner.
Our robot pushes unknown objects on a table and uses information from optical flow to create training labels in the form of object masks.
We evaluate our trained network (SelfDeepMask) on a set of real images showing challenging and cluttered scenes with novel objects.
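A minimal sketch of turning push-induced optical flow into a training mask: threshold the flow magnitude and keep the largest connected component. The threshold and cleanup are illustrative choices, not SelfDeepMask's exact procedure.

```python
# Hedged sketch: object mask from optical flow around a push interaction.
import numpy as np
from scipy import ndimage

def mask_from_flow(flow: np.ndarray, thresh: float = 2.0) -> np.ndarray:
    """flow: (H, W, 2) optical flow between pre- and post-push frames."""
    moving = np.linalg.norm(flow, axis=-1) > thresh
    labeled, n = ndimage.label(moving)        # connected components
    if n == 0:
        return moving                         # nothing moved
    sizes = ndimage.sum(moving, labeled, range(1, n + 1))
    return labeled == (np.argmax(sizes) + 1)  # keep the largest blob
```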
arXiv Detail & Related papers (2020-05-19T14:31:24Z)