Depth by Poking: Learning to Estimate Depth from Self-Supervised Grasping
- URL: http://arxiv.org/abs/2006.08903v1
- Date: Tue, 16 Jun 2020 03:34:26 GMT
- Title: Depth by Poking: Learning to Estimate Depth from Self-Supervised Grasping
- Authors: Ben Goodrich, Alex Kuefler, William D. Richards
- Abstract summary: We train a neural network model to estimate depth from RGB-D images.
Our network predicts, for each pixel in an input image, the z position that a robot's end effector would reach if it attempted to grasp or poke at the corresponding position.
We show our approach achieves significantly lower root mean squared error than traditional structured light sensors.
- Score: 6.382990675677317
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Accurate depth estimation remains an open problem for robotic manipulation;
even state-of-the-art techniques, including structured light and LiDAR sensors,
fail on reflective or transparent surfaces. We address this problem by training
a neural network model to estimate depth from RGB-D images, using labels from
physical interactions between a robot and its environment. Our network
predicts, for each pixel in an input image, the z position that a robot's end
effector would reach if it attempted to grasp or poke at the corresponding
position. Given an autonomous grasping policy, our approach is self-supervised
as end effector position labels can be recovered through forward kinematics,
without human annotation. Although gathering such physical interaction data is
expensive, it is necessary for training and routine operation of
state-of-the-art manipulation systems. Therefore, this depth estimator comes
"for free"
while collecting data for other tasks (e.g., grasping, pushing, placing). We
show our approach achieves significantly lower root mean squared error than
traditional structured light sensors and unsupervised deep learning methods on
difficult, industry-scale jumbled bin datasets.
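As a concrete illustration of the label-generation step described above, the sketch below projects a forward-kinematics contact point into the image to produce one sparse depth label per interaction. The camera model, frame conventions, and function names are illustrative assumptions, not the authors' code:

```python
import numpy as np

def grasp_to_depth_label(p_world, T_cam_world, K):
    """Project the end effector's contact point into the image and
    return (u, v, z): the pixel it lands on and its depth in the
    camera frame. p_world comes from forward kinematics at contact;
    T_cam_world (4x4 world-to-camera extrinsic) and K (3x3 intrinsic
    matrix) are hypothetical inputs for this sketch."""
    p_cam = (T_cam_world @ np.append(p_world, 1.0))[:3]  # world -> camera frame
    z = p_cam[2]                                         # depth along optical axis
    uv = K @ (p_cam / z)                                 # perspective projection
    return int(round(uv[0])), int(round(uv[1])), z

# A sparse label image: NaN everywhere except pixels the robot touched.
H, W = 480, 640
labels = np.full((H, W), np.nan, dtype=np.float32)
K = np.array([[600.0, 0.0, 320.0], [0.0, 600.0, 240.0], [0.0, 0.0, 1.0]])
u, v, z = grasp_to_depth_label(np.array([0.1, -0.05, 0.4]), np.eye(4), K)
if 0 <= u < W and 0 <= v < H:
    labels[v, u] = z  # one self-supervised depth label per grasp or poke
```

At training time, the regression loss would then be applied only at the finite entries of `labels`, so each physical interaction contributes one supervised pixel.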
Related papers
- Embodiment: Self-Supervised Depth Estimation Based on Camera Models [17.931220115676258]
Self-supervised methods hold great potential because they incur no labeling cost.
However, self-supervised learning still trails supervised learning by a large margin in 3D reconstruction and depth estimation performance.
By embedding the camera's physical properties into the model, we can calculate depth priors for ground regions and regions connected to the ground.
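The ground-region prior follows from pinhole geometry: a ray through a pixel below the horizon meets the ground plane at a depth fixed by the camera's height and orientation. A minimal sketch of that calculation, assuming known intrinsics, camera-to-world rotation, and camera height; this is not the paper's implementation:

```python
import numpy as np

def ground_depth_prior(K, R_wc, cam_height, H, W):
    """Depth prior for pixels whose rays hit the ground plane (z=0).
    K: 3x3 intrinsics; R_wc: camera-to-world rotation; cam_height:
    camera center height above the ground in meters. Returns an HxW
    array of ray-ground intersection depths, NaN at or above the horizon."""
    Kinv = np.linalg.inv(K)
    us, vs = np.meshgrid(np.arange(W), np.arange(H))
    rays_cam = np.stack([us, vs, np.ones_like(us)], -1) @ Kinv.T  # z = 1 rays
    rays_world = rays_cam @ R_wc.T                 # rotate rays into world frame
    t = cam_height / -rays_world[..., 2]           # distance along ray to z = 0
    # Since each camera-frame ray has unit z, t is the camera-frame depth;
    # rays pointing at or above the horizon never hit the ground.
    return np.where(t > 0, t, np.nan).astype(np.float32)
```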
arXiv Detail & Related papers (2024-08-02T20:40:19Z)
- Neural feels with neural fields: Visuo-tactile perception for in-hand manipulation [57.60490773016364]
We combine vision and touch sensing on a multi-fingered hand to estimate an object's pose and shape during in-hand manipulation.
Our method, NeuralFeels, encodes object geometry by learning a neural field online and jointly tracks it by optimizing a pose graph problem.
Our results demonstrate that touch, at the very least, refines and, at the very best, disambiguates visual estimates during in-hand manipulation.
arXiv Detail & Related papers (2023-12-20T22:36:37Z)
- Markerless Camera-to-Robot Pose Estimation via Self-supervised Sim-to-Real Transfer [26.21320177775571]
We propose an end-to-end pose estimation framework that is capable of online camera-to-robot calibration and a self-supervised training method.
Our framework combines deep learning and geometric vision for solving the robot pose, and the pipeline is fully differentiable.
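A classical, non-differentiable stand-in for the geometric stage of such a pipeline: pair 2D joint keypoints from a learned detector with 3D joint positions from forward kinematics and solve PnP for the camera-to-robot transform. All inputs below are assumed, and the paper's pipeline is end-to-end differentiable, which this sketch is not:

```python
import numpy as np
import cv2

def camera_to_robot_pose(joints_3d, keypoints_2d, K):
    """Solve for the camera-to-robot transform via EPnP.
    joints_3d: Nx3 joint positions in the robot base frame (from
    forward kinematics); keypoints_2d: Nx2 pixel detections of the
    same joints; K: 3x3 camera intrinsics. EPnP needs >= 4 points."""
    ok, rvec, tvec = cv2.solvePnP(
        joints_3d.astype(np.float64),
        keypoints_2d.astype(np.float64),
        K.astype(np.float64), None,
        flags=cv2.SOLVEPNP_EPNP)
    if not ok:
        raise RuntimeError("PnP failed")
    R, _ = cv2.Rodrigues(rvec)     # rotation: robot base frame -> camera frame
    return R, tvec.reshape(3)      # tvec: base origin expressed in camera frame
```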
arXiv Detail & Related papers (2023-02-28T05:55:42Z)
- A Distance-Geometric Method for Recovering Robot Joint Angles From an RGB Image [7.971699294672282]
We present a novel method for retrieving the joint angles of a robot manipulator using only a single RGB image of its current configuration.
Our approach, based on a distance-geometric representation of the configuration space, exploits the knowledge of a robot's kinematic model.
arXiv Detail & Related papers (2023-01-05T12:57:45Z)
- Semi-Perspective Decoupled Heatmaps for 3D Robot Pose Estimation from Depth Maps [66.24554680709417]
Knowing the exact 3D location of workers and robots in a collaborative environment enables several real applications.
We propose a non-invasive framework based on depth devices and deep neural networks to estimate the 3D pose of robots from an external camera.
arXiv Detail & Related papers (2022-07-06T08:52:12Z)
- Where is my hand? Deep hand segmentation for visual self-recognition in humanoid robots [129.46920552019247]
We propose the use of a Convolutional Neural Network (CNN) to segment the robot hand from an image in an egocentric view.
We fine-tuned the Mask-RCNN network for the specific task of segmenting the hand of the humanoid robot Vizzy.
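Swapping the heads of a COCO-pretrained Mask R-CNN is the standard torchvision fine-tuning recipe; a sketch in that spirit follows, where the binary hand/background class count is an assumption rather than the authors' exact setup:

```python
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

def build_hand_segmenter(num_classes=2):
    """Start from a COCO-pretrained Mask R-CNN (torchvision >= 0.13)
    and replace the box and mask heads for a background-vs-hand task."""
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
    in_feats = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_feats, num_classes)
    in_feats_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
    model.roi_heads.mask_predictor = MaskRCNNPredictor(in_feats_mask, 256, num_classes)
    return model
```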
arXiv Detail & Related papers (2021-02-09T10:34:32Z)
- "What's This?" -- Learning to Segment Unknown Objects from Manipulation Sequences [27.915309216800125]
We present a novel framework for self-supervised grasped object segmentation with a robotic manipulator.
We propose a single, end-to-end trainable architecture which jointly incorporates motion cues and semantic knowledge.
Our method neither depends on any visual registration of a kinematic robot or 3D object models, nor on precise hand-eye calibration or any additional sensor data.
arXiv Detail & Related papers (2020-11-06T10:55:28Z)
- Task-relevant Representation Learning for Networked Robotic Perception [74.0215744125845]
This paper presents an algorithm to learn task-relevant representations of sensory data that are co-designed with a pre-trained robotic perception model's ultimate objective.
Our algorithm aggressively compresses robotic sensory data by up to 11x more than competing methods.
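Stripped to a sketch, the idea is to train a bottlenecked codec against the downstream task's loss rather than pixel reconstruction, so the code retains only task-relevant bits. Everything below (dimensions, the frozen `task_model`, the loss wiring) is a hypothetical stand-in, not the paper's architecture:

```python
import torch.nn as nn

class TaskAwareCodec(nn.Module):
    """Compress observations through a narrow bottleneck; the codec
    is trained end-to-end against the task loss, not reconstruction."""
    def __init__(self, in_dim=1024, code_dim=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                 nn.Linear(256, code_dim))
        self.dec = nn.Sequential(nn.Linear(code_dim, 256), nn.ReLU(),
                                 nn.Linear(256, in_dim))

    def forward(self, x):
        return self.dec(self.enc(x))

def codec_loss(codec, task_model, x, y, task_loss_fn):
    # Gradients flow through the frozen task model into the codec, so
    # the bottleneck learns to keep exactly what the task needs.
    return task_loss_fn(task_model(codec(x)), y)
```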
arXiv Detail & Related papers (2020-11-06T07:39:08Z)
- Auto-Rectify Network for Unsupervised Indoor Depth Estimation [119.82412041164372]
We establish that the complex ego-motions exhibited in handheld settings are a critical obstacle for learning depth.
We propose a data pre-processing method that rectifies training images by removing their relative rotations for effective learning.
Our results outperform the previous unsupervised SOTA method by a large margin on the challenging NYUv2 dataset.
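Undoing a known relative rotation reduces to warping by the infinite homography H = K R^T K^{-1}. A minimal sketch, assuming a pinhole camera, a known rotation R (e.g., from an IMU or a pose estimate), and OpenCV conventions; the paper's actual pre-processing may differ in detail:

```python
import cv2
import numpy as np

def remove_rotation(img, R, K):
    """Warp the image by the infinite homography K @ R.T @ K^-1, which
    undoes a pure camera rotation R (sign convention: R maps the
    unrotated view's rays to the rotated view's rays)."""
    H = K @ R.T @ np.linalg.inv(K)
    return cv2.warpPerspective(img, H, (img.shape[1], img.shape[0]))
```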
arXiv Detail & Related papers (2020-06-04T08:59:17Z)
- Learning Depth With Very Sparse Supervision [57.911425589947314]
This paper explores the idea that perception gets coupled to 3D properties of the world via interaction with the environment.
We train a specialized global-local network architecture with what would be available to a robot interacting with the environment.
Experiments on several datasets show that, when ground truth is available even for just one of the image pixels, the proposed network can learn monocular dense depth estimation up to 22.5% more accurately than state-of-the-art approaches.
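At the loss level, supervision this sparse amounts to evaluating the depth error only at labeled pixels. A minimal sketch, assuming zeros mark unlabeled pixels (a data-format assumption, not the paper's):

```python
def sparse_depth_loss(pred, sparse_gt):
    """L1 loss over labeled pixels only. pred: dense depth prediction
    of shape (B, 1, H, W); sparse_gt: same shape, zero at unlabeled
    pixels -- possibly a single labeled pixel per image."""
    mask = sparse_gt > 0
    if mask.sum() == 0:
        return pred.sum() * 0.0  # no labels in this batch; zero-grad loss
    return (pred[mask] - sparse_gt[mask]).abs().mean()
```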
arXiv Detail & Related papers (2020-03-02T10:44:13Z)
- Self-Supervised Object-in-Gripper Segmentation from Robotic Motions [27.915309216800125]
We propose a robust solution for learning to segment unknown objects grasped by a robot.
We exploit motion and temporal cues in RGB video sequences.
Our approach is fully self-supervised and independent of precise camera calibration, 3D models or potentially imperfect depth data.
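A crude stand-in for the motion cue is dense optical flow between consecutive frames, thresholded on magnitude. The paper's segmentation is learned end-to-end, so the sketch below only illustrates the raw signal being exploited; the threshold and morphology are arbitrary choices:

```python
import cv2
import numpy as np

def motion_mask(prev_gray, cur_gray, thresh=1.0):
    """Dense Farneback optical flow between two grayscale frames,
    thresholded on magnitude, gives a candidate mask for the moving
    (e.g., grasped) object; an opening removes speckle noise."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, cur_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag = np.linalg.norm(flow, axis=2)
    mask = (mag > thresh).astype(np.uint8)
    return cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
```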
arXiv Detail & Related papers (2020-02-11T15:44:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.