A Learned Stereo Depth System for Robotic Manipulation in Homes
- URL: http://arxiv.org/abs/2109.11644v1
- Date: Thu, 23 Sep 2021 20:53:55 GMT
- Title: A Learned Stereo Depth System for Robotic Manipulation in Homes
- Authors: Krishna Shankar, Mark Tjersland, Jeremy Ma, Kevin Stone, Max
Bajracharya
- Abstract summary: We present a passive stereo depth system that produces dense and accurate point clouds optimized for human environments.
The system consists of an algorithm combining learned stereo matching with engineered filtering, a training and data-mixing methodology, and a sensor hardware design.
- Score: 2.06216858680643
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We present a passive stereo depth system that produces dense and accurate
point clouds optimized for human environments, including dark, textureless,
thin, reflective and specular surfaces and objects, at 2560x2048 resolution,
with 384 disparities, in 30 ms. The system consists of an algorithm combining
learned stereo matching with engineered filtering, a training and data-mixing
methodology, and a sensor hardware design. Our architecture is 15x faster than
approaches that perform similarly on the Middlebury and FlyingThings stereo
benchmarks. To effectively supervise the training of this model, we combine
real data labeled using off-the-shelf depth sensors with a number of
different rendered, simulated labeled datasets. We demonstrate the efficacy of
our system by presenting a large number of qualitative results in the form of
depth maps and point clouds, experiments validating the metric accuracy of our
system, and comparisons to other sensors on challenging objects and scenes. We
also show the competitiveness of our algorithm compared to state-of-the-art
learned models using the Middlebury and FlyingThings datasets.
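The abstract reports dense disparity maps (384 disparities at 2560x2048) that are turned into metric point clouds. That conversion follows the standard pinhole-stereo relation depth = f·B/d. A minimal sketch in NumPy; the focal length, principal point, and baseline values below are illustrative assumptions, not values from the paper:

```python
import numpy as np

def disparity_to_pointcloud(disp, fx, fy, cx, cy, baseline):
    """Back-project a disparity map into a 3D point cloud.

    disp: (H, W) disparity in pixels; fx, fy, cx, cy: pinhole intrinsics;
    baseline: stereo baseline in meters. Pixels with disp <= 0 are dropped.
    """
    h, w = disp.shape
    us, vs = np.meshgrid(np.arange(w), np.arange(h))
    valid = disp > 0
    z = fx * baseline / disp[valid]      # depth = f * B / d
    x = (us[valid] - cx) * z / fx        # back-project image x
    y = (vs[valid] - cy) * z / fy        # back-project image y
    return np.stack([x, y, z], axis=-1)  # (N, 3) points in meters

# Illustrative values only: a 2560x2048 sensor with an assumed
# 2000 px focal length and 6 cm baseline.
disp = np.full((2048, 2560), 64.0, dtype=np.float32)
pts = disparity_to_pointcloud(disp, fx=2000.0, fy=2000.0,
                              cx=1280.0, cy=1024.0, baseline=0.06)
```

With these assumed parameters a 64 px disparity maps to 2000 × 0.06 / 64 ≈ 1.875 m; the ratio f·B/d also explains why more disparity levels (384 here) buy finer depth resolution at close range.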
Related papers
- ES-PTAM: Event-based Stereo Parallel Tracking and Mapping [11.801511288805225]
Event cameras offer advantages to overcome the limitations of standard cameras.
We propose a novel event-based stereo VO system by combining two ideas.
We evaluate the system on five real-world datasets.
arXiv Detail & Related papers (2024-08-28T07:56:28Z)
- On the Importance of Accurate Geometry Data for Dense 3D Vision Tasks [61.74608497496841]
Training on inaccurate or corrupt data induces model bias and hampers generalisation capabilities.
This paper investigates the effect of sensor errors for the dense 3D vision tasks of depth estimation and reconstruction.
arXiv Detail & Related papers (2023-03-26T22:32:44Z)
- Towards Scale Consistent Monocular Visual Odometry by Learning from the Virtual World [83.36195426897768]
We propose VRVO, a novel framework for retrieving the absolute scale from virtual data.
We first train a scale-aware disparity network using both monocular real images and stereo virtual data.
The resulting scale-consistent disparities are then integrated with a direct VO system.
arXiv Detail & Related papers (2022-03-11T01:51:54Z)
- Metric-based multimodal meta-learning for human movement identification via footstep recognition [3.300376360949452]
We describe a novel metric-based learning approach that introduces a multimodal framework.
We learn general-purpose representations from limited multisensory data obtained from omnipresent sensing systems.
We employ a metric-based contrastive learning approach for multi-sensor data to mitigate the impact of data scarcity.
arXiv Detail & Related papers (2021-11-15T18:46:14Z)
- Self-Supervised Depth Completion for Active Stereo [55.79929735390945]
Active stereo systems are widely used in the robotics industry due to their low cost and high quality depth maps.
However, these depth sensors suffer from stereo artefacts and do not provide dense depth estimates.
We present the first self-supervised depth completion method for active stereo systems that predicts accurate dense depth maps.
arXiv Detail & Related papers (2021-10-07T07:33:52Z)
- Dominant motion identification of multi-particle system using deep learning from video [0.0]
In this work, we provide a deep-learning framework that extracts relevant information from real-world videos of highly stochastic systems.
We demonstrate this approach on videos of confined multi-agent/particle systems of ants, termites, and fish.
Furthermore, we explore how these seemingly diverse systems have predictable underlying behavior.
arXiv Detail & Related papers (2021-04-26T17:10:56Z)
- Stereo Matching by Self-supervision of Multiscopic Vision [65.38359887232025]
We propose a new self-supervised framework for stereo matching utilizing multiple images captured at aligned camera positions.
A cross photometric loss, an uncertainty-aware mutual-supervision loss, and a new smoothness loss are introduced to optimize the network.
Our model obtains better disparity maps than previous unsupervised methods on the KITTI dataset.
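The cross photometric loss mentioned above can be made concrete in a simplified form: warp the right image into the left view using the predicted disparity, then penalize the photometric difference. A minimal single-channel NumPy sketch with nearest-neighbour warping; a real self-supervised pipeline would use differentiable bilinear sampling and typically an SSIM term as well:

```python
import numpy as np

def warp_right_to_left(right, disp):
    """Reconstruct the left image by sampling the right image at
    x - disparity (nearest-neighbour for simplicity)."""
    h, w = right.shape
    us = np.arange(w)[None, :].repeat(h, axis=0)
    src = np.clip(np.round(us - disp).astype(int), 0, w - 1)
    rows = np.arange(h)[:, None].repeat(w, axis=1)
    return right[rows, src]

def photometric_loss(left, right, disp):
    """Mean absolute photometric error between the left image and the
    right image warped by the predicted disparity."""
    return np.abs(left - warp_right_to_left(right, disp)).mean()

# Toy example: the right image is the left image shifted by 4 px, so the
# true disparity (4 everywhere) yields a much lower loss than a wrong one.
left = np.tile(np.sin(np.arange(32) * 0.5), (8, 1))
right = np.roll(left, -4, axis=1)
good = photometric_loss(left, right, np.full(left.shape, 4.0))
bad = photometric_loss(left, right, np.full(left.shape, 0.0))
```

The appeal of this loss family is that it needs no ground-truth disparity: the network is supervised purely by how well its disparity explains one view in terms of the other.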
arXiv Detail & Related papers (2021-04-09T02:58:59Z)
- SMD-Nets: Stereo Mixture Density Networks [68.56947049719936]
We propose Stereo Mixture Density Networks (SMD-Nets), a simple yet effective learning framework compatible with a wide class of 2D and 3D architectures.
Specifically, we exploit bimodal mixture densities as output representation and show that this allows for sharp and precise disparity estimates near discontinuities.
We carry out comprehensive experiments on a new high-resolution and highly realistic synthetic stereo dataset, consisting of stereo pairs at 8Mpx resolution, as well as on real-world stereo datasets.
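The benefit of a bimodal output representation can be illustrated with the final selection step: instead of regressing a single disparity, which blurs across depth discontinuities, each pixel predicts two modes with a mixture weight, and the estimate is the mean of the dominant mode. A small NumPy sketch of that idea; the parameter names are illustrative, not from the paper:

```python
import numpy as np

def select_dominant_mode(mu1, mu2, pi1):
    """Pick the dominant mode of a per-pixel bimodal mixture.

    mu1, mu2: (H, W) mode means (disparities); pi1: (H, W) weight of
    mode 1 (mode 2 has weight 1 - pi1). Taking the dominant mode's mean,
    rather than the mixture mean, keeps disparity edges sharp.
    """
    return np.where(pi1 >= 0.5, mu1, mu2)

# Toy discontinuity: foreground disparity 60, background 10. A unimodal
# regressor tends to blur toward ~35 at the boundary; mode selection
# snaps each pixel to one surface or the other.
mu1 = np.full((4, 4), 60.0)
mu2 = np.full((4, 4), 10.0)
pi1 = np.array([[0.9, 0.9, 0.1, 0.1]] * 4)
disp = select_dominant_mode(mu1, mu2, pi1)
```

Every output value is exactly 60 or 10, never an intermediate "flying pixel" between the two surfaces, which is the sharpness property the abstract highlights near discontinuities.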
arXiv Detail & Related papers (2021-04-08T16:15:46Z)
- Physics-based Differentiable Depth Sensor Simulation [5.134435281973137]
We introduce a novel end-to-end differentiable simulation pipeline for the generation of realistic 2.5D scans.
Each module can be differentiated w.r.t sensor and scene parameters.
Our simulation greatly improves the performance of the resulting models on real scans.
arXiv Detail & Related papers (2021-03-30T17:59:43Z)
- OmniSLAM: Omnidirectional Localization and Dense Mapping for Wide-baseline Multi-camera Systems [88.41004332322788]
We present an omnidirectional localization and dense mapping system for a wide-baseline multiview stereo setup with ultra-wide field-of-view (FOV) fisheye cameras.
For more practical and accurate reconstruction, we first introduce improved and lightweight deep neural networks for omnidirectional depth estimation.
We integrate our omnidirectional depth estimates into the visual odometry (VO) and add a loop closing module for global consistency.
arXiv Detail & Related papers (2020-03-18T05:52:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.