Beyond Visual Field of View: Perceiving 3D Environment with Echoes and Vision
- URL: http://arxiv.org/abs/2207.01136v2
- Date: Fri, 9 Feb 2024 03:29:43 GMT
- Title: Beyond Visual Field of View: Perceiving 3D Environment with Echoes and Vision
- Authors: Lingyu Zhu, Esa Rahtu, Hang Zhao
- Abstract summary: This paper focuses on perceiving and navigating 3D environments using echoes and RGB images.
In particular, we perform depth estimation by fusing RGB images with echoes received from multiple orientations.
We show that the echoes provide holistic and inexpensive information about the 3D structures, complementing the RGB images.
- Score: 51.385731364529306
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper focuses on perceiving and navigating 3D environments using echoes
and RGB images. In particular, we perform depth estimation by fusing RGB images
with echoes received from multiple orientations. Unlike previous works, we go
beyond the field of view of the RGB camera and estimate dense depth maps for
substantially larger parts of the environment. We show that the echoes provide
holistic and inexpensive information about the 3D structures, complementing the
RGB images. Moreover, we study how echoes and the wide field-of-view depth maps
can be utilised in robot navigation. We compare the proposed methods against
recent baselines using two sets of challenging realistic 3D environments:
Replica and Matterport3D. The implementation and pre-trained models will be
made publicly available.
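To make the fusion idea concrete, below is a minimal sketch of how an RGB image and echo spectrograms received from multiple orientations could be fused to regress a dense depth map. This is not the authors' released architecture: the module names, tensor shapes, and the late-fusion design (a global echo descriptor broadcast over the visual feature map) are all illustrative assumptions.

```python
# Hypothetical sketch of RGB + multi-orientation echo fusion for depth
# estimation; names, shapes, and fusion strategy are assumptions, not the
# paper's released implementation.
import torch
import torch.nn as nn

class EchoVisualDepthNet(nn.Module):
    def __init__(self, num_orientations=4, feat_dim=128):
        super().__init__()
        # RGB encoder: 3-channel image -> spatial feature map.
        self.rgb_encoder = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, feat_dim, 4, stride=2, padding=1), nn.ReLU(),
        )
        # Echo encoder: one 2-channel (binaural) spectrogram per orientation,
        # stacked along the channel axis and pooled to a global descriptor.
        self.echo_encoder = nn.Sequential(
            nn.Conv2d(2 * num_orientations, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, feat_dim, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # Decoder: fused features -> dense depth map.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(2 * feat_dim, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 1, 4, stride=2, padding=1),
        )

    def forward(self, rgb, echoes):
        # rgb: (B, 3, H, W); echoes: (B, 2*num_orientations, F, T)
        v = self.rgb_encoder(rgb)                       # (B, C, H/4, W/4)
        a = self.echo_encoder(echoes)                   # (B, C, 1, 1)
        a = a.expand(-1, -1, v.shape[2], v.shape[3])    # broadcast over space
        return self.decoder(torch.cat([v, a], dim=1))   # (B, 1, H, W)

# Usage: four echo orientations, a 128x128 RGB image, 64x64 spectrograms.
net = EchoVisualDepthNet(num_orientations=4)
depth = net(torch.randn(1, 3, 128, 128), torch.randn(1, 8, 64, 64))
print(depth.shape)  # torch.Size([1, 1, 128, 128])
```

Broadcasting a global echo descriptor over every spatial location is one simple way to let holistic acoustic cues influence depth predictions beyond the RGB field of view; the actual paper may use a different fusion scheme.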
Related papers
- Depth-based Privileged Information for Boosting 3D Human Pose Estimation on RGB [48.31210455404533]
A heatmap-based 3D pose estimator is able to hallucinate depth information from the RGB frames given at inference time.
Depth information is used exclusively during training by enforcing the RGB-based hallucination network to learn features similar to a backbone pre-trained only on depth data.
arXiv Detail & Related papers (2024-09-17T11:59:34Z) - MVD-Fusion: Single-view 3D via Depth-consistent Multi-view Generation [54.27399121779011]
We present MVD-Fusion: a method for single-view 3D inference via generative modeling of multi-view-consistent RGB-D images.
We show that our approach can yield more accurate synthesis compared to recent state-of-the-art, including distillation-based 3D inference and prior multi-view generation methods.
arXiv Detail & Related papers (2024-04-04T17:59:57Z) - Calibrating Panoramic Depth Estimation for Practical Localization and Mapping [20.621442016969976]
The absolute depth values of surrounding environments provide crucial cues for various assistive technologies, such as localization, navigation, and 3D structure estimation.
We propose that accurate depth estimated from panoramic images can serve as a powerful and lightweight input for a wide range of downstream tasks requiring 3D information.
arXiv Detail & Related papers (2023-08-27T04:50:05Z) - Neural Implicit Dense Semantic SLAM [83.04331351572277]
We propose a novel RGBD vSLAM algorithm that learns memory-efficient dense 3D geometry and semantic segmentation of an indoor scene in an online manner.
Our pipeline combines classical 3D vision-based tracking and loop closing with neural fields-based mapping.
Our proposed algorithm can greatly enhance scene perception and assist with a range of robot control problems.
arXiv Detail & Related papers (2023-04-27T23:03:52Z) - OccDepth: A Depth-Aware Method for 3D Semantic Scene Completion [6.297023466646343]
3D Semantic Scene Completion (SSC) can provide dense geometric and semantic scene representations, which can be applied to autonomous driving and robotic systems.
We propose the first stereo SSC method named OccDepth, which fully exploits implicit depth information from stereo images (or RGBD images) to help the recovery of 3D geometric structures.
arXiv Detail & Related papers (2023-02-27T06:35:03Z) - BS3D: Building-scale 3D Reconstruction from RGB-D Images [25.604775584883413]
We propose an easy-to-use framework for acquiring building-scale 3D reconstructions using a consumer depth camera.
Unlike complex and expensive acquisition setups, our system enables crowd-sourcing, which can greatly benefit data-hungry algorithms.
arXiv Detail & Related papers (2023-01-03T11:46:14Z) - BIPS: Bi-modal Indoor Panorama Synthesis via Residual Depth-aided Adversarial Learning [26.24526760567159]
We propose a novel bi-modal (RGB-D) panorama synthesis framework.
We focus on indoor environments where the RGB-D panorama can provide a complete 3D model for many applications.
Our method synthesizes high-quality indoor RGB-D panoramas and provides realistic 3D indoor models.
arXiv Detail & Related papers (2021-12-12T08:20:01Z) - Refer-it-in-RGBD: A Bottom-up Approach for 3D Visual Grounding in RGBD Images [69.5662419067878]
Grounding referring expressions in RGBD images is an emerging field.
We present a novel task of 3D visual grounding in single-view RGBD image where the referred objects are often only partially scanned due to occlusion.
Our approach first fuses the language and the visual features at the bottom level to generate a heatmap that localizes the relevant regions in the RGBD image.
Then our approach conducts adaptive feature learning based on the heatmap and performs object-level matching with another visio-linguistic fusion to finally ground the referred object (see the sketch after this list).
arXiv Detail & Related papers (2021-03-14T11:18:50Z) - NormalGAN: Learning Detailed 3D Human from a Single RGB-D Image [34.79657678041356]
We propose a fast adversarial learning-based method to reconstruct a complete and detailed 3D human from a single RGB-D image.
Given a consumer RGB-D sensor, NormalGAN can generate complete and detailed 3D human reconstruction results at 20 fps.
arXiv Detail & Related papers (2020-07-30T09:35:46Z) - OmniSLAM: Omnidirectional Localization and Dense Mapping for Wide-baseline Multi-camera Systems [88.41004332322788]
We present an omnidirectional localization and dense mapping system for a wide-baseline multiview stereo setup with ultra-wide field-of-view (FOV) fisheye cameras.
For more practical and accurate reconstruction, we first introduce improved and lightweight deep neural networks for omnidirectional depth estimation.
We integrate our omnidirectional depth estimates into the visual odometry (VO) and add a loop closing module for global consistency.
arXiv Detail & Related papers (2020-03-18T05:52:10Z)
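The Refer-it-in-RGBD entry above describes a two-stage pipeline: a bottom-level language-visual fusion produces a relevance heatmap, after which heatmap-guided features are matched against the expression. Below is a minimal sketch of that idea, reconstructed only from the summary; the class, module names, and dimensions are hypothetical, not the paper's implementation.

```python
# Hypothetical sketch of a bottom-up grounding pipeline: fuse language and
# visual features into a heatmap, then pool heatmap-weighted features for
# object-level matching. All names and shapes are assumptions.
import torch
import torch.nn as nn

class BottomUpGrounder(nn.Module):
    def __init__(self, vis_dim=64, lang_dim=64):
        super().__init__()
        # Stage 1: bottom-level fusion -> single-channel relevance heatmap.
        self.heatmap_head = nn.Conv2d(vis_dim + lang_dim, 1, kernel_size=1)
        # Stage 2: second visio-linguistic fusion for object-level matching.
        self.match_head = nn.Linear(vis_dim + lang_dim, 1)

    def forward(self, vis_feat, lang_feat):
        # vis_feat: (B, C, H, W) RGBD features; lang_feat: (B, D) sentence embedding
        B, _, H, W = vis_feat.shape
        lang_map = lang_feat[:, :, None, None].expand(-1, -1, H, W)
        heatmap = torch.sigmoid(
            self.heatmap_head(torch.cat([vis_feat, lang_map], dim=1)))
        # Heatmap-weighted pooling stands in for the "adaptive feature
        # learning" over localized regions mentioned in the summary.
        pooled = (vis_feat * heatmap).flatten(2).mean(-1)            # (B, C)
        score = self.match_head(torch.cat([pooled, lang_feat], 1))   # (B, 1)
        return heatmap, score

# Usage with dummy features.
grounder = BottomUpGrounder()
hm, s = grounder(torch.randn(2, 64, 32, 32), torch.randn(2, 64))
print(hm.shape, s.shape)  # torch.Size([2, 1, 32, 32]) torch.Size([2, 1])
```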
This list is automatically generated from the titles and abstracts of the papers on this site.