Unsupervised Learning of Depth and Ego-Motion from Cylindrical Panoramic Video with Applications for Virtual Reality
- URL: http://arxiv.org/abs/2010.07704v2
- Date: Tue, 10 Nov 2020 00:35:33 GMT
- Title: Unsupervised Learning of Depth and Ego-Motion from Cylindrical Panoramic Video with Applications for Virtual Reality
- Authors: Alisha Sharma, Ryan Nett, and Jonathan Ventura
- Abstract summary: We introduce a convolutional neural network model for unsupervised learning of depth and ego-motion from cylindrical panoramic video.
Panoramic depth estimation is an important technology for applications such as virtual reality, 3D modeling, and autonomous robotic navigation.
- Score: 2.294014185517203
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a convolutional neural network model for unsupervised learning
of depth and ego-motion from cylindrical panoramic video. Panoramic depth
estimation is an important technology for applications such as virtual reality,
3D modeling, and autonomous robotic navigation. In contrast to previous
approaches for applying convolutional neural networks to panoramic imagery, we
use the cylindrical panoramic projection which allows for the use of the
traditional CNN layers such as convolutional filters and max pooling without
modification. Our evaluation of synthetic and real data shows that unsupervised
learning of depth and ego-motion on cylindrical panoramic images can produce
high-quality depth maps and that an increased field-of-view improves ego-motion
estimation accuracy. We create two new datasets to evaluate our approach: a
synthetic dataset created using the CARLA simulator, and Headcam, a novel
dataset of panoramic video collected from a helmet-mounted camera while biking
in an urban setting. We also apply our network to the problem of converting
monocular panoramas to stereo panoramas.
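The claim that no modified CNN layers are needed rests on the cylindrical panorama wrapping around horizontally: standard convolutions apply directly once the left and right image borders are treated as adjacent. A minimal sketch of such wrap-around padding, assuming PyTorch (the layer name and shapes are illustrative, not the authors' released code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CylindricalConv2d(nn.Module):
    """Ordinary 2D convolution on a cylindrical panorama.

    A 360-degree cylindrical image wraps around horizontally, so the
    width axis is padded circularly (the left edge continues past the
    right edge) while the height axis is zero-padded as usual. The
    convolution itself is an unmodified nn.Conv2d.
    """

    def __init__(self, in_ch, out_ch, kernel_size=3):
        super().__init__()
        self.pad = kernel_size // 2
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, padding=0)

    def forward(self, x):
        # x: (batch, channels, height, width); width spans the full 360 degrees.
        x = F.pad(x, (self.pad, self.pad, 0, 0), mode="circular")  # wrap width
        x = F.pad(x, (0, 0, self.pad, self.pad))                   # zero-pad height
        return self.conv(x)

# A panorama batch keeps its spatial size through the layer.
pano = torch.randn(2, 3, 128, 512)
print(CylindricalConv2d(3, 16)(pano).shape)  # torch.Size([2, 16, 128, 512])
```

Pooling and upsampling layers carry over unchanged in the same way; only the padding rule at the horizontal seam distinguishes the cylindrical case from an ordinary perspective image.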
Related papers
- DistillNeRF: Perceiving 3D Scenes from Single-Glance Images by Distilling Neural Fields and Foundation Model Features [65.8738034806085]
DistillNeRF is a self-supervised learning framework for understanding 3D environments in autonomous driving.
It predicts a rich neural scene representation from sparse, single-frame multi-view camera inputs.
It is trained self-supervised with differentiable rendering to reconstruct RGB, depth, or feature images.
arXiv Detail & Related papers (2024-06-17T21:15:13Z)
- MSI-NeRF: Linking Omni-Depth with View Synthesis through Multi-Sphere Image aided Generalizable Neural Radiance Field [1.3162012586770577]
We introduce MSI-NeRF, which combines deep learning omnidirectional depth estimation and novel view synthesis.
We construct a multi-sphere image as a cost volume through feature extraction and warping of the input images.
Our network has the generalization ability to reconstruct unknown scenes efficiently using only four images.
arXiv Detail & Related papers (2024-03-16T07:26:50Z)
- OmniSCV: An Omnidirectional Synthetic Image Generator for Computer Vision [5.2178708158547025]
We present a tool for generating datasets of omnidirectional images with semantic and depth information.
These images are synthesized from a set of captures that are acquired in a realistic virtual environment for Unreal Engine 4.
We include in our tool photorealistic non-central-projection systems such as non-central panoramas and non-central catadioptric systems.
arXiv Detail & Related papers (2024-01-30T14:40:19Z)
- Calibrating Panoramic Depth Estimation for Practical Localization and Mapping [20.621442016969976]
The absolute depth values of surrounding environments provide crucial cues for various assistive technologies, such as localization, navigation, and 3D structure estimation.
We propose that accurate depth estimated from panoramic images can serve as a powerful and light-weight input for a wide range of downstream tasks requiring 3D information.
arXiv Detail & Related papers (2023-08-27T04:50:05Z)
- NeuralField-LDM: Scene Generation with Hierarchical Latent Diffusion Models [85.20004959780132]
We introduce NeuralField-LDM, a generative model capable of synthesizing complex 3D environments.
We show how NeuralField-LDM can be used for a variety of 3D content creation applications, including conditional scene generation, scene inpainting and scene style manipulation.
arXiv Detail & Related papers (2023-04-19T16:13:21Z)
- Leveraging Deepfakes to Close the Domain Gap between Real and Synthetic Images in Facial Capture Pipelines [8.366597450893456]
We propose an end-to-end pipeline for building and tracking 3D facial models from personalized in-the-wild video data.
We present a method for automatic data curation and retrieval based on a hierarchical clustering framework typical of collision detection algorithms in traditional computer graphics pipelines.
We outline how we train a motion capture regressor, leveraging the aforementioned techniques to avoid the need for real-world ground truth data.
arXiv Detail & Related papers (2022-04-22T15:09:49Z)
- SurroundDepth: Entangling Surrounding Views for Self-Supervised Multi-Camera Depth Estimation [101.55622133406446]
We propose a SurroundDepth method to incorporate the information from multiple surrounding views to predict depth maps across cameras.
Specifically, we employ a joint network to process all the surrounding views and propose a cross-view transformer to effectively fuse the information from multiple views.
In experiments, our method achieves state-of-the-art performance on challenging multi-camera depth estimation datasets.
arXiv Detail & Related papers (2022-04-07T17:58:47Z)
- TANDEM: Tracking and Dense Mapping in Real-time using Deep Multi-view Stereo [55.30992853477754]
We present TANDEM, a real-time monocular tracking and dense mapping framework.
For pose estimation, TANDEM performs photometric bundle adjustment based on a sliding window of keyframes.
TANDEM shows state-of-the-art real-time 3D reconstruction performance.
arXiv Detail & Related papers (2021-11-14T19:01:02Z)
- Neural Ray Surfaces for Self-Supervised Learning of Depth and Ego-motion [51.19260542887099]
We show that self-supervision can be used to learn accurate depth and ego-motion estimation without prior knowledge of the camera model.
Inspired by the geometric model of Grossberg and Nayar, we introduce Neural Ray Surfaces (NRS), convolutional networks that represent pixel-wise projection rays.
We demonstrate the use of NRS for self-supervised learning of visual odometry and depth estimation from raw videos obtained using a wide variety of camera systems. A sketch of the per-pixel ray idea appears after this list.
arXiv Detail & Related papers (2020-08-15T02:29:13Z)
- Deep 3D Capture: Geometry and Reflectance from Sparse Multi-View Images [59.906948203578544]
We introduce a novel learning-based method to reconstruct the high-quality geometry and complex, spatially-varying BRDF of an arbitrary object.
We first estimate per-view depth maps using a deep multi-view stereo network.
These depth maps are used to coarsely align the different views.
We propose a novel multi-view reflectance estimation network architecture.
arXiv Detail & Related papers (2020-03-27T21:28:54Z)
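The Neural Ray Surfaces entry above turns the camera model itself into a learned, per-pixel quantity. A minimal sketch of that idea, again assuming PyTorch (the module name, feature shapes, and lifting helper are illustrative assumptions, not the authors' implementation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RaySurfaceHead(nn.Module):
    """Predict a unit projection ray for every pixel.

    Instead of fixing a camera model (pinhole, fisheye, ...), a small
    convolutional head maps image features to a 3-channel field;
    normalizing each 3-vector gives a per-pixel ray direction.
    """

    def __init__(self, feat_ch):
        super().__init__()
        self.head = nn.Conv2d(feat_ch, 3, kernel_size=3, padding=1)

    def forward(self, feats):
        rays = self.head(feats)          # (B, 3, H, W)
        return F.normalize(rays, dim=1)  # unit-length ray per pixel

def lift_to_3d(rays, depth):
    # Back-project: each pixel's 3D point is its predicted depth times its ray.
    return depth * rays                  # (B, 3, H, W)

feats = torch.randn(1, 64, 48, 160)      # features from any encoder
depth = torch.rand(1, 1, 48, 160) * 10   # a predicted depth map
points = lift_to_3d(RaySurfaceHead(64)(feats), depth)
print(points.shape)                      # torch.Size([1, 3, 48, 160])
```

Because the rays are learned jointly with depth and pose from the photometric loss, the same pipeline can in principle train on footage from perspective, fisheye, or catadioptric cameras without per-camera calibration.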
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences arising from its use.