Neural Ray Surfaces for Self-Supervised Learning of Depth and Ego-motion
- URL: http://arxiv.org/abs/2008.06630v1
- Date: Sat, 15 Aug 2020 02:29:13 GMT
- Title: Neural Ray Surfaces for Self-Supervised Learning of Depth and Ego-motion
- Authors: Igor Vasiljevic, Vitor Guizilini, Rares Ambrus, Sudeep Pillai, Wolfram
Burgard, Greg Shakhnarovich, Adrien Gaidon
- Abstract summary: We show that self-supervision can be used to learn accurate depth and ego-motion estimation without prior knowledge of the camera model.
Inspired by the geometric model of Grossberg and Nayar, we introduce Neural Ray Surfaces (NRS), convolutional networks that represent pixel-wise projection rays.
We demonstrate the use of NRS for self-supervised learning of visual odometry and depth estimation from raw videos obtained using a wide variety of camera systems.
- Score: 51.19260542887099
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-supervised learning has emerged as a powerful tool for depth and
ego-motion estimation, leading to state-of-the-art results on benchmark
datasets. However, one significant limitation shared by current methods is the
assumption of a known parametric camera model -- usually the standard pinhole
geometry -- leading to failure when applied to imaging systems that deviate
significantly from this assumption (e.g., catadioptric cameras or underwater
imaging). In this work, we show that self-supervision can be used to learn
accurate depth and ego-motion estimation without prior knowledge of the camera
model. Inspired by the geometric model of Grossberg and Nayar, we introduce
Neural Ray Surfaces (NRS), convolutional networks that represent pixel-wise
projection rays, approximating a wide range of cameras. NRS are fully
differentiable and can be learned end-to-end from unlabeled raw videos. We
demonstrate the use of NRS for self-supervised learning of visual odometry and
depth estimation from raw videos obtained using a wide variety of camera
systems, including pinhole, fisheye, and catadioptric.
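The core geometric idea in the abstract — replacing fixed pinhole unprojection with a per-pixel ray surface — can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names are illustrative, the rays here are hand-set rather than predicted by a convolutional network, and the differentiable projection back into the second image and the photometric loss are omitted.

```python
import numpy as np

def unproject_with_ray_surface(depth, rays):
    """Lift a depth map to 3D using a per-pixel ray surface.

    depth: (H, W) array of per-pixel depths.
    rays:  (H, W, 3) array of unit ray directions; in NRS these are
           predicted by a network instead of derived from a pinhole model.
    Returns an (H, W, 3) point cloud in the camera frame.
    """
    return depth[..., None] * rays

def transform_points(points, R, t):
    """Apply a rigid ego-motion (R, t) to an (H, W, 3) point cloud."""
    return points @ R.T + t

# Toy example: a 2x2 depth map whose rays all point along +z.
depth = np.full((2, 2), 5.0)
rays = np.zeros((2, 2, 3))
rays[..., 2] = 1.0  # every pixel looks straight ahead

points = unproject_with_ray_surface(depth, rays)

# Warp the cloud by a hypothetical ego-motion: translate 1 unit along x.
R = np.eye(3)
t = np.array([1.0, 0.0, 0.0])
warped = transform_points(points, R, t)
```

In the full method, `warped` would be reprojected into the adjacent frame (again through a learned ray surface) to form the photometric self-supervision signal.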
Related papers
- FisheyeDepth: A Real Scale Self-Supervised Depth Estimation Model for Fisheye Camera [8.502741852406904]
We present FisheyeDepth, a self-supervised depth estimation model tailored for fisheye cameras.
We incorporate a fisheye camera model into the projection and reprojection stages during training to handle image distortions.
We also incorporate real-scale pose information into the geometric projection between consecutive frames, replacing the poses estimated by the conventional pose network.
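For intuition about what "incorporating a fisheye camera model into the projection stage" involves, here is a sketch of the common equidistant fisheye projection (image radius proportional to the angle from the optical axis). This is one generic fisheye model, not necessarily the one used in the paper, and the function name is illustrative.

```python
import numpy as np

def project_equidistant(point, f, cx, cy):
    """Project a 3D camera-frame point with an equidistant fisheye model.

    Uses r = f * theta, where theta is the angle between the ray and
    the optical axis, so even points at 90 degrees remain in view.
    """
    x, y, z = point
    rho = np.hypot(x, y)
    theta = np.arctan2(rho, z)   # angle from the optical axis
    r = f * theta                # equidistant mapping
    if rho > 0:
        u = cx + r * x / rho
        v = cy + r * y / rho
    else:
        u, v = cx, cy            # point lies on the optical axis
    return u, v

# A point on the optical axis projects to the principal point.
u, v = project_equidistant((0.0, 0.0, 2.0), f=300.0, cx=320.0, cy=240.0)
```

Unlike the pinhole model, this mapping stays well-behaved toward the edge of a wide field of view, which is why distortion-aware projection matters during training on fisheye imagery.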
arXiv Detail & Related papers (2024-09-23T14:31:42Z)
- SelfTune: Metrically Scaled Monocular Depth Estimation through Self-Supervised Learning [53.78813049373321]
We propose a self-supervised learning method for pre-trained supervised monocular depth networks to enable metrically scaled depth estimation.
Our approach is useful for various applications such as mobile robot navigation and is applicable to diverse environments.
arXiv Detail & Related papers (2022-03-10T12:28:42Z)
- Self-Supervised Camera Self-Calibration from Video [34.35533943247917]
We propose a learning algorithm to regress per-sequence calibration parameters using an efficient family of general camera models.
Our procedure achieves self-calibration results with sub-pixel reprojection error, outperforming other learning-based methods.
arXiv Detail & Related papers (2021-12-06T19:42:05Z)
- Depth360: Monocular Depth Estimation using Learnable Axisymmetric Camera Model for Spherical Camera Image [2.3859169601259342]
We propose a learnable axisymmetric camera model which accepts distorted spherical camera images composed of two fisheye camera images.
We trained our models with a photo-realistic simulator to generate ground truth depth images.
We demonstrate the efficacy of our method using the spherical camera images from the GO Stanford dataset and pinhole camera images from the KITTI dataset.
arXiv Detail & Related papers (2021-10-20T07:21:04Z)
- Self-Calibrating Neural Radiance Fields [68.64327335620708]
We jointly learn the geometry of the scene and the accurate camera parameters without any calibration objects.
Our camera model consists of a pinhole model, a fourth order radial distortion, and a generic noise model that can learn arbitrary non-linear camera distortions.
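The fourth-order radial distortion mentioned above is typically the polynomial model applied to normalized image coordinates. The sketch below shows that common parameterization; the paper's exact formulation (and its learned noise model) may differ, and the coefficient values are arbitrary.

```python
def radial_distort(x, y, k1, k2):
    """Apply fourth-order radial distortion to normalized coordinates.

    Common polynomial model: x' = x * (1 + k1*r^2 + k2*r^4), and
    likewise for y. k1, k2 are the distortion coefficients.
    """
    r2 = x * x + y * y
    scale = 1.0 + k1 * r2 + k2 * r2 * r2
    return x * scale, y * scale

# The image center (r = 0) is unaffected by radial distortion.
xd, yd = radial_distort(0.0, 0.0, k1=-0.2, k2=0.05)
```

Because the model is a smooth polynomial in the pixel coordinates, it is fully differentiable, which is what allows the distortion coefficients to be optimized jointly with the scene geometry.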
arXiv Detail & Related papers (2021-08-31T13:34:28Z)
- DONet: Learning Category-Level 6D Object Pose and Size Estimation from Depth Observation [53.55300278592281]
We propose a method of Category-level 6D Object Pose and Size Estimation (COPSE) from a single depth image.
Our framework makes inferences based on the rich geometric information of the object in the depth channel alone.
Our framework competes with state-of-the-art approaches that require labeled real-world images.
arXiv Detail & Related papers (2021-06-27T10:41:50Z)
- Wide-angle Image Rectification: A Survey [86.36118799330802]
Wide-angle images contain distortions that violate the assumptions underlying pinhole camera models.
Image rectification, which aims to correct these distortions, can solve these problems.
We present a detailed description and discussion of the camera models used in different approaches.
Next, we review both traditional geometry-based image rectification methods and deep learning-based methods.
arXiv Detail & Related papers (2020-10-30T17:28:40Z)
- Neural Geometric Parser for Single Image Camera Calibration [17.393543270903653]
We propose a neural geometric parser for single-image camera calibration in man-made scenes.
Our approach considers both semantic and geometric cues, resulting in significant accuracy improvement.
The experimental results reveal that the performance of our neural approach is significantly higher than that of existing state-of-the-art camera calibration techniques.
arXiv Detail & Related papers (2020-07-23T08:29:00Z)
- Consistent Video Depth Estimation [57.712779457632024]
We present an algorithm for reconstructing dense, geometrically consistent depth for all pixels in a monocular video.
We leverage a conventional structure-from-motion reconstruction to establish geometric constraints on pixels in the video.
Our algorithm is able to handle challenging hand-held captured input videos with a moderate degree of dynamic motion.
arXiv Detail & Related papers (2020-04-30T17:59:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.