Depth360: Monocular Depth Estimation using Learnable Axisymmetric Camera Model for Spherical Camera Image
- URL: http://arxiv.org/abs/2110.10415v1
- Date: Wed, 20 Oct 2021 07:21:04 GMT
- Authors: Noriaki Hirose and Kosuke Tahara
- Abstract summary: We propose a learnable axisymmetric camera model which accepts distorted spherical camera images with two fisheye camera images.
We trained our models with a photo-realistic simulator to generate ground truth depth images.
We demonstrate the efficacy of our method using the spherical camera images from the GO Stanford dataset and pinhole camera images from the KITTI dataset.
- Score: 2.3859169601259342
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-supervised monocular depth estimation has been widely investigated to
estimate depth images and relative poses from RGB images. This framework is
attractive for researchers because the depth and pose networks can be trained
from just time sequence images without the need for the ground truth depth and
poses.
In this work, we estimate the depth around a robot (360 degree view) using
time sequence spherical camera images, from a camera whose parameters are
unknown. We propose a learnable axisymmetric camera model which accepts
distorted spherical camera images with two fisheye camera images. In addition,
we trained our models with a photo-realistic simulator to generate ground truth
depth images to provide supervision. Moreover, we introduced loss functions to
provide floor constraints to reduce artifacts that can result from reflective
floor surfaces. We demonstrate the efficacy of our method using the spherical
camera images from the GO Stanford dataset and pinhole camera images from the
KITTI dataset to compare our method's performance with that of a baseline
method in learning the camera parameters.
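The core idea of an axisymmetric camera model can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: axisymmetry means a pixel's ray direction depends only on its radius from the image center, so a single learnable 1-D function (here a knot table, initialized to an equidistant model) suffices to describe the projection.

```python
import numpy as np

class AxisymmetricCamera:
    """Minimal sketch of a learnable axisymmetric camera model (illustrative only)."""

    def __init__(self, cx, cy, r_max, n_knots=16):
        self.cx, self.cy = cx, cy
        self.r_knots = np.linspace(0.0, r_max, n_knots)
        # Learnable knot values; initialized to an equidistant model
        # (polar angle proportional to pixel radius).
        self.theta_knots = np.linspace(0.0, np.pi, n_knots)

    def unproject(self, u, v):
        """Map pixel (u, v) to a unit ray direction (x, y, z)."""
        du, dv = u - self.cx, v - self.cy
        r = np.hypot(du, dv)
        phi = np.arctan2(dv, du)                              # azimuth around axis
        theta = np.interp(r, self.r_knots, self.theta_knots)  # learned radius -> angle
        return np.array([np.sin(theta) * np.cos(phi),
                         np.sin(theta) * np.sin(phi),
                         np.cos(theta)])

cam = AxisymmetricCamera(cx=320, cy=320, r_max=320)
ray = cam.unproject(320, 320)  # center pixel looks straight ahead: (0, 0, 1)
```

In a self-supervised pipeline, the knot values would be optimized jointly with the depth and pose networks through the photometric reprojection loss.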
Related papers
- FisheyeDepth: A Real Scale Self-Supervised Depth Estimation Model for Fisheye Camera [8.502741852406904]
We present FisheyeDepth, a self-supervised depth estimation model tailored for fisheye cameras.
We incorporate a fisheye camera model into the projection and reprojection stages during training to handle image distortions.
We also incorporate real-scale pose information into the geometric projection between consecutive frames, replacing the poses estimated by the conventional pose network.
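The projection/reprojection steps can be sketched with the common equidistant fisheye model (r = f * theta); FisheyeDepth's exact camera model may differ, so treat this as an assumption-laden illustration.

```python
import numpy as np

def fisheye_project(point_3d, f, cx, cy):
    """Project a 3D camera-frame point onto the image under r = f * theta."""
    x, y, z = point_3d
    theta = np.arctan2(np.hypot(x, y), z)  # angle from the optical axis
    phi = np.arctan2(y, x)                 # azimuth around the axis
    r = f * theta                          # equidistant: radius ~ angle
    return cx + r * np.cos(phi), cy + r * np.sin(phi)

def fisheye_unproject(u, v, depth, f, cx, cy):
    """Invert the projection: pixel + depth (range) -> 3D point."""
    du, dv = u - cx, v - cy
    theta, phi = np.hypot(du, dv) / f, np.arctan2(dv, du)
    ray = np.array([np.sin(theta) * np.cos(phi),
                    np.sin(theta) * np.sin(phi),
                    np.cos(theta)])
    return depth * ray
```

During training, unprojecting one frame's pixels with predicted depth and reprojecting them into the next frame with the (real-scale) pose gives the photometric supervision signal.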
arXiv Detail & Related papers (2024-09-23T14:31:42Z)
- Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image [85.91935485902708]
We show that the key to a zero-shot single-view metric depth model lies in the combination of large-scale data training and resolving the metric ambiguity from various camera models.
We propose a canonical camera space transformation module, which explicitly addresses the ambiguity problems and can be effortlessly plugged into existing monocular models.
Our method enables the accurate recovery of metric 3D structures on randomly collected internet images.
arXiv Detail & Related papers (2023-07-20T16:14:23Z)
- Shakes on a Plane: Unsupervised Depth Estimation from Unstabilized Photography [54.36608424943729]
We show that in a "long-burst", forty-two 12-megapixel RAW frames captured in a two-second sequence, there is enough parallax information from natural hand tremor alone to recover high-quality scene depth.
We devise a test-time optimization approach that fits a neural RGB-D representation to long-burst data and simultaneously estimates scene depth and camera motion.
arXiv Detail & Related papers (2022-12-22T18:54:34Z)
- 3D Object Aided Self-Supervised Monocular Depth Estimation [5.579605877061333]
We propose a new method to address dynamic object movements through monocular 3D object detection.
Specifically, we first detect 3D objects in the images and build the per-pixel correspondence of the dynamic pixels with the detected object pose.
In this way, the depth of every pixel can be learned via a meaningful geometry model.
arXiv Detail & Related papers (2022-12-04T08:52:33Z)
- Towards Accurate Reconstruction of 3D Scene Shape from A Single Monocular Image [91.71077190961688]
We propose a two-stage framework that first predicts depth up to an unknown scale and shift from a single monocular image.
We then exploit 3D point cloud data to predict the depth shift and the camera's focal length, which allow us to recover 3D scene shapes.
We test our depth model on nine unseen datasets and achieve state-of-the-art performance on zero-shot evaluation.
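The first stage's prediction is metric up to an unknown scale and shift, i.e. d = s * d_pred + t. Given a few metric samples (e.g. from a point cloud), s and t follow from a closed-form least-squares fit; the sketch below illustrates that relationship with made-up names and data, not the paper's actual pipeline.

```python
import numpy as np

def fit_scale_shift(d_pred, d_metric):
    """Recover (s, t) in d_metric ~= s * d_pred + t by linear least squares."""
    A = np.stack([d_pred, np.ones_like(d_pred)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, d_metric, rcond=None)
    return s, t

d_pred = np.array([0.1, 0.4, 0.7, 1.0])          # affine-invariant prediction
s, t = fit_scale_shift(d_pred, 2.0 * d_pred + 0.5)  # recovers s=2.0, t=0.5
```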
arXiv Detail & Related papers (2022-08-28T16:20:14Z)
- DEVO: Depth-Event Camera Visual Odometry in Challenging Conditions [30.892930944644853]
We present a novel real-time visual odometry framework for a stereo setup of a depth and high-resolution event camera.
Our framework balances accuracy and robustness against computational efficiency towards strong performance in challenging scenarios.
arXiv Detail & Related papers (2022-02-05T13:46:47Z)
- CamLessMonoDepth: Monocular Depth Estimation with Unknown Camera Parameters [1.7499351967216341]
Recent advances in monocular depth estimation have shown that gaining such knowledge from a single camera input is possible by training deep neural networks to predict inverse depth and pose, without the necessity of ground truth data.
In this work, we propose a method for implicit estimation of pinhole camera intrinsics along with depth and pose, by learning from monocular image sequences alone.
arXiv Detail & Related papers (2021-10-27T10:54:15Z)
- Neural Ray Surfaces for Self-Supervised Learning of Depth and Ego-motion [51.19260542887099]
We show that self-supervision can be used to learn accurate depth and ego-motion estimation without prior knowledge of the camera model.
Inspired by the geometric model of Grossberg and Nayar, we introduce Neural Ray Surfaces (NRS), convolutional networks that represent pixel-wise projection rays.
We demonstrate the use of NRS for self-supervised learning of visual odometry and depth estimation from raw videos obtained using a wide variety of camera systems.
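The pixel-wise ray idea can be sketched in a few lines: instead of unprojecting with a closed-form intrinsic matrix K, a network predicts one ray direction per pixel, so pinhole, fisheye, and catadioptric cameras all fit the same formulation. Here a random tensor stands in for the network output; this is not the authors' code.

```python
import numpy as np

H, W = 4, 6
# Stand-in for a network's per-pixel ray predictions, normalized to unit length.
rays = np.random.randn(H, W, 3)
rays /= np.linalg.norm(rays, axis=-1, keepdims=True)

def unproject(depth, rays):
    """Lift a depth map to a point cloud: P(u, v) = depth(u, v) * ray(u, v)."""
    return depth[..., None] * rays

depth = np.ones((H, W))
points = unproject(depth, rays)  # (H, W, 3) points, one per pixel
```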
arXiv Detail & Related papers (2020-08-15T02:29:13Z)
- Single View Metrology in the Wild [94.7005246862618]
We present a novel approach to single view metrology that can recover the absolute scale of a scene represented by 3D heights of objects or camera height above the ground.
Our method relies on data-driven priors learned by a deep network specifically designed to imbibe weakly supervised constraints from the interplay of the unknown camera with 3D entities such as object heights.
We demonstrate state-of-the-art qualitative and quantitative results on several datasets as well as applications including virtual object insertion.
arXiv Detail & Related papers (2020-07-18T22:31:33Z)
- Lightweight Multi-View 3D Pose Estimation through Camera-Disentangled Representation [57.11299763566534]
We present a solution to recover 3D pose from multi-view images captured with spatially calibrated cameras.
We exploit 3D geometry to fuse input images into a unified latent representation of pose, which is disentangled from camera view-points.
Our architecture then conditions the learned representation on camera projection operators to produce accurate per-view 2d detections.
arXiv Detail & Related papers (2020-04-05T12:52:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.