CamLessMonoDepth: Monocular Depth Estimation with Unknown Camera Parameters
- URL: http://arxiv.org/abs/2110.14347v1
- Date: Wed, 27 Oct 2021 10:54:15 GMT
- Title: CamLessMonoDepth: Monocular Depth Estimation with Unknown Camera Parameters
- Authors: Sai Shyam Chanduri, Zeeshan Khan Suri, Igor Vozniak, Christian Müller
- Abstract summary: Recent advances in monocular depth estimation have shown that gaining such knowledge from a single camera input is possible by training deep neural networks to predict inverse depth and pose, without the necessity of ground truth data.
In this work, we propose a method for implicit estimation of pinhole camera intrinsics along with depth and pose, by learning from monocular image sequences alone.
- Score: 1.7499351967216341
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Perceiving 3D information is of paramount importance in many applications of computer vision. Recent advances in monocular depth estimation have shown that gaining such knowledge from a single camera input is possible by training deep neural networks to predict inverse depth and pose, without the necessity of ground truth data. The majority of such approaches, however, require camera parameters to be fed explicitly during training. As a result, image sequences from the wild cannot be used during training. While there exist methods which also predict camera intrinsics, their performance is not on par with novel methods taking camera parameters as input. In this work, we propose a method for implicit estimation of pinhole camera intrinsics along with depth and pose, by learning from monocular image sequences alone. In addition, by utilizing efficient sub-pixel convolutions, we show that high-fidelity depth estimates can be obtained. We also embed pixel-wise uncertainty estimation into the framework, to emphasize the applicability of this work in practical domains. Finally, we demonstrate accurate prediction of depth information without prior knowledge of camera intrinsics, while outperforming existing state-of-the-art approaches on the KITTI benchmark.
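The abstract names three concrete ingredients: implicitly learned pinhole intrinsics, efficient sub-pixel convolutions, and pixel-wise uncertainty. Below is a minimal PyTorch sketch of how such pieces are commonly wired together in a self-supervised depth pipeline; module names, feature shapes, and the exact loss weighting are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class IntrinsicsHead(nn.Module):
    """Predicts a pinhole K from a pose-encoder bottleneck feature.
    Focal lengths pass through softplus (kept positive) and the
    principal point through sigmoid (kept inside the image); both are
    scaled by the image size. K is thus learned implicitly from the
    photometric loss alone, with no calibration input.
    (Illustrative sketch, not the authors' implementation.)"""
    def __init__(self, feat_dim, width, height):
        super().__init__()
        self.width, self.height = width, height
        self.fc = nn.Linear(feat_dim, 4)   # fx, fy, cx, cy (normalised)

    def forward(self, feat):               # feat: (B, feat_dim)
        p = self.fc(feat)
        fx = F.softplus(p[:, 0]) * self.width
        fy = F.softplus(p[:, 1]) * self.height
        cx = torch.sigmoid(p[:, 2]) * self.width
        cy = torch.sigmoid(p[:, 3]) * self.height
        K = torch.zeros(p.shape[0], 3, 3, device=p.device)
        K[:, 0, 0], K[:, 1, 1] = fx, fy
        K[:, 0, 2], K[:, 1, 2] = cx, cy
        K[:, 2, 2] = 1.0
        return K

class SubPixelUp(nn.Module):
    """Efficient sub-pixel upsampling for the depth decoder: a conv
    expands channels by r^2 and PixelShuffle rearranges them into an
    r-times larger feature map, instead of interpolation + conv."""
    def __init__(self, c_in, c_out, r=2):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out * r * r, 3, padding=1)
        self.shuffle = nn.PixelShuffle(r)

    def forward(self, x):
        return self.shuffle(self.conv(x))

def uncertainty_weighted_loss(photo_err, log_var):
    """Pixel-wise heteroscedastic weighting: a predicted log-variance
    map down-weights the photometric error at unreliable pixels, and
    the +log_var term keeps the network from inflating uncertainty
    everywhere. (One common formulation, assumed here.)"""
    return (photo_err * torch.exp(-log_var) + log_var).mean()
```

At train time, the predicted K would feed the usual back-projection and warping step between adjacent frames, so the photometric loss alone supervises the intrinsics head.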
Related papers
- Lift-Attend-Splat: Bird's-eye-view camera-lidar fusion using transformers [39.14931758754381]
We introduce a novel fusion method that bypasses monocular depth estimation altogether.
We show that our model can modulate its use of camera features based on the availability of lidar features.
arXiv Detail & Related papers (2023-12-22T18:51:50Z)
- WS-SfMLearner: Self-supervised Monocular Depth and Ego-motion Estimation on Surgical Videos with Unknown Camera Parameters [0.0]
Building accurate and robust self-supervised depth and camera ego-motion estimation systems is attracting growing attention in the computer vision community.
In this work, we aimed to build a self-supervised depth and ego-motion estimation system which can predict not only accurate depth maps and camera pose, but also camera intrinsic parameters.
arXiv Detail & Related papers (2023-08-22T20:35:24Z)
- Single Image Depth Prediction Made Better: A Multivariate Gaussian Take [163.14849753700682]
We introduce an approach that performs continuous modeling of per-pixel depth.
Our method (named MG) is among the top performers on the KITTI depth-prediction benchmark leaderboard.
arXiv Detail & Related papers (2023-03-31T16:01:03Z)
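For intuition on the "continuous modeling of per-pixel depth" mentioned above, the usual starting point is a per-pixel Gaussian negative log-likelihood; the paper's multivariate formulation additionally models covariance between pixels, so the univariate form below is a simplification for illustration, not its actual objective:

$$
\mathcal{L}_{\mathrm{NLL}} = \frac{1}{|\Omega|} \sum_{p \in \Omega} \left[ \frac{(d_p - \hat{\mu}_p)^2}{2\,\hat{\sigma}_p^2} + \frac{1}{2} \log \hat{\sigma}_p^2 \right]
$$

where $d_p$ is the ground-truth depth at pixel $p$, and $\hat{\mu}_p$ and $\hat{\sigma}_p$ are the predicted per-pixel mean and standard deviation.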
- SC-DepthV3: Robust Self-supervised Monocular Depth Estimation for Dynamic Scenes [58.89295356901823]
Self-supervised monocular depth estimation has shown impressive results in static scenes.
It relies on the multi-view consistency assumption to train networks; however, this assumption is violated in dynamic object regions.
We introduce an external pretrained monocular depth estimation model to generate a single-image depth prior.
Our model can predict sharp and accurate depth maps, even when trained on monocular videos of highly dynamic scenes.
arXiv Detail & Related papers (2022-11-07T16:17:47Z)
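The SC-DepthV3 entry above leans on a single-image depth prior to handle moving objects. One simple way such a prior can flag dynamic regions is to mask pixels where the (median-scale-aligned) prior disagrees with the self-supervised depth; the sketch below is an illustrative heuristic, not the paper's exact refinement scheme.

```python
import torch

def dynamic_region_mask(self_sup_depth, prior_depth, thresh=0.25):
    """Flag pixels where self-supervised depth disagrees with a
    single-image depth prior. The prior is scale-ambiguous, so it is
    median-aligned first; large residual disagreement tends to
    concentrate on moving objects, where multi-view consistency
    breaks. (Illustrative heuristic, not SC-DepthV3's exact scheme.)"""
    scale = torch.median(self_sup_depth) / torch.median(prior_depth)
    aligned = prior_depth * scale
    rel_err = (self_sup_depth - aligned).abs() / aligned
    return rel_err > thresh   # True where the pixel looks dynamic
```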
- SurroundDepth: Entangling Surrounding Views for Self-Supervised Multi-Camera Depth Estimation [101.55622133406446]
We propose SurroundDepth, a method that incorporates information from multiple surrounding views to predict depth maps across cameras.
Specifically, we employ a joint network to process all the surrounding views and propose a cross-view transformer to effectively fuse the information from multiple views.
In experiments, our method achieves state-of-the-art performance on challenging multi-camera depth estimation datasets.
arXiv Detail & Related papers (2022-04-07T17:58:47Z)
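As a rough picture of the cross-view fusion described in the SurroundDepth entry above, the sketch below lets every spatial token attend to tokens from all surrounding cameras; the class name and layout are illustrative assumptions, not the paper's actual block.

```python
import torch
import torch.nn as nn

class CrossViewFusion(nn.Module):
    """Fuses encoder features from V surrounding cameras by letting
    every spatial token attend to tokens from all views (C must equal
    dim). (Generic cross-view attention sketch, not SurroundDepth's
    exact transformer.)"""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, feats):                       # feats: (B, V, C, H, W)
        B, V, C, H, W = feats.shape
        tokens = feats.flatten(3).permute(0, 1, 3, 2)  # (B, V, HW, C)
        tokens = tokens.reshape(B, V * H * W, C)       # all views jointly
        fused, _ = self.attn(tokens, tokens, tokens)
        out = self.norm(fused + tokens)                # residual + norm
        out = out.reshape(B, V, H * W, C).permute(0, 1, 3, 2)
        return out.reshape(B, V, C, H, W)
```

In practice such a block would run on a low-resolution encoder stage, since attention over all V·H·W tokens is quadratic in their count.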
- Depth360: Monocular Depth Estimation using Learnable Axisymmetric Camera Model for Spherical Camera Image [2.3859169601259342]
We propose a learnable axisymmetric camera model which accepts distorted spherical camera images composed of two fisheye camera images.
We trained our models on ground-truth depth images generated with a photo-realistic simulator.
We demonstrate the efficacy of our method using the spherical camera images from the GO Stanford dataset and pinhole camera images from the KITTI dataset.
arXiv Detail & Related papers (2021-10-20T07:21:04Z)
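The Depth360 entry above centres on a learnable axisymmetric camera model. The defining property is that projection depends only on the incidence angle from the optical axis; a minimal parameterisation along those lines (a small MLP mapping angle to image radius; not the paper's exact model) could look like:

```python
import torch
import torch.nn as nn

class AxisymmetricCamera(nn.Module):
    """Projects 3D camera-space points with a learnable axisymmetric
    model: image radius r depends only on the incidence angle theta,
    so one family covers pinhole- and fisheye-like lenses.
    (Illustrative parameterisation, not Depth360's exact model.)"""
    def __init__(self, cx, cy, hidden=16):
        super().__init__()
        self.cx, self.cy = cx, cy
        self.r_of_theta = nn.Sequential(
            nn.Linear(1, hidden), nn.Tanh(), nn.Linear(hidden, 1))

    def forward(self, pts):                       # pts: (N, 3), z > 0
        x, y, z = pts[:, 0], pts[:, 1], pts[:, 2]
        rho = torch.sqrt(x * x + y * y)
        theta = torch.atan2(rho, z)               # incidence angle
        r = self.r_of_theta(theta.unsqueeze(1)).squeeze(1)
        phi = torch.atan2(y, x)                   # azimuth is preserved
        u = self.cx + r * torch.cos(phi)
        v = self.cy + r * torch.sin(phi)
        return torch.stack([u, v], dim=1)         # (N, 2) pixel coords
```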
- On the Sins of Image Synthesis Loss for Self-supervised Depth Estimation [60.780823530087446]
We show that improvements in image synthesis do not necessitate improvement in depth estimation.
We attribute this diverging phenomenon to aleatoric uncertainties, which originate from data.
This observed divergence has not been previously reported or studied in depth.
arXiv Detail & Related papers (2021-09-13T17:57:24Z)
- Variational Monocular Depth Estimation for Reliability Prediction [12.951621755732544]
Self-supervised learning for monocular depth estimation is widely investigated as an alternative to the supervised learning approach.
Previous works have successfully improved the accuracy of depth estimation by modifying the model structure.
In this paper, we theoretically formulate a variational model for monocular depth estimation to predict the reliability of the estimated depth image.
arXiv Detail & Related papers (2020-11-24T06:23:51Z)
- Learning Monocular Dense Depth from Events [53.078665310545745]
Event cameras report brightness changes in the form of a stream of asynchronous events instead of intensity frames.
Recent learning-based approaches have been applied to event-based data for tasks such as monocular depth prediction.
We propose a recurrent architecture to solve this task and show significant improvement over standard feed-forward methods.
arXiv Detail & Related papers (2020-10-16T12:36:23Z)
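The entry above pairs event data with a recurrent architecture. A convolutional GRU cell of the kind typically used for this (the event stream is first binned into voxel-grid tensors; a generic sketch, not the paper's exact network) is:

```python
import torch
import torch.nn as nn

class ConvGRUCell(nn.Module):
    """Convolutional GRU for sequences of event voxel grids: the hidden
    state carries scene structure across event windows, which helps
    where any single sparse slice of events is ambiguous. (Generic
    recurrent sketch, not the paper's exact architecture.)"""
    def __init__(self, c_in, c_hid):
        super().__init__()
        self.gates = nn.Conv2d(c_in + c_hid, 2 * c_hid, 3, padding=1)
        self.cand = nn.Conv2d(c_in + c_hid, c_hid, 3, padding=1)

    def forward(self, x, h):          # x: (B, c_in, H, W), h: hidden
        z, r = torch.sigmoid(self.gates(torch.cat([x, h], 1))).chunk(2, 1)
        n = torch.tanh(self.cand(torch.cat([x, r * h], 1)))
        return (1 - z) * h + z * n    # new hidden state -> depth head
```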
- Calibrating Self-supervised Monocular Depth Estimation [77.77696851397539]
In recent years, many methods have demonstrated the ability of neural networks to learn depth and pose changes in a sequence of images, using only self-supervision as the training signal.
We show that by incorporating prior information about the camera configuration and the environment, we can remove the scale ambiguity and predict depth directly, still using the self-supervised formulation and without relying on any additional sensors.
arXiv Detail & Related papers (2020-09-16T14:35:45Z)
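One concrete way a camera-configuration prior removes scale ambiguity, as the entry above describes, is to compare the known camera mounting height with the height implied by the reconstructed ground plane; the sketch below is a common heuristic of that kind, not necessarily this paper's method.

```python
import numpy as np

def metric_scale_from_camera_height(points_cam, real_height=1.65):
    """Recover the metric scale of up-to-scale depth by comparing the
    known camera mounting height (e.g. ~1.65 m on KITTI cars) with the
    camera height implied by the reconstructed ground plane.
    points_cam: (N, 3) back-projected points from the lower image half,
    where the road usually dominates. (Common heuristic; one way to use
    a camera-configuration prior, not necessarily this paper's method.)"""
    centered = points_cam - points_cam.mean(axis=0)
    # Ground-plane normal = direction of least variance among the points.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    normal = vt[-1]
    # Unscaled camera height = distance from the camera origin to the plane.
    est_height = abs(points_cam.mean(axis=0) @ normal)
    return real_height / est_height   # multiply depth by this factor
```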
- Self-Attention Dense Depth Estimation Network for Unrectified Video Sequences [6.821598757786515]
LiDAR and radar sensors are the standard hardware solutions for real-time depth estimation.
Deep learning based self-supervised depth estimation methods have shown promising results.
We propose a self-attention based depth and ego-motion network for unrectified images.
arXiv Detail & Related papers (2020-05-28T21:53:53Z)