Monocular Depth Estimation through Virtual-world Supervision and
Real-world SfM Self-Supervision
- URL: http://arxiv.org/abs/2103.12209v1
- Date: Mon, 22 Mar 2021 22:33:49 GMT
- Title: Monocular Depth Estimation through Virtual-world Supervision and
Real-world SfM Self-Supervision
- Authors: Akhil Gurram, Ahmet Faruk Tuna, Fengyi Shen, Onay Urfalioglu, and
Antonio M. López
- Abstract summary: We perform monocular depth estimation by virtual-world supervision (MonoDEVS) and real-world SfM self-supervision.
Our MonoDEVSNet outperforms previous MDE CNNs trained on monocular and even stereo sequences.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Depth information is essential for on-board perception in autonomous driving
and driver assistance. Monocular depth estimation (MDE) is very appealing since
it places appearance and depth in direct pixelwise correspondence without
further calibration. The best MDE models are based on Convolutional Neural
Networks (CNNs) trained in a supervised manner, i.e., assuming pixelwise ground
truth (GT). Usually, this GT is acquired at training time through a calibrated
multi-modal suite of sensors. However, using only a monocular system at
training time as well is cheaper and more scalable. This is possible by relying on
structure-from-motion (SfM) principles to generate self-supervision.
Nevertheless, problems such as camouflaged objects, visibility changes,
static-camera intervals, textureless areas, and scale ambiguity diminish the
usefulness of such self-supervision. In this paper, we perform monocular depth
estimation by virtual-world supervision (MonoDEVS) and real-world SfM
self-supervision. We compensate for the limitations of SfM self-supervision by
leveraging virtual-world images with accurate semantic and depth supervision
and addressing the virtual-to-real domain gap. Our MonoDEVSNet outperforms
previous MDE CNNs trained on monocular and even stereo sequences.
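
The real-world self-supervision discussed in the abstract is typically built on SfM-style photometric view synthesis: a depth network and a pose network warp a neighboring frame into the current view, and the reconstruction error supervises both. Below is a minimal sketch of this standard loss (in the style of Monodepth2, not necessarily the exact MonoDEVS formulation); the function names, fixed intrinsics K, and the plain L1 penalty are illustrative assumptions.

# Minimal sketch of SfM photometric self-supervision (Monodepth2-style view
# synthesis), NOT the exact MonoDEVS loss; shapes, the fixed intrinsics K,
# and the plain L1 penalty are illustrative assumptions.
import torch
import torch.nn.functional as F

def backproject(depth, K_inv):
    # Lift every pixel to a 3D point using the predicted depth (B, 1, H, W).
    b, _, h, w = depth.shape
    ys, xs = torch.meshgrid(torch.arange(h, dtype=depth.dtype),
                            torch.arange(w, dtype=depth.dtype), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], 0).reshape(3, -1)
    rays = K_inv @ pix                                  # (3, H*W) viewing rays
    return depth.reshape(b, 1, -1) * rays.unsqueeze(0)  # (B, 3, H*W)

def project(points, K, T):
    # Move points into the source camera with pose T (B, 4, 4) and project.
    b, _, n = points.shape
    homo = torch.cat([points, torch.ones(b, 1, n, dtype=points.dtype)], 1)
    cam = (T @ homo)[:, :3]                             # points in source frame
    pix = K.unsqueeze(0) @ cam
    return pix[:, :2] / pix[:, 2:].clamp(min=1e-6)      # (B, 2, H*W)

def photometric_loss(target, source, depth, T, K):
    # Warp `source` into the target view via depth + pose, then compare.
    b, _, h, w = target.shape
    pix = project(backproject(depth, torch.linalg.inv(K)), K, T)
    gx = 2.0 * pix[:, 0] / (w - 1) - 1.0                # grid_sample wants
    gy = 2.0 * pix[:, 1] / (h - 1) - 1.0                # coords in [-1, 1]
    grid = torch.stack([gx, gy], -1).reshape(b, h, w, 2)
    warped = F.grid_sample(source, grid, padding_mode="border",
                           align_corners=True)
    return (target - warped).abs().mean()               # L1 (SSIM omitted)

The failure modes the abstract lists map directly onto this loss: a static camera makes the warp nearly an identity, textureless areas make the photometric term uninformative, and a global rescaling of depth and translation leaves the loss unchanged, which is the scale ambiguity.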
Related papers
- MonoSelfRecon: Purely Self-Supervised Explicit Generalizable 3D Reconstruction of Indoor Scenes from Monocular RGB Views [4.570455747723325]
MonoSelfRecon achieves explicit 3D mesh reconstruction for generalizable indoor scenes with monocular RGB views through pure self-supervision on voxel-SDF.
We propose novel self-supervised losses, which not only support pure self-supervision, but can be used together with supervised signals to further boost supervised training.
arXiv Detail & Related papers (2024-04-10T05:41:05Z)
- GasMono: Geometry-Aided Self-Supervised Monocular Depth Estimation for Indoor Scenes [47.76269541664071]
This paper tackles the challenges of self-supervised monocular depth estimation in indoor scenes caused by large rotation between frames and low texture.
We obtain coarse camera poses from monocular sequences through multi-view geometry to deal with the former; a sketch of such classical pose recovery follows this entry.
To soften the effect of low texture, we combine the global reasoning of vision transformers with an overfitting-aware, iterative self-distillation mechanism.
arXiv Detail & Related papers (2023-09-26T17:59:57Z)
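
A hedged sketch of the kind of classical multi-view pose recovery referred to above, using feature matching and the essential matrix via OpenCV; the function name and the ORB/RANSAC choices are illustrative assumptions, not the GasMono implementation.

# Hedged sketch: coarse relative camera pose from two frames via classical
# multi-view geometry (feature matching + essential matrix). Illustrative
# only; not the authors' code.
import cv2
import numpy as np

def coarse_pose(img1, img2, K):
    # Detect and match ORB features between the two grayscale frames.
    orb = cv2.ORB_create(2000)
    k1, d1 = orb.detectAndCompute(img1, None)
    k2, d2 = orb.detectAndCompute(img2, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(d1, d2)
    p1 = np.float32([k1[m.queryIdx].pt for m in matches])
    p2 = np.float32([k2[m.trainIdx].pt for m in matches])
    # Robustly fit the essential matrix, then decompose it into R, t.
    E, inliers = cv2.findEssentialMat(p1, p2, K, method=cv2.RANSAC)
    _, R, t, _ = cv2.recoverPose(E, p1, p2, K, mask=inliers)
    return R, t  # t is up to scale: the usual monocular scale ambiguity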
- MonoViT: Self-Supervised Monocular Depth Estimation with a Vision Transformer [52.0699787446221]
We propose MonoViT, a framework combining the global reasoning enabled by ViT models with the flexibility of self-supervised monocular depth estimation.
By combining plain convolutions with Transformer blocks, our model can reason both locally and globally, yielding depth predictions at a higher level of detail and accuracy; a toy sketch of such a local-global block follows this entry.
arXiv Detail & Related papers (2022-08-06T16:54:45Z)
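
A toy sketch of such a convolution-plus-attention pairing; the layer sizes and residual wiring are arbitrary assumptions and do not reproduce the MonoViT architecture.

# Toy local-global block: a convolution for local detail followed by
# self-attention for global context. Sizes are illustrative assumptions.
import torch
import torch.nn as nn

class LocalGlobalBlock(nn.Module):
    def __init__(self, channels=64, heads=4):
        super().__init__()
        self.local = nn.Sequential(                    # local reasoning
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.GELU(),
        )
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, x):                              # x: (B, C, H, W)
        x = x + self.local(x)                          # residual conv branch
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)          # (B, H*W, C)
        t = self.norm(tokens)
        attn_out, _ = self.attn(t, t, t)               # global reasoning
        tokens = tokens + attn_out                     # residual attention
        return tokens.transpose(1, 2).reshape(b, c, h, w)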
- Towards Scale-Aware, Robust, and Generalizable Unsupervised Monocular Depth Estimation by Integrating IMU Motion Dynamics [74.1720528573331]
Unsupervised monocular depth and ego-motion estimation has drawn extensive research attention in recent years.
We propose DynaDepth, a novel scale-aware framework that integrates information from vision and IMU motion dynamics.
We validate the effectiveness of DynaDepth by conducting extensive experiments and simulations on the KITTI and Make3D datasets.
arXiv Detail & Related papers (2022-07-11T07:50:22Z)
- Occlusion-Aware Self-Supervised Monocular 6D Object Pose Estimation [88.8963330073454]
We propose a novel monocular 6D pose estimation approach by means of self-supervised learning.
We leverage current trends in noisy student training and differentiable rendering to further self-supervise the model.
Our proposed self-supervision outperforms all other methods relying on synthetic data.
arXiv Detail & Related papers (2022-03-19T15:12:06Z)
- SelfTune: Metrically Scaled Monocular Depth Estimation through Self-Supervised Learning [53.78813049373321]
We propose a self-supervised learning method for the pre-trained supervised monocular depth networks to enable metrically scaled depth estimation.
Our approach is useful for various applications such as mobile robot navigation and is applicable to diverse environments.
arXiv Detail & Related papers (2022-03-10T12:28:42Z)
- Calibrating Self-supervised Monocular Depth Estimation [77.77696851397539]
In recent years, many methods have demonstrated the ability of neural networks to learn depth and pose changes in a sequence of images, using only self-supervision as the training signal.
We show that by incorporating prior information about the camera configuration and the environment, we can remove the scale ambiguity and predict depth directly, still using the self-supervised formulation and without relying on any additional sensors; a sketch of one such prior (known camera height) follows this entry.
arXiv Detail & Related papers (2020-09-16T14:35:45Z)
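
A sketch of one common camera-configuration prior that removes scale: rescale the up-to-scale depth so the fitted ground plane lies at the known camera mounting height. The plane-fitting details and function name are assumptions, not necessarily this paper's formulation.

# Hedged sketch: recover metric scale from a known camera mounting height by
# fitting a ground plane to up-to-scale backprojected points. Illustrative.
import numpy as np

def metric_scale(points_cam, ground_mask, real_height_m):
    # points_cam: (N, 3) up-to-scale 3D points backprojected from the depth
    # map (camera frame); ground_mask: (N,) bool marking road-surface pixels.
    ground = points_cam[ground_mask]
    centroid = ground.mean(axis=0)
    # Plane normal = right singular vector with the smallest singular value.
    _, _, vh = np.linalg.svd(ground - centroid)
    normal = vh[-1]
    est_height = abs(normal @ centroid)  # camera-to-ground distance, unitless
    return real_height_m / est_height    # e.g. depth_metric = scale * depth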
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.