Learning depth from monocular video sequences
- URL: http://arxiv.org/abs/2310.17156v1
- Date: Thu, 26 Oct 2023 05:00:41 GMT
- Title: Learning depth from monocular video sequences
- Authors: Zhenwei Luo
- Abstract summary: We propose a novel training loss which enables us to include more images for supervision during the training process.
We also design a novel network architecture for single-image depth estimation.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Learning a single-image depth estimation model from monocular video
sequences is a very challenging problem. In this paper, we propose a novel
training loss that enables us to include more images for supervision during the
training process. We propose a simple yet effective model to account for
frame-to-frame pixel motion. We also design a novel network architecture for
single-image depth estimation. When combined, our method produces
state-of-the-art results for monocular depth estimation on the KITTI dataset in
the self-supervised setting.
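The abstract does not spell out the loss, but the standard self-supervised recipe this line of work builds on is a photometric reprojection loss taken per-pixel over several source frames warped into the target view. The sketch below is only an illustration of that generic idea; the function names and the plain L1 error are my assumptions, not the paper's actual loss:

```python
import numpy as np

def l1_photometric(target, warped):
    # Per-pixel mean absolute color difference between the target frame
    # and a source frame warped into the target view.
    return np.abs(target - warped).mean(axis=-1)

def min_reprojection_loss(target, warped_sources):
    # Taking the per-pixel minimum over several warped source frames is one
    # common way to let more images supervise training: a pixel occluded in
    # one source frame can still be explained by another.
    errors = np.stack([l1_photometric(target, w) for w in warped_sources])
    return errors.min(axis=0).mean()
```

With one source frame identical to the target and another completely different, the per-pixel minimum is zero everywhere, so the loss vanishes; averaging the errors instead would penalize pixels that only one frame can explain.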
Related papers
- Learning Robust Multi-Scale Representation for Neural Radiance Fields from Unposed Images [65.41966114373373]
We present an improved solution to the neural image-based rendering problem in computer vision.
The proposed approach could synthesize a realistic image of the scene from a novel viewpoint at test time.
arXiv Detail & Related papers (2023-11-08T08:18:23Z)
- GasMono: Geometry-Aided Self-Supervised Monocular Depth Estimation for Indoor Scenes [47.76269541664071]
This paper tackles the challenges of self-supervised monocular depth estimation in indoor scenes caused by large rotation between frames and low texture.
We obtain coarse camera poses from monocular sequences through multi-view geometry to deal with the former.
To soften the effect of the low texture, we combine the global reasoning of vision transformers with an overfitting-aware, iterative self-distillation mechanism.
arXiv Detail & Related papers (2023-09-26T17:59:57Z)
- Transformer-based model for monocular visual odometry: a video understanding approach [0.9790236766474201]
We treat monocular visual odometry as a video understanding task to estimate the camera's 6-DoF pose.
We contribute by presenting the TS-DoVO model, based on spatio-temporal self-attention mechanisms, to extract features from clips and estimate motion in an end-to-end manner.
Our approach achieved competitive state-of-the-art performance compared with geometry-based and deep learning-based methods on the KITTI visual odometry dataset.
arXiv Detail & Related papers (2023-05-10T13:11:23Z)
- SC-DepthV3: Robust Self-supervised Monocular Depth Estimation for Dynamic Scenes [58.89295356901823]
Self-supervised monocular depth estimation has shown impressive results in static scenes.
It relies on the multi-view consistency assumption for training networks; however, this assumption is violated in dynamic object regions.
We introduce an external pretrained monocular depth estimation model for generating a single-image depth prior.
Our model can predict sharp and accurate depth maps, even when training from monocular videos of highly-dynamic scenes.
arXiv Detail & Related papers (2022-11-07T16:17:47Z)
- Unsupervised Simultaneous Learning for Camera Re-Localization and Depth Estimation from Video [4.5307040147072275]
We present an unsupervised simultaneous learning framework for the task of monocular camera re-localization and depth estimation from unlabeled video sequences.
In our framework, we train two networks that estimate the scene coordinates and the depth map from each image; these are then combined to estimate the camera pose.
Our method also outperforms state-of-the-art monocular depth estimation in a trained environment.
arXiv Detail & Related papers (2022-03-24T02:11:03Z)
- SelfTune: Metrically Scaled Monocular Depth Estimation through Self-Supervised Learning [53.78813049373321]
We propose a self-supervised learning method for pre-trained supervised monocular depth networks to enable metrically scaled depth estimation.
Our approach is useful for various applications such as mobile robot navigation and is applicable to diverse environments.
arXiv Detail & Related papers (2022-03-10T12:28:42Z)
- Unsupervised Learning of Monocular Depth and Ego-Motion Using Multiple Masks [14.82498499423046]
This paper proposes a new unsupervised learning method for depth and ego-motion that uses multiple masks derived from monocular video.
The depth estimation network and the ego-motion estimation network are trained under depth and ego-motion constraints without ground-truth values.
Experiments on the KITTI dataset show that our method achieves good performance in terms of depth and ego-motion estimation.
arXiv Detail & Related papers (2021-04-01T12:29:23Z)
- Learning Monocular Depth in Dynamic Scenes via Instance-Aware Projection Consistency [114.02182755620784]
We present an end-to-end joint training framework that explicitly models 6-DoF motion of multiple dynamic objects, ego-motion and depth in a monocular camera setup without supervision.
Our framework is shown to outperform the state-of-the-art depth and motion estimation methods.
arXiv Detail & Related papers (2021-02-04T14:26:42Z)
- Consistent Video Depth Estimation [57.712779457632024]
We present an algorithm for reconstructing dense, geometrically consistent depth for all pixels in a monocular video.
We leverage a conventional structure-from-motion reconstruction to establish geometric constraints on pixels in the video.
Our algorithm is able to handle challenging hand-held captured input videos with a moderate degree of dynamic motion.
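The geometric constraints mentioned above can be caricatured in a few lines: lift each pixel of one frame into 3-D using its depth, move it by the relative camera pose, and compare the transformed depth against the second frame's depth at the projected pixel. This is only an illustrative numpy sketch under simplifying assumptions (known intrinsics K and pose (R, t), nearest-neighbor sampling, no occlusion handling), not the paper's algorithm:

```python
import numpy as np

def backproject(depth, K_inv):
    # Lift each pixel (u, v) with its depth into a 3-D point in the camera frame.
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(float)  # (h, w, 3)
    return depth[..., None] * (pix @ K_inv.T)

def geometric_consistency(depth_i, depth_j, K, R, t):
    # Transform frame-i points into frame j, project them with K, and compare
    # the transformed z-coordinate with frame j's depth at the projected pixel.
    K_inv = np.linalg.inv(K)
    pts_i = backproject(depth_i, K_inv)   # 3-D points in camera i
    pts_j = pts_i @ R.T + t               # same points in camera j
    proj = pts_j @ K.T
    uv = np.round(proj[..., :2] / proj[..., 2:3]).astype(int)
    h, w = depth_j.shape
    valid = (uv[..., 0] >= 0) & (uv[..., 0] < w) & \
            (uv[..., 1] >= 0) & (uv[..., 1] < h)
    z = proj[..., 2][valid]
    sampled = depth_j[uv[..., 1][valid], uv[..., 0][valid]]
    return np.abs(z - sampled).mean()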
arXiv Detail & Related papers (2020-04-30T17:59:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.