Learning depth from monocular video sequences
- URL: http://arxiv.org/abs/2310.17156v1
- Date: Thu, 26 Oct 2023 05:00:41 GMT
- Title: Learning depth from monocular video sequences
- Authors: Zhenwei Luo
- Abstract summary: We propose a novel training loss which enables us to include more images for supervision during the training process.
We also design a novel network architecture for single-image depth estimation.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Learning a single-image depth estimation model from monocular video sequences is a very challenging problem. In this paper, we propose a novel training loss that enables us to include more images for supervision during training. We propose a simple yet effective model to account for frame-to-frame pixel motion, and we also design a novel network architecture for single-image depth estimation. Combined, our method produces state-of-the-art results for self-supervised monocular depth estimation on the KITTI dataset.
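For context, the standard self-supervised objective in this setting is a view-synthesis (photometric reprojection) loss: source frames are warped into the target view using the predicted depth and relative camera pose, and the warped images are compared to the target frame. The sketch below is a minimal, illustrative version of that standard loss (Monodepth2-style, with a per-pixel minimum over source frames). The paper's specific multi-frame loss is not detailed in the abstract, so the function names and the SSIM/L1 weighting here are assumptions, not the authors' method.

```python
# Illustrative sketch only: a generic self-supervised photometric loss,
# not the specific loss proposed in the paper.
import torch
import torch.nn.functional as F

def ssim(x, y):
    # Simplified SSIM over a 3x3 average-pooled window; returns a
    # per-pixel similarity map (higher = more similar).
    c1, c2 = 0.01 ** 2, 0.03 ** 2
    mu_x, mu_y = F.avg_pool2d(x, 3, 1, 1), F.avg_pool2d(y, 3, 1, 1)
    sigma_x = F.avg_pool2d(x * x, 3, 1, 1) - mu_x ** 2
    sigma_y = F.avg_pool2d(y * y, 3, 1, 1) - mu_y ** 2
    sigma_xy = F.avg_pool2d(x * y, 3, 1, 1) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * sigma_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (sigma_x + sigma_y + c2)
    return num / den

def photometric_loss(target, warped_sources, alpha=0.85):
    # target:         (B, 3, H, W) frame whose depth is being learned.
    # warped_sources: list of (B, 3, H, W) source frames already warped
    #                 into the target view with predicted depth and pose.
    errors = []
    for warped in warped_sources:
        l1 = (target - warped).abs().mean(1, keepdim=True)
        dssim = ((1.0 - ssim(target, warped)) / 2.0).mean(1, keepdim=True)
        errors.append(alpha * dssim + (1.0 - alpha) * l1)
    # Per-pixel minimum over source frames: a pixel that is occluded or
    # inconsistent in one source frame can still be supervised by another.
    return torch.cat(errors, dim=1).min(dim=1).values.mean()
```

Under these assumptions, adding another source frame only appends one more term to the per-pixel minimum, which is one plausible reading of how a training loss can "include more images for supervision" without being corrupted by occlusions.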
Related papers
- Align3R: Aligned Monocular Depth Estimation for Dynamic Videos [50.28715151619659]
We propose a novel video-depth estimation method called Align3R to estimate temporally consistent depth maps for a dynamic video.
Our key idea is to utilize the recent DUSt3R model to align estimated monocular depth maps of different timesteps.
Experiments demonstrate that Align3R estimates consistent video depth and camera poses for a monocular video, outperforming baseline methods.
arXiv Detail & Related papers (2024-12-04T07:09:59Z)
- Diffusion Models for Monocular Depth Estimation: Overcoming Challenging Conditions
We present a novel approach designed to address the complexities posed by challenging, out-of-distribution data in the single-image depth estimation task.
We systematically generate new, user-defined scenes with a comprehensive set of challenges and associated depth information.
This is achieved by leveraging cutting-edge text-to-image diffusion models with depth-aware control.
arXiv Detail & Related papers (2024-07-23T17:59:59Z)
- Learning Robust Multi-Scale Representation for Neural Radiance Fields from Unposed Images [65.41966114373373]
We present an improved solution to the neural image-based rendering problem in computer vision.
The proposed approach can synthesize a realistic image of the scene from a novel viewpoint at test time.
arXiv Detail & Related papers (2023-11-08T08:18:23Z)
- GasMono: Geometry-Aided Self-Supervised Monocular Depth Estimation for Indoor Scenes [47.76269541664071]
This paper tackles the challenges of self-supervised monocular depth estimation in indoor scenes caused by large rotation between frames and low texture.
We obtain coarse camera poses from monocular sequences through multi-view geometry to deal with the former.
To soften the effect of the low texture, we combine the global reasoning of vision transformers with an overfitting-aware, iterative self-distillation mechanism.
arXiv Detail & Related papers (2023-09-26T17:59:57Z)
- SC-DepthV3: Robust Self-supervised Monocular Depth Estimation for Dynamic Scenes [58.89295356901823]
Self-supervised monocular depth estimation has shown impressive results in static scenes.
It relies on the multi-view consistency assumption for training networks; however, that assumption is violated in dynamic object regions.
We introduce an external pretrained monocular depth estimation model for generating single-image depth prior.
Our model can predict sharp and accurate depth maps, even when training from monocular videos of highly-dynamic scenes.
arXiv Detail & Related papers (2022-11-07T16:17:47Z)
- Unsupervised Simultaneous Learning for Camera Re-Localization and Depth Estimation from Video [4.5307040147072275]
We present an unsupervised simultaneous learning framework for the task of monocular camera re-localization and depth estimation from unlabeled video sequences.
In our framework, we train two networks that estimate the scene coordinates using directions and the depth map from each image, which are then combined to estimate the camera pose.
Our method also outperforms state-of-the-art monocular depth estimation in a trained environment.
arXiv Detail & Related papers (2022-03-24T02:11:03Z)
- Unsupervised Learning of Monocular Depth and Ego-Motion Using Multiple Masks [14.82498499423046]
A new unsupervised learning method of depth and ego-motion using multiple masks from monocular video is proposed in this paper.
The depth estimation network and the ego-motion estimation network are trained according to depth and ego-motion constraints, without ground-truth values.
Experiments on the KITTI dataset show that our method achieves good performance in terms of depth and ego-motion.
arXiv Detail & Related papers (2021-04-01T12:29:23Z)
- Learning Monocular Depth in Dynamic Scenes via Instance-Aware Projection Consistency [114.02182755620784]
We present an end-to-end joint training framework that explicitly models 6-DoF motion of multiple dynamic objects, ego-motion and depth in a monocular camera setup without supervision.
Our framework is shown to outperform the state-of-the-art depth and motion estimation methods.
arXiv Detail & Related papers (2021-02-04T14:26:42Z)
- Consistent Video Depth Estimation [57.712779457632024]
We present an algorithm for reconstructing dense, geometrically consistent depth for all pixels in a monocular video.
We leverage a conventional structure-from-motion reconstruction to establish geometric constraints on pixels in the video.
Our algorithm is able to handle challenging hand-held captured input videos with a moderate degree of dynamic motion.
arXiv Detail & Related papers (2020-04-30T17:59:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.