Related papers: SC-DepthV3: Robust Self-supervised Monocular Depth Estimation for Dynamic Scenes

URL: http://arxiv.org/abs/2211.03660v2
Date: Thu, 5 Oct 2023 08:53:01 GMT
Title: SC-DepthV3: Robust Self-supervised Monocular Depth Estimation for Dynamic Scenes
Authors: Libo Sun, Jia-Wang Bian, Huangying Zhan, Wei Yin, Ian Reid, Chunhua Shen
Abstract summary: Self-supervised monocular depth estimation has shown impressive results in static scenes. It relies on the multi-view consistency assumption for training networks; however, that assumption is violated in dynamic object regions. We introduce an external pretrained monocular depth estimation model to generate a single-image depth prior. Our model can predict sharp and accurate depth maps, even when trained on monocular videos of highly dynamic scenes.
Score: 58.89295356901823
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Self-supervised monocular depth estimation has shown impressive results in static scenes. It relies on the multi-view consistency assumption for training networks; however, that assumption is violated in dynamic object regions and occlusions. Consequently, existing methods show poor accuracy in dynamic scenes, and the estimated depth map is blurred at object boundaries because they are usually occluded in other training views. In this paper, we propose SC-DepthV3 to address these challenges. Specifically, we introduce an external pretrained monocular depth estimation model to generate a single-image depth prior, namely pseudo-depth, based on which we propose novel losses to boost self-supervised training. As a result, our model can predict sharp and accurate depth maps, even when trained on monocular videos of highly dynamic scenes. We demonstrate the significantly superior performance of our method over previous methods on six challenging datasets, and we provide detailed ablation studies for the proposed terms. Source code and data will be released at https://github.com/JiawangBian/sc_depth_pl

Related papers

A Simple yet Effective Test-Time Adaptation for Zero-Shot Monocular Metric Depth Estimation [46.037640130193566]
We propose a new method to rescale Depth Anything predictions using 3D points provided by sensors or techniques such as low-resolution LiDAR or structure-from-motion with poses given by an IMU. Our experiments highlight enhancements relative to zero-shot monocular metric depth estimation methods, competitive results compared to fine-tuned approaches, and better robustness than depth completion approaches.
arXiv Detail & Related papers (2024-12-18T17:50:15Z)
Align3R: Aligned Monocular Depth Estimation for Dynamic Videos [50.28715151619659]
We propose a novel video-depth estimation method called Align3R to estimate temporally consistent depth maps for a dynamic video. Our key idea is to utilize the recent DUSt3R model to align estimated monocular depth maps of different timesteps. Experiments demonstrate that Align3R estimates consistent video depth and camera poses for a monocular video, outperforming baseline methods.
arXiv Detail & Related papers (2024-12-04T07:09:59Z)
MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion [118.74385965694694]
We present Motion DUSt3R (MonST3R), a novel geometry-first approach that directly estimates per-timestep geometry from dynamic scenes. By simply estimating a pointmap for each timestep, we can effectively adapt DUSt3R's representation, previously only used for static scenes, to dynamic scenes. We show that by posing the problem as a fine-tuning task, identifying several suitable datasets, and strategically training the model on this limited data, we can surprisingly enable the model to handle dynamics.
arXiv Detail & Related papers (2024-10-04T18:00:07Z)
Mining Supervision for Dynamic Regions in Self-Supervised Monocular Depth Estimation [23.93080319283679]
Existing methods jointly estimate pixel-wise depth and motion, relying mainly on an image reconstruction loss. Dynamic regions remain a critical challenge for these methods due to the inherent ambiguity in depth and motion estimation. This paper proposes a self-supervised training framework exploiting pseudo depth labels for dynamic regions from training data.
arXiv Detail & Related papers (2024-04-23T10:51:15Z)
Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation [20.230238670888454]
We introduce Marigold, a method for affine-invariant monocular depth estimation. It can be fine-tuned in a couple of days on a single GPU using only synthetic training data. It delivers state-of-the-art performance across a wide range of datasets, including over 20% performance gains in specific cases.
arXiv Detail & Related papers (2023-12-04T18:59:13Z)
Towards Accurate Reconstruction of 3D Scene Shape from A Single Monocular Image [91.71077190961688]
We propose a two-stage framework that first predicts depth up to an unknown scale and shift from a single monocular image. We then exploit 3D point cloud data to predict the depth shift and the camera's focal length, which allow us to recover 3D scene shapes. We test our depth model on nine unseen datasets and achieve state-of-the-art performance on zero-shot evaluation.
arXiv Detail & Related papers (2022-08-28T16:20:14Z)
Learning Occlusion-Aware Coarse-to-Fine Depth Map for Self-supervised Monocular Depth Estimation [11.929584800629673]
We propose a novel network to learn an Occlusion-aware Coarse-to-Fine Depth map for self-supervised monocular depth estimation. The proposed OCFD-Net not only employs a discrete depth constraint for learning a coarse-level depth map, but also employs a continuous depth constraint for learning a scene depth residual.
arXiv Detail & Related papers (2022-03-21T12:43:42Z)
Virtual Normal: Enforcing Geometric Constraints for Accurate and Robust Depth Prediction [87.08227378010874]
We show the importance of the high-order 3D geometric constraints for depth prediction. By designing a loss term that enforces a simple geometric constraint, we significantly improve the accuracy and robustness of monocular depth estimation. We show state-of-the-art results of learning metric depth on NYU Depth-V2 and KITTI.
arXiv Detail & Related papers (2021-03-07T00:08:21Z)
Unsupervised Monocular Depth Reconstruction of Non-Rigid Scenes [87.91841050957714]
We present an unsupervised monocular framework for dense depth estimation of dynamic scenes. We derive a training objective that aims to opportunistically preserve pairwise distances between reconstructed 3D points. Our method provides promising results, demonstrating its capability of reconstructing 3D from challenging videos of non-rigid scenes.
arXiv Detail & Related papers (2020-12-31T16:02:03Z)
Self-Supervised Joint Learning Framework of Depth Estimation via Implicit Cues [24.743099160992937]
We propose a novel self-supervised joint learning framework for depth estimation. The proposed framework outperforms the state-of-the-art (SOTA) on KITTI and Make3D datasets.
arXiv Detail & Related papers (2020-06-17T13:56:59Z)
Don't Forget The Past: Recurrent Depth Estimation from Monocular Video [92.84498980104424]
We put three different types of depth estimation into a common framework. Our method produces a time series of depth maps. It can be applied to monocular videos only or be combined with different types of sparse depth patterns.
arXiv Detail & Related papers (2020-01-08T16:50:51Z)

This list is automatically generated from the titles and abstracts of the papers on this site.