Unsupervised Scale-consistent Depth Learning from Video
- URL: http://arxiv.org/abs/2105.11610v1
- Date: Tue, 25 May 2021 02:17:56 GMT
- Title: Unsupervised Scale-consistent Depth Learning from Video
- Authors: Jia-Wang Bian, Huangying Zhan, Naiyan Wang, Zhichao Li, Le Zhang,
Chunhua Shen, Ming-Ming Cheng, Ian Reid
- Abstract summary: We propose a monocular depth estimator SC-Depth, which requires only unlabelled videos for training.
Thanks to the capability of scale-consistent prediction, we show that our monocular-trained deep networks are readily integrated into the ORB-SLAM2 system.
The proposed hybrid Pseudo-RGBD SLAM shows compelling results on KITTI and generalizes well to the KAIST dataset without additional training.
- Score: 131.3074342883371
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a monocular depth estimator SC-Depth, which requires only
unlabelled videos for training and enables scale-consistent prediction at
inference time. Our contributions include: (i) we propose a geometry
consistency loss, which penalizes the inconsistency of predicted depths between
adjacent views; (ii) we propose a self-discovered mask to automatically
localize moving objects that violate the underlying static scene assumption and
cause noisy signals during training; (iii) we demonstrate the efficacy of each
component with a detailed ablation study and show high-quality depth estimation
results on both the KITTI and NYUv2 datasets. Moreover, thanks to the capability of
scale-consistent prediction, we show that our monocular-trained deep networks
are readily integrated into the ORB-SLAM2 system for more robust and accurate
tracking. The proposed hybrid Pseudo-RGBD SLAM shows compelling results on
KITTI and generalizes well to the KAIST dataset without additional
training. Finally, we provide several demos for qualitative evaluation.
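Contributions (i) and (ii) above can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes the adjacent view's depth has already been warped into the reference frame (the full pipeline obtains this by projecting with the predicted relative pose and camera intrinsics), and all function names are ours.

```python
import numpy as np

def geometry_consistency(d_ref, d_warped):
    """Per-pixel depth inconsistency between the reference-view depth and
    the adjacent-view depth warped into the reference frame, normalised
    to [0, 1): 0 means the two predictions agree perfectly."""
    return np.abs(d_ref - d_warped) / (d_ref + d_warped)

def self_discovered_mask(diff):
    """Weight map that down-weights pixels whose depths disagree across
    views, e.g. moving objects violating the static-scene assumption."""
    return 1.0 - diff

# Toy example: a moving object makes one region inconsistent.
d_ref = np.full((4, 4), 10.0)
d_warped = d_ref.copy()
d_warped[1:3, 1:3] = 30.0              # dynamic region: depths disagree
diff = geometry_consistency(d_ref, d_warped)
mask = self_discovered_mask(diff)      # ~1 on static pixels, lower on dynamic
loss = np.mean(diff)                   # geometry consistency loss term
```

In training, the mask would multiply the per-pixel photometric loss so that dynamic regions contribute less noisy gradient signal.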
Related papers
- RadOcc: Learning Cross-Modality Occupancy Knowledge through Rendering
Assisted Distillation [50.35403070279804]
3D occupancy prediction is an emerging task that aims to estimate the occupancy states and semantics of 3D scenes using multi-view images.
We propose RadOcc, a Rendering assisted distillation paradigm for 3D Occupancy prediction.
arXiv Detail & Related papers (2023-12-19T03:39:56Z)
- 2D Feature Distillation for Weakly- and Semi-Supervised 3D Semantic Segmentation [92.17700318483745]
We propose an image-guidance network (IGNet) which builds upon the idea of distilling high level feature information from a domain adapted synthetically trained 2D semantic segmentation network.
IGNet achieves state-of-the-art results for weakly-supervised LiDAR semantic segmentation on ScribbleKITTI, boasting up to 98% relative performance to fully supervised training with only 8% labeled points.
arXiv Detail & Related papers (2023-11-27T07:57:29Z) - MaskingDepth: Masked Consistency Regularization for Semi-supervised
Monocular Depth Estimation [38.09399326203952]
MaskingDepth is a novel semi-supervised learning framework for monocular depth estimation.
It enforces consistency between the strongly-augmented unlabeled data and the pseudo-labels derived from weakly-augmented unlabeled data.
arXiv Detail & Related papers (2022-12-21T06:56:22Z) - SC-DepthV3: Robust Self-supervised Monocular Depth Estimation for
Dynamic Scenes [58.89295356901823]
Self-supervised monocular depth estimation has shown impressive results in static scenes.
It relies on the multi-view consistency assumption to train networks; however, this assumption is violated in dynamic object regions.
We introduce an external pretrained monocular depth estimation model for generating single-image depth prior.
Our model can predict sharp and accurate depth maps, even when training from monocular videos of highly-dynamic scenes.
arXiv Detail & Related papers (2022-11-07T16:17:47Z) - On the non-universality of deep learning: quantifying the cost of
symmetry [24.86176236641865]
We prove computational limitations for learning with neural networks trained by noisy gradient descent (GD)
We characterize functions that fully-connected networks can weak-learn on the binary hypercube and unit sphere.
Our techniques extend to stochastic gradient descent (SGD), for which we show nontrivial results for learning with fully-connected networks.
arXiv Detail & Related papers (2022-08-05T11:54:52Z) - On Triangulation as a Form of Self-Supervision for 3D Human Pose
Estimation [57.766049538913926]
Supervised approaches to 3D pose estimation from single images are remarkably effective when labeled data is abundant.
Much of the recent attention has shifted towards semi- and/or weakly-supervised learning.
We propose to impose multi-view geometric constraints by means of a differentiable triangulation and to use it as a form of self-supervision during training when no labels are available.
arXiv Detail & Related papers (2022-03-29T19:11:54Z) - Occlusion-Robust Object Pose Estimation with Holistic Representation [42.27081423489484]
State-of-the-art (SOTA) object pose estimators take a two-stage approach.
We develop a novel occlude-and-blackout batch augmentation technique.
We also develop a multi-precision supervision architecture to encourage holistic pose representation learning.
arXiv Detail & Related papers (2021-10-22T08:00:26Z) - An Adaptive Framework for Learning Unsupervised Depth Completion [59.17364202590475]
We present a method to infer a dense depth map from a color image and associated sparse depth measurements.
We show that regularization and co-visibility are related via the fitness of the model to data and can be unified into a single framework.
arXiv Detail & Related papers (2021-06-06T02:27:55Z) - Semantics-Driven Unsupervised Learning for Monocular Depth and
Ego-Motion Estimation [33.83396613039467]
We propose a semantics-driven unsupervised learning approach for monocular depth and ego-motion estimation from videos.
Recent unsupervised learning methods employ the photometric error between the synthesized view and the actual image as a supervision signal for training.
arXiv Detail & Related papers (2020-06-08T05:55:07Z)
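The photometric supervision signal mentioned in the last entry can be sketched minimally (names are ours; real pipelines obtain the synthesized view by warping a source frame with the predicted depth and pose, and usually combine L1 with an SSIM term):

```python
import numpy as np

def photometric_error(actual, synthesized):
    """Mean L1 photometric error between the actual target image and a
    view synthesized from an adjacent frame. Minimal sketch of the
    supervision signal used by unsupervised depth methods."""
    return np.mean(np.abs(actual - synthesized))

# Toy example: a synthesized view that is off by 0.1 everywhere.
actual = np.zeros((8, 8))
synthesized = actual + 0.1
err = photometric_error(actual, synthesized)
```

Minimizing this error over the depth and pose networks is what drives training when no ground-truth depth labels are available.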
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.