Self-Supervised Joint Learning Framework of Depth Estimation via
Implicit Cues
- URL: http://arxiv.org/abs/2006.09876v3
- Date: Fri, 26 Jun 2020 07:11:18 GMT
- Title: Self-Supervised Joint Learning Framework of Depth Estimation via
Implicit Cues
- Authors: Jianrong Wang and Ge Zhang and Zhenyu Wu and XueWei Li and Li Liu
- Abstract summary: We propose a novel self-supervised joint learning framework for depth estimation.
The proposed framework outperforms the state-of-the-art (SOTA) on the KITTI and Make3D datasets.
- Score: 24.743099160992937
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In self-supervised monocular depth estimation, depth discontinuities
and artifacts around moving objects remain challenging problems. Existing
self-supervised methods usually train the depth estimation network on single
views. Compared with static views, the abundant dynamic cues between video
frames benefit refined depth estimation, especially for dynamic objects. In
this work, we propose a novel self-supervised joint learning framework for
depth estimation using consecutive frames from monocular and stereo videos.
The main idea is to use an implicit depth cue extractor that leverages dynamic
and static cues to generate useful depth proposals; these cues can predict
distinguishable motion contours and geometric scene structures. Furthermore, a
new high-dimensional attention module is introduced to extract a clear global
transformation, which effectively suppresses the uncertainty of local
descriptors in high-dimensional space and yields more reliable optimization of
the learning framework. Experiments demonstrate that the proposed framework
outperforms the state-of-the-art (SOTA) on the KITTI and Make3D datasets.
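For context, self-supervised frameworks of this kind are typically driven by a photometric reprojection objective comparing the target frame with a source frame warped through the predicted depth and pose. Below is a minimal sketch of that standard SSIM + L1 loss; it is illustrative only, with assumed input shapes, and is not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """Simplified single-scale SSIM distance over 3x3 neighborhoods."""
    mu_x = F.avg_pool2d(x, 3, 1, 1)
    mu_y = F.avg_pool2d(y, 3, 1, 1)
    sigma_x = F.avg_pool2d(x * x, 3, 1, 1) - mu_x ** 2
    sigma_y = F.avg_pool2d(y * y, 3, 1, 1) - mu_y ** 2
    sigma_xy = F.avg_pool2d(x * y, 3, 1, 1) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * sigma_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (sigma_x + sigma_y + c2)
    return torch.clamp((1 - num / den) / 2, 0, 1)

def photometric_loss(warped, target, alpha=0.85):
    """warped, target: (B,3,H,W) images; returns per-pixel error (B,1,H,W)."""
    l1 = (warped - target).abs().mean(1, keepdim=True)
    return alpha * ssim(warped, target).mean(1, keepdim=True) + (1 - alpha) * l1
```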
Related papers
- D$^3$epth: Self-Supervised Depth Estimation with Dynamic Mask in Dynamic Scenes [23.731667977542454]
D$^3$epth is a novel method for self-supervised depth estimation in dynamic scenes.
It tackles the challenge of dynamic objects from two key perspectives.
It consistently outperforms existing self-supervised monocular depth estimation baselines.
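The summary does not specify how the dynamic mask is built; one common construction in self-supervised depth (auto-masking in the style of Monodepth2) keeps only pixels where warping actually reduces the photometric error, sketched below with assumed inputs.

```python
import torch

# Hypothetical illustration of a dynamic mask: pixels where the unwarped
# source already matches the target better than the warped source are likely
# moving (or static relative to the camera) and are suppressed in the loss.
def dynamic_mask(err_warped: torch.Tensor, err_identity: torch.Tensor):
    """err_*: per-pixel photometric errors, shape (B,1,H,W); 1 = keep."""
    return (err_warped < err_identity).float()

# usage: mask = dynamic_mask(e_warp, e_id)
#        loss = (mask * e_warp).sum() / mask.sum().clamp(min=1)
```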
arXiv Detail & Related papers (2024-11-07T16:07:00Z)
- Structure-Centric Robust Monocular Depth Estimation via Knowledge Distillation [9.032563775151074]
Monocular depth estimation is a key technique for 3D perception in computer vision.
It faces significant challenges in real-world scenarios, including adverse weather, motion blur, and poorly lit scenes at night.
We devise a novel approach to reduce over-reliance on local textures, enhancing robustness against missing or interfering patterns.
arXiv Detail & Related papers (2024-10-09T15:20:29Z)
- Mining Supervision for Dynamic Regions in Self-Supervised Monocular Depth Estimation [23.93080319283679]
Existing methods jointly estimate pixel-wise depth and motion, relying mainly on an image reconstruction loss.
Dynamic regions remain a critical challenge for these methods due to the inherent ambiguity in depth and motion estimation.
This paper proposes a self-supervised training framework exploiting pseudo depth labels for dynamic regions from training data.
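As a rough illustration of how mined pseudo depth labels might supplement the reconstruction loss in dynamic regions, the hypothetical combination below supervises masked pixels with an L1 term; the weighting and the mask source are assumptions, not the paper's code.

```python
import torch

# Sketch: photometric self-supervision on static pixels, direct pseudo-label
# supervision on dynamic pixels. All shapes are (B,1,H,W); mask is 1 where
# a pixel belongs to a dynamic region.
def mixed_depth_loss(photo_err, depth, pseudo_depth, mask, w=0.5):
    static = ((1 - mask) * photo_err).sum() / (1 - mask).sum().clamp(min=1)
    dynamic = (mask * (depth - pseudo_depth).abs()).sum() / mask.sum().clamp(min=1)
    return static + w * dynamic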
arXiv Detail & Related papers (2024-04-23T10:51:15Z)
- DO3D: Self-supervised Learning of Decomposed Object-aware 3D Motion and Depth from Monocular Videos [76.01906393673897]
We propose a self-supervised method to jointly learn 3D motion and depth from monocular videos.
Our system contains a depth estimation module to predict depth, and a new decomposed object-wise 3D motion (DO3D) estimation module to predict ego-motion and 3D object motion.
Our model delivers superior performance in all evaluated settings.
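To picture the decomposition the summary describes: per-pixel 3D motion can be split into an ego-motion field plus an object-specific residual confined to dynamic-object masks. The schematic below uses assumed shapes and names and is not the DO3D code.

```python
import torch

# Hypothetical composition: static pixels keep pure ego-motion; pixels inside
# object masks additionally receive a predicted 3D object-motion residual.
def compose_motion(ego_flow, obj_flow, obj_mask):
    """ego_flow, obj_flow: (B,3,H,W) 3D motion fields; obj_mask: (B,1,H,W)."""
    return ego_flow + obj_mask * obj_flow
```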
arXiv Detail & Related papers (2024-03-09T12:22:46Z)
- Manydepth2: Motion-Aware Self-Supervised Multi-Frame Monocular Depth Estimation in Dynamic Scenes [45.092076587934464]
We present Manydepth2 to achieve precise depth estimation for both dynamic objects and static backgrounds.
To tackle the challenges posed by dynamic content, we incorporate optical flow and coarse monocular depth to create a pseudo-static reference frame.
This frame is then utilized to build a motion-aware cost volume in collaboration with the vanilla target frame.
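A flow-based warp is one plausible way to realize such a pseudo-static reference frame; the sketch below warps the source image with a given optical flow field using PyTorch's grid_sample. The padding and normalization choices are assumptions, not Manydepth2's exact pipeline.

```python
import torch
import torch.nn.functional as F

def warp_with_flow(src, flow):
    """src: (B,3,H,W) image; flow: (B,2,H,W) flow in pixels (x, y order)."""
    b, _, h, w = src.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().to(src.device)  # (2,H,W)
    coords = grid.unsqueeze(0) + flow          # displace each pixel by the flow
    # Normalize coordinates to [-1, 1] as required by grid_sample.
    coords[:, 0] = 2 * coords[:, 0] / (w - 1) - 1
    coords[:, 1] = 2 * coords[:, 1] / (h - 1) - 1
    return F.grid_sample(src, coords.permute(0, 2, 3, 1),
                         padding_mode="border", align_corners=True)
```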
arXiv Detail & Related papers (2023-12-23T14:36:27Z)
- GasMono: Geometry-Aided Self-Supervised Monocular Depth Estimation for Indoor Scenes [47.76269541664071]
This paper tackles the challenges of self-supervised monocular depth estimation in indoor scenes caused by large rotation between frames and low texture.
We obtain coarse camera poses from monocular sequences through multi-view geometry to deal with the former.
To soften the effect of the low texture, we combine the global reasoning of vision transformers with an overfitting-aware, iterative self-distillation mechanism.
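Coarse poses from multi-view geometry can be obtained with classical tools. The generic sketch below (ORB matching, essential-matrix estimation with RANSAC, and pose recovery in OpenCV) illustrates the idea, though it is not necessarily the paper's pipeline.

```python
import cv2
import numpy as np

def coarse_pose(img1, img2, K):
    """img1, img2: grayscale uint8 frames; K: 3x3 float64 intrinsics."""
    orb = cv2.ORB_create(2000)
    kp1, des1 = orb.detectAndCompute(img1, None)
    kp2, des2 = orb.detectAndCompute(img2, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])
    # Essential matrix with RANSAC, then decompose into R, t.
    E, inliers = cv2.findEssentialMat(pts1, pts2, K, cv2.RANSAC, 0.999, 1.0)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)
    return R, t  # rotation matrix and unit-norm translation direction
```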
arXiv Detail & Related papers (2023-09-26T17:59:57Z)
- Robust Geometry-Preserving Depth Estimation Using Differentiable Rendering [93.94371335579321]
We propose a learning framework that trains models to predict geometry-preserving depth without requiring extra data or annotations.
Comprehensive experiments underscore our framework's superior generalization capabilities.
Our innovative loss functions empower the model to autonomously recover domain-specific scale-and-shift coefficients.
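To make "scale-and-shift coefficients" concrete: aligning a relative depth map to a reference reduces to a two-parameter least-squares fit with a closed-form solution, sketched below. This shows the quantity the paper's losses recover, not the paper's training objective.

```python
import torch

def scale_and_shift(d: torch.Tensor, d_ref: torch.Tensor):
    """Fit scalars s, t minimizing ||s * d + t - d_ref||^2."""
    d, d_ref = d.flatten(), d_ref.flatten()
    A = torch.stack([d, torch.ones_like(d)], dim=1)           # (N, 2)
    sol = torch.linalg.lstsq(A, d_ref.unsqueeze(1)).solution  # (2, 1)
    return sol[0, 0], sol[1, 0]  # d_aligned = s * d + t
```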
arXiv Detail & Related papers (2023-09-18T12:36:39Z)
- Multi-Frame Self-Supervised Depth Estimation with Multi-Scale Feature Fusion in Dynamic Scenes [25.712707161201802]
Multi-frame methods improve monocular depth estimation over single-frame approaches.
Recent methods tend to propose complex architectures for feature matching and dynamic scenes.
We show that a simple learning framework, together with designed feature augmentation, leads to superior performance.
arXiv Detail & Related papers (2023-03-26T05:26:30Z)
- SC-DepthV3: Robust Self-supervised Monocular Depth Estimation for Dynamic Scenes [58.89295356901823]
Self-supervised monocular depth estimation has shown impressive results in static scenes.
It relies on the multi-view consistency assumption for training networks; however, this assumption is violated in dynamic object regions.
We introduce an external pretrained monocular depth estimation model for generating single-image depth prior.
Our model can predict sharp and accurate depth maps, even when training from monocular videos of highly-dynamic scenes.
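As an illustration of obtaining such a single-image depth prior, one can load an off-the-shelf pretrained network; MiDaS via torch.hub is our example choice here and may differ from the prior model the authors actually use.

```python
import torch

# Hypothetical illustration: an off-the-shelf single-image depth prior.
midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")
midas.eval()
midas_transforms = torch.hub.load("intel-isl/MiDaS", "transforms")

def depth_prior(img_rgb):
    """img_rgb: HxWx3 uint8 RGB numpy array; returns an inverse-depth map."""
    batch = midas_transforms.small_transform(img_rgb)
    with torch.no_grad():
        prediction = midas(batch)  # (1, H', W'), at the network's own scale
    return prediction.squeeze(0)
```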
arXiv Detail & Related papers (2022-11-07T16:17:47Z)
- Unsupervised Scale-consistent Depth Learning from Video [131.3074342883371]
We propose a monocular depth estimator SC-Depth, which requires only unlabelled videos for training.
Thanks to the capability of scale-consistent prediction, we show that our monocular-trained deep networks are readily integrated into the ORB-SLAM2 system.
The proposed hybrid Pseudo-RGBD SLAM shows compelling results on KITTI and generalizes well to the KAIST dataset without additional training.
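Scale-consistent prediction is commonly enforced with a geometry-consistency term that compares a frame's depth to the neighboring frame's depth warped into the same view. A minimal sketch follows, using the commonly cited normalized-difference formulation; SC-Depth's exact implementation details may differ.

```python
import torch

def geometry_consistency(d_a: torch.Tensor, d_b_warped: torch.Tensor):
    """Both: (B,1,H,W) depths expressed in frame a's view.
    Penalizes scale drift by normalizing the depth difference."""
    diff = (d_a - d_b_warped).abs() / (d_a + d_b_warped).clamp(min=1e-7)
    return diff.mean()
```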
arXiv Detail & Related papers (2021-05-25T02:17:56Z)
- Learning Monocular Depth in Dynamic Scenes via Instance-Aware Projection Consistency [114.02182755620784]
We present an end-to-end joint training framework that explicitly models 6-DoF motion of multiple dynamic objects, ego-motion and depth in a monocular camera setup without supervision.
Our framework is shown to outperform the state-of-the-art depth and motion estimation methods.
arXiv Detail & Related papers (2021-02-04T14:26:42Z)