Self-supervised Monocular Trained Depth Estimation using Self-attention
and Discrete Disparity Volume
- URL: http://arxiv.org/abs/2003.13951v1
- Date: Tue, 31 Mar 2020 04:48:16 GMT
- Authors: Adrian Johnston and Gustavo Carneiro
- Abstract summary: We propose two new ideas to improve self-supervised monocular trained depth estimation: 1) self-attention, and 2) discrete disparity prediction.
We show that the extension of the state-of-the-art self-supervised monocular trained depth estimator Monodepth2 with these two ideas allows us to design a model that produces the best results in the field in KITTI 2015 and Make3D.
- Score: 19.785343302320918
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Monocular depth estimation has become one of the most studied applications in
computer vision, where the most accurate approaches are based on fully
supervised learning models. However, the acquisition of accurate and large
ground truth data sets to model these fully supervised methods is a major
challenge for the further development of the area. Self-supervised methods
trained with monocular videos constitute one of the most promising approaches to
mitigate the challenge mentioned above due to the wide-spread availability of
training data. Consequently, they have been intensively studied, where the main
ideas explored consist of different types of model architectures, loss
functions, and occlusion masks to address non-rigid motion. In this paper, we
propose two new ideas to improve self-supervised monocular trained depth
estimation: 1) self-attention, and 2) discrete disparity prediction. Compared
with the usual localised convolution operation, self-attention can explore
more general contextual information that allows the inference of similar
disparity values at non-contiguous regions of the image. Discrete disparity
prediction has been shown by fully supervised methods to provide a more robust
and sharper depth estimation than the more common continuous disparity
prediction, besides enabling the estimation of depth uncertainty. We show that
the extension of the state-of-the-art self-supervised monocular trained depth
estimator Monodepth2 with these two ideas allows us to design a model that
produces the best results in the field in KITTI 2015 and Make3D, closing the
gap with respect to self-supervised stereo training and fully supervised
approaches.
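
The discrete disparity idea in the abstract can be illustrated with a minimal per-pixel sketch. It assumes the network outputs one logit per disparity bin and that a continuous disparity is read out as the softmax-weighted mean of the bin centres, with the variance of that distribution serving as an uncertainty estimate. The function names, the evenly spaced bins, and the `d_min`/`d_max` range are hypothetical choices for illustration; the abstract does not specify the authors' exact formulation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def disparity_from_volume(logits, d_min=0.0, d_max=1.0):
    """Return (expected disparity, variance) for a single pixel.

    `logits` holds one score per discrete disparity bin (at least two bins).
    Bin centres are assumed to be evenly spaced in [d_min, d_max].
    """
    n = len(logits)
    bins = [d_min + (d_max - d_min) * i / (n - 1) for i in range(n)]
    probs = softmax(logits)
    # Expected disparity: probability-weighted mean of the bin centres.
    mean = sum(p * d for p, d in zip(probs, bins))
    # Variance of the per-pixel distribution, used here as an uncertainty proxy.
    var = sum(p * (d - mean) ** 2 for p, d in zip(probs, bins))
    return mean, var


# A sharply peaked distribution versus a flat (ambiguous) one.
confident = disparity_from_volume([0.0, 0.0, 10.0, 0.0, 0.0])
ambiguous = disparity_from_volume([0.0, 0.0, 0.0, 0.0, 0.0])
```

Both pixels yield the same expected disparity (the middle bin), but the flat distribution has a much larger variance, which is how a discrete volume can expose uncertainty that a single regressed value cannot.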
Related papers
- Consistency Regularisation for Unsupervised Domain Adaptation in Monocular Depth Estimation [15.285720572043678]
We formulate unsupervised domain adaptation for monocular depth estimation as a consistency-based semi-supervised learning problem.
We introduce a pairwise loss function that regularises predictions on the source domain while enforcing consistency across multiple augmented views.
In our experiments, we rely on the standard depth estimation benchmarks KITTI and NYUv2 to demonstrate state-of-the-art results.
arXiv Detail & Related papers (2024-05-27T23:32:06Z) - Sparse Depth-Guided Attention for Accurate Depth Completion: A
Stereo-Assisted Monitored Distillation Approach [7.902840502973506]
We introduce a stereo-based model as a teacher model to improve the accuracy of the student model for depth completion.
To provide self-supervised information, we also employ multi-view depth consistency and multi-scale minimum reprojection.
arXiv Detail & Related papers (2023-03-28T09:23:19Z) - Self-Supervised Monocular Depth Estimation with Self-Reference
Distillation and Disparity Offset Refinement [15.012694052674899]
We propose two novel ideas to improve self-supervised monocular depth estimation.
We use a parameter-optimized model as the teacher, updated over the training epochs, to provide additional supervision.
We leverage the contextual consistency between high-scale and low-scale features to obtain multiscale disparity offsets.
arXiv Detail & Related papers (2023-02-20T06:28:52Z) - SC-DepthV3: Robust Self-supervised Monocular Depth Estimation for
Dynamic Scenes [58.89295356901823]
Self-supervised monocular depth estimation has shown impressive results in static scenes.
It relies on the multi-view consistency assumption for training networks; however, that assumption is violated in dynamic object regions.
We introduce an external pretrained monocular depth estimation model for generating single-image depth prior.
Our model can predict sharp and accurate depth maps, even when training from monocular videos of highly-dynamic scenes.
arXiv Detail & Related papers (2022-11-07T16:17:47Z) - Image Masking for Robust Self-Supervised Monocular Depth Estimation [12.435468563991174]
Self-supervised monocular depth estimation is a salient task for 3D scene understanding.
We propose MIMDepth, a method that adapts masked image modeling for self-supervised monocular depth estimation.
arXiv Detail & Related papers (2022-10-05T15:57:53Z) - Occlusion-Aware Self-Supervised Monocular 6D Object Pose Estimation [88.8963330073454]
We propose a novel monocular 6D pose estimation approach by means of self-supervised learning.
We leverage current trends in noisy student training and differentiable rendering to further self-supervise the model.
Our proposed self-supervision outperforms all other methods relying on synthetic data.
arXiv Detail & Related papers (2022-03-19T15:12:06Z) - Pseudo Supervised Monocular Depth Estimation with Teacher-Student
Network [90.20878165546361]
We propose a new unsupervised depth estimation method based on a pseudo-supervision mechanism.
It strategically integrates the advantages of supervised and unsupervised monocular depth estimation.
Our experimental results demonstrate that the proposed method outperforms the state-of-the-art on the KITTI benchmark.
arXiv Detail & Related papers (2021-10-22T01:08:36Z) - Excavating the Potential Capacity of Self-Supervised Monocular Depth
Estimation [10.620856690388376]
We show that the potential capacity of self-supervised monocular depth estimation can be excavated without increasing this cost.
Our contributions can bring significant performance improvement to the baseline with even less computational overhead.
Our model, named EPCDepth, surpasses the previous state-of-the-art methods, even those supervised by additional constraints.
arXiv Detail & Related papers (2021-09-26T03:40:56Z) - Unsupervised Scale-consistent Depth Learning from Video [131.3074342883371]
We propose a monocular depth estimator SC-Depth, which requires only unlabelled videos for training.
Thanks to the capability of scale-consistent prediction, we show that our monocular-trained deep networks are readily integrated into the ORB-SLAM2 system.
The proposed hybrid Pseudo-RGBD SLAM shows compelling results in KITTI, and it generalizes well to the KAIST dataset without additional training.
arXiv Detail & Related papers (2021-05-25T02:17:56Z) - How Well Do Self-Supervised Models Transfer? [92.16372657233394]
We evaluate the transfer performance of 13 top self-supervised models on 40 downstream tasks.
We find ImageNet Top-1 accuracy to be highly correlated with transfer to many-shot recognition.
No single self-supervised method dominates overall, suggesting that universal pre-training is still unsolved.
arXiv Detail & Related papers (2020-11-26T16:38:39Z) - On the uncertainty of self-supervised monocular depth estimation [52.13311094743952]
Self-supervised paradigms for monocular depth estimation are very appealing since they do not require ground truth annotations at all.
We explore for the first time how to estimate the uncertainty for this task and how this affects depth accuracy.
We propose a novel peculiar technique specifically designed for self-supervised approaches.
arXiv Detail & Related papers (2020-05-13T09:00:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.