ADAADepth: Adapting Data Augmentation and Attention for Self-Supervised
Monocular Depth Estimation
- URL: http://arxiv.org/abs/2103.00853v1
- Date: Mon, 1 Mar 2021 09:06:55 GMT
- Title: ADAADepth: Adapting Data Augmentation and Attention for Self-Supervised
Monocular Depth Estimation
- Authors: Vinay Kaushik, Kartik Jindgar and Brejesh Lall
- Abstract summary: We propose ADAA, utilising depth augmentation as depth supervision for learning accurate and robust depth.
We propose a relational self-attention module that learns rich contextual features and further enhances depth results.
We evaluate our predicted depth on the KITTI driving dataset and achieve state-of-the-art results.
- Score: 8.827921242078881
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Self-supervised learning of depth has been a highly studied topic of research
as it alleviates the requirement of having ground truth annotations for
predicting depth. Depth is learnt as an intermediate solution to the task of
view synthesis, utilising warped photometric consistency. Although it gives
good results when trained using stereo data, the predicted depth is still
sensitive to noise, illumination changes and specular reflections. Also,
occlusion can be tackled better by learning depth from a single camera. We
propose ADAA, utilising depth augmentation as depth supervision for learning
accurate and robust depth. We propose a relational self-attention module that
learns rich contextual features and further enhances depth results. We also
optimize the auto-masking strategy across all losses by enforcing L1
regularisation over the mask. Our novel progressive training strategy first
learns depth at a lower resolution and then progresses to the original
resolution with only a small amount of further training. We utilise a ResNet18
encoder, learning features for the prediction of both depth and pose. We
evaluate our predicted depth on the standard KITTI driving dataset and achieve
state-of-the-art results for monocular depth estimation, whilst having a
significantly lower number of trainable parameters in our deep learning
framework. We also evaluate our model on the Make3D dataset, showing better
generalization than other methods.
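To make the view-synthesis supervision concrete, here is a minimal PyTorch sketch of warping a source frame into the target view with the predicted depth and pose, plus a learned per-pixel mask with L1 regularisation; all names, shapes and the mask parameterisation are illustrative assumptions, not the paper's code.

```python
import torch
import torch.nn.functional as F

def warp_source_to_target(depth, T, K, src):
    """depth: (B,1,H,W); T: (B,4,4) target-to-source pose; K: (B,3,3);
    src: (B,3,H,W) source image. Returns src resampled into the target view."""
    B, _, H, W = depth.shape
    ys, xs = torch.meshgrid(
        torch.arange(H, dtype=depth.dtype, device=depth.device),
        torch.arange(W, dtype=depth.dtype, device=depth.device),
        indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], 0).reshape(1, 3, -1)
    cam = torch.linalg.inv(K) @ pix * depth.reshape(B, 1, -1)   # backproject
    ones = torch.ones(B, 1, cam.shape[-1], dtype=cam.dtype, device=cam.device)
    src_cam = (T @ torch.cat([cam, ones], 1))[:, :3]            # rigid motion
    uvw = K @ src_cam
    uv = (uvw[:, :2] / uvw[:, 2:3].clamp(min=1e-6)).reshape(B, 2, H, W)
    gx = uv[:, 0] / (W - 1) * 2 - 1                             # to [-1, 1]
    gy = uv[:, 1] / (H - 1) * 2 - 1
    grid = torch.stack([gx, gy], dim=-1)
    return F.grid_sample(src, grid, padding_mode="border", align_corners=True)

def masked_photometric_loss(depth, T, K, target, source, mask_logits, lam=1e-3):
    """Photometric error weighted by a learned validity mask; the L1 term
    pulls the mask toward 1, so pixels are discounted only where the
    photometric assumption actually fails."""
    warped = warp_source_to_target(depth, T, K, source)
    photo = (warped - target).abs().mean(1, keepdim=True)       # per-pixel L1
    mask = torch.sigmoid(mask_logits)
    return (mask * photo).mean() + lam * (1.0 - mask).abs().mean()
```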
Related papers
- Depth Prompting for Sensor-Agnostic Depth Estimation [19.280536006736575]
We design a novel depth prompt module that allows the feature representation to adapt to new depth distributions.
Our method frees the pretrained model from the restraint of a sensor's depth scan range and lets it provide absolute-scale depth maps.
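As a rough illustration of what a depth prompt module could look like, a short sketch follows; the embed-and-add design and all names are assumptions for illustration, not the paper's module.

```python
import torch
import torch.nn as nn

class DepthPrompt(nn.Module):
    """Embed sparse sensor depth into a prompt feature added to a frozen
    backbone's features, so the pretrained model can adapt to an unseen
    depth range without retraining the backbone."""
    def __init__(self, c_feat):
        super().__init__()
        self.embed = nn.Sequential(
            nn.Conv2d(1, c_feat, 3, padding=1), nn.ReLU(),
            nn.Conv2d(c_feat, c_feat, 3, padding=1))

    def forward(self, backbone_feat, sparse_depth):
        d = nn.functional.interpolate(
            sparse_depth, size=backbone_feat.shape[-2:], mode="nearest")
        return backbone_feat + self.embed(d)   # prompt-conditioned feature
```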
arXiv Detail & Related papers (2024-05-20T08:19:08Z)
- Robust Depth Enhancement via Polarization Prompt Fusion Tuning [112.88371907047396]
We present a framework that leverages polarization imaging to improve inaccurate depth measurements from various depth sensors.
Our method first adopts a learning-based strategy where a neural network is trained to estimate a dense and complete depth map from polarization data and a sensor depth map from different sensors.
To further improve the performance, we propose a Polarization Prompt Fusion Tuning (PPFT) strategy to effectively utilize RGB-based models pre-trained on large-scale datasets.
arXiv Detail & Related papers (2024-04-05T17:55:33Z)
- Learning to Adapt CLIP for Few-Shot Monocular Depth Estimation [31.34615135846137]
We propose a few-shot-based method which learns to adapt the Vision-Language Models for monocular depth estimation.
Specifically, it assigns different depth bins for different scenes, which can be selected by the model during inference.
With only one image per scene for training, our extensive experimental results on the NYU V2 and KITTI datasets demonstrate that our method outperforms the previous state-of-the-art method by up to 10.6% in terms of MARE.
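A hedged sketch of the depth-bin readout: per-scene bin embeddings are scored against pixel features, and depth is a softmax-weighted mix of bin centres (shapes and names are assumed, not the paper's interface).

```python
import torch
import torch.nn.functional as F

def depth_from_bins(pixel_feats, bin_embeds, bin_centres, tau=0.07):
    """pixel_feats: (B,C,H,W) image features; bin_embeds: (K,C), one
    embedding per scene-specific depth bin; bin_centres: (K,) in metres.
    Each pixel's depth is a similarity-weighted mix of bin centres."""
    f = F.normalize(pixel_feats, dim=1)
    e = F.normalize(bin_embeds, dim=1)
    logits = torch.einsum("bchw,kc->bkhw", f, e) / tau   # CLIP-style scores
    weights = logits.softmax(dim=1)                      # soft bin selection
    return torch.einsum("bkhw,k->bhw", weights, bin_centres).unsqueeze(1)
```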
arXiv Detail & Related papers (2023-11-02T06:56:50Z)
- Self-Supervised Learning based Depth Estimation from Monocular Images [0.0]
The goal of Monocular Depth Estimation is to predict the depth map, given a 2D monocular RGB image as input.
We plan to incorporate intrinsic camera parameters during training and apply weather augmentations to further generalize our model.
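One common way to handle intrinsics alongside depth is to learn them with a small head over a global feature; the sketch below (softplus focal lengths, sigmoid principal point) is an assumption mirroring prior work on learning intrinsics from video, not necessarily this paper's design.

```python
import torch
import torch.nn as nn

class IntrinsicsHead(nn.Module):
    """Predicts a pinhole K from a global feature vector; softplus keeps
    focal lengths positive, and the principal point is expressed as a
    fraction of the image size."""
    def __init__(self, feat_dim, width, height):
        super().__init__()
        self.fc = nn.Linear(feat_dim, 4)
        self.width, self.height = width, height

    def forward(self, feat):                      # feat: (B, feat_dim)
        fx, fy, cx, cy = self.fc(feat).unbind(-1)
        K = torch.zeros(feat.shape[0], 3, 3, device=feat.device)
        K[:, 0, 0] = nn.functional.softplus(fx) * self.width
        K[:, 1, 1] = nn.functional.softplus(fy) * self.height
        K[:, 0, 2] = torch.sigmoid(cx) * self.width
        K[:, 1, 2] = torch.sigmoid(cy) * self.height
        K[:, 2, 2] = 1.0
        return K
```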
arXiv Detail & Related papers (2023-04-14T07:14:08Z)
- SC-DepthV3: Robust Self-supervised Monocular Depth Estimation for Dynamic Scenes [58.89295356901823]
Self-supervised monocular depth estimation has shown impressive results in static scenes.
It relies on the multi-view consistency assumption for training networks; however, that assumption is violated in dynamic object regions.
We introduce an external pretrained monocular depth estimation model for generating a single-image depth prior.
Our model can predict sharp and accurate depth maps, even when training from monocular videos of highly-dynamic scenes.
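A hedged sketch of how such a prior can supervise dynamic regions: where the photometric error is high, regress toward the scale-aligned prior. The threshold and median alignment are assumptions, not SC-DepthV3's exact formulation.

```python
import torch

def prior_guided_depth_loss(pred, prior, photo_error, thresh=0.2):
    """pred, prior: (B,1,H,W) depths; photo_error: (B,1,H,W). Where the
    photometric error is high (likely dynamic objects violating multi-view
    consistency), fall back to the frozen single-image depth prior."""
    scale = pred.median() / prior.median().clamp(min=1e-6)  # align prior scale
    unreliable = (photo_error > thresh).float()
    return (unreliable * (pred - scale * prior).abs()).mean()
```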
arXiv Detail & Related papers (2022-11-07T16:17:47Z)
- Towards Accurate Reconstruction of 3D Scene Shape from A Single Monocular Image [91.71077190961688]
We propose a two-stage framework that first predicts depth up to an unknown scale and shift from a single monocular image.
We then exploit 3D point cloud data to predict the depth shift and the camera's focal length, which allow us to recover 3D scene shapes.
We test our depth model on nine unseen datasets and achieve state-of-the-art performance on zero-shot evaluation.
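The affine ambiguity admits a closed-form correction; below is a generic least-squares sketch, with the paper's learned shift and focal-length prediction replaced by direct fitting against sampled metric depths.

```python
import torch

def recover_scale_shift(pred, target):
    """Fit s, t minimising ||s * pred + t - target||^2 in closed form.
    pred: up-to-scale-and-shift depths; target: metric depths sampled at
    the same pixels (e.g. from a 3D point cloud)."""
    p, y = pred.flatten(), target.flatten()
    A = torch.stack([p, torch.ones_like(p)], dim=1)        # (N, 2) design
    sol = torch.linalg.lstsq(A, y.unsqueeze(1)).solution   # (2, 1)
    return sol[0, 0], sol[1, 0]                            # scale, shift
```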
arXiv Detail & Related papers (2022-08-28T16:20:14Z)
- Learning Occlusion-Aware Coarse-to-Fine Depth Map for Self-supervised Monocular Depth Estimation [11.929584800629673]
We propose a novel network to learn an Occlusion-aware Coarse-to-Fine Depth map for self-supervised monocular depth estimation.
The proposed OCFD-Net not only employs a discrete depth constraint for learning a coarse-level depth map, but also employs a continuous depth constraint for learning a scene depth residual.
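A minimal sketch of combining a discrete bin readout with a bounded continuous residual; the bin parameterisation and the tanh bound are illustrative assumptions.

```python
import torch

def coarse_plus_residual_depth(bin_logits, bin_centres, residual, r_max=1.0):
    """Coarse depth from a softmax over depth bins, refined by a bounded
    continuous residual. bin_logits: (B,K,H,W); bin_centres: (K,);
    residual: (B,1,H,W)."""
    w = bin_logits.softmax(dim=1)
    coarse = torch.einsum("bkhw,k->bhw", w, bin_centres).unsqueeze(1)
    return coarse + r_max * torch.tanh(residual)   # add scene depth residual
```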
arXiv Detail & Related papers (2022-03-21T12:43:42Z)
- Geometry Uncertainty Projection Network for Monocular 3D Object Detection [138.24798140338095]
We propose a Geometry Uncertainty Projection Network (GUP Net) to tackle the error amplification problem at both inference and training stages.
Specifically, a GUP module is proposed to obtain the geometry-guided uncertainty of the inferred depth.
At the training stage, we propose a Hierarchical Task Learning strategy to reduce the instability caused by error amplification.
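A hedged sketch of depth regression with a learned uncertainty in the spirit of the GUP module; the plain Laplacian form below omits the geometry-guided projection component.

```python
import torch

def laplacian_uncertainty_loss(depth, depth_gt, log_sigma):
    """depth, log_sigma: network outputs; depth_gt: target. The error is
    down-weighted where predicted uncertainty is large, while the log_sigma
    term penalises over-confidence."""
    return ((depth - depth_gt).abs() / log_sigma.exp() + log_sigma).mean()
```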
arXiv Detail & Related papers (2021-07-29T06:59:07Z)
- Virtual Normal: Enforcing Geometric Constraints for Accurate and Robust Depth Prediction [87.08227378010874]
We show the importance of the high-order 3D geometric constraints for depth prediction.
By designing a loss term that enforces a simple geometric constraint, we significantly improve the accuracy and robustness of monocular depth estimation.
We show state-of-the-art results of learning metric depth on NYU Depth-V2 and KITTI.
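A sketch of one such high-order constraint in the style of the virtual normal loss: sample point triplets from the backprojected point clouds and compare the plane normals they span (simplified; degenerate, near-colinear triplets are not filtered here).

```python
import torch
import torch.nn.functional as F

def virtual_normal_loss(points_pred, points_gt, n_triplets=1000):
    """points_pred, points_gt: (N,3) point clouds backprojected from the
    predicted and ground-truth depth maps at the same pixels."""
    N = points_pred.shape[0]
    idx = torch.randint(0, N, (n_triplets, 3), device=points_pred.device)

    def normals(p):
        a, b, c = p[idx[:, 0]], p[idx[:, 1]], p[idx[:, 2]]
        n = torch.cross(b - a, c - a, dim=1)   # plane normal of each triplet
        return F.normalize(n, dim=1)

    return (normals(points_pred) - normals(points_gt)).abs().mean()
```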
arXiv Detail & Related papers (2021-03-07T00:08:21Z)
- Deep feature fusion for self-supervised monocular depth prediction [7.779007880126907]
We propose a deep feature fusion method for learning self-supervised depth from scratch.
Our fusion network selects features from both upper and lower levels at every level in the encoder network.
We also propose a refinement module that learns higher-scale residual depth from a combination of higher-level deep features and lower-level residual depth.
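A minimal sketch of cross-level fusion, merging an upsampled upper-level map with a lower-level map via a 1x1 convolution; the channel sizes and the merge are illustrative, not the paper's exact selection mechanism.

```python
import torch
import torch.nn as nn

class FuseBlock(nn.Module):
    """Merge a lower-level (high-resolution) feature map with an
    upper-level (low-resolution) one at a given encoder level."""
    def __init__(self, c_low, c_high, c_out):
        super().__init__()
        self.merge = nn.Conv2d(c_low + c_high, c_out, kernel_size=1)

    def forward(self, low, high):
        high_up = nn.functional.interpolate(
            high, size=low.shape[-2:], mode="bilinear", align_corners=False)
        return torch.relu(self.merge(torch.cat([low, high_up], dim=1)))
```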
arXiv Detail & Related papers (2020-05-16T09:42:36Z)
- Don't Forget The Past: Recurrent Depth Estimation from Monocular Video [92.84498980104424]
We put three different types of depth estimation into a common framework.
Our method produces a time series of depth maps.
It can be applied to monocular videos only or be combined with different types of sparse depth patterns.
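A hedged sketch of the recurrent idea: a convolutional GRU cell carries hidden state across frames, so each depth map can draw on the past (a generic stand-in, not the paper's network).

```python
import torch
import torch.nn as nn

class RecurrentDepthHead(nn.Module):
    """Produce a time series of depth maps from per-frame features."""
    def __init__(self, c_feat, c_hid):
        super().__init__()
        self.zr = nn.Conv2d(c_feat + c_hid, 2 * c_hid, 3, padding=1)
        self.h_tilde = nn.Conv2d(c_feat + c_hid, c_hid, 3, padding=1)
        self.to_depth = nn.Conv2d(c_hid, 1, 3, padding=1)

    def forward(self, feats):                    # feats: list of (B,C,H,W)
        h = torch.zeros(feats[0].shape[0], self.to_depth.in_channels,
                        *feats[0].shape[-2:], device=feats[0].device)
        depths = []
        for x in feats:
            z, r = torch.sigmoid(self.zr(torch.cat([x, h], 1))).chunk(2, 1)
            h_new = torch.tanh(self.h_tilde(torch.cat([x, r * h], 1)))
            h = (1 - z) * h + z * h_new          # GRU state update
            depths.append(torch.sigmoid(self.to_depth(h)))
        return depths
```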
arXiv Detail & Related papers (2020-01-08T16:50:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.