BinsFormer: Revisiting Adaptive Bins for Monocular Depth Estimation
- URL: http://arxiv.org/abs/2204.00987v1
- Date: Sun, 3 Apr 2022 04:38:02 GMT
- Title: BinsFormer: Revisiting Adaptive Bins for Monocular Depth Estimation
- Authors: Zhenyu Li, Xuyang Wang, Xianming Liu, Junjun Jiang
- Abstract summary: We present a novel framework called BinsFormer, tailored for the classification-regression-based depth estimation.
It mainly focuses on two crucial components of this task: 1) proper generation of adaptive bins and 2) sufficient interaction between the probability distributions and bin predictions.
Experiments on the KITTI, NYU, and SUN RGB-D datasets demonstrate that BinsFormer surpasses state-of-the-art monocular depth estimation methods with prominent margins.
- Score: 46.678016537618845
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Monocular depth estimation is a fundamental task in computer vision and has
drawn increasing attention. Recently, some methods reformulate it as a
classification-regression task to boost the model performance, where continuous
depth is estimated via a linear combination of predicted probability
distributions and discrete bins. In this paper, we present a novel framework
called BinsFormer, tailored for the classification-regression-based depth
estimation. It mainly focuses on two crucial components of this task: 1)
proper generation of adaptive bins and 2) sufficient interaction between the
probability distributions and bin predictions. Specifically, we employ a
Transformer decoder to generate bins, viewing bin generation as a direct
set-to-set prediction problem. We further integrate a multi-scale decoder
structure to achieve a comprehensive understanding of spatial geometry and to
estimate depth maps in a coarse-to-fine manner. Moreover, an extra
scene-understanding query is proposed to improve estimation accuracy, showing
that models can implicitly learn useful information from an auxiliary
environment classification task. Extensive experiments on the KITTI, NYU, and
SUN RGB-D datasets demonstrate that BinsFormer surpasses state-of-the-art
monocular depth estimation methods with prominent margins. Code and pretrained
models will be made publicly available at
\url{https://github.com/zhyever/Monocular-Depth-Estimation-Toolbox}.
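The classification-regression formulation described in the abstract (continuous depth as a linear combination of predicted probability distributions and adaptive bin centers) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, array shapes, depth range, and the uniform-bin example are all assumptions for demonstration.

```python
import numpy as np

def depth_from_bins(bin_widths, probs, min_depth=1e-3, max_depth=10.0):
    """Combine adaptive bin widths and per-pixel bin probabilities into depth.

    bin_widths: (N,) normalized adaptive bin widths (one set per image).
    probs:      (H, W, N) per-pixel softmax distribution over the N bins.
    """
    # Scale the normalized widths to the depth range and derive bin centers.
    widths = (max_depth - min_depth) * bin_widths / bin_widths.sum()
    edges = min_depth + np.concatenate([[0.0], np.cumsum(widths)])
    centers = 0.5 * (edges[:-1] + edges[1:])   # (N,)
    # Continuous depth = probability-weighted sum of bin centers.
    return probs @ centers                      # (H, W)

# Sanity check: with uniform bins over [0, 8] and a one-hot distribution on
# bin 2, the predicted depth is that bin's center (5.0).
N = 4
bins = np.full(N, 1.0 / N)
p = np.zeros((1, 1, N))
p[0, 0, 2] = 1.0
depth = depth_from_bins(bins, p, min_depth=0.0, max_depth=8.0)
```

In the adaptive-bins setting, `bin_widths` would come from the network per image rather than being fixed, which is what lets the discretization concentrate resolution where a scene's depth values actually lie.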
Related papers
- Self-supervised Monocular Depth Estimation with Large Kernel Attention [30.44895226042849]
We propose a self-supervised monocular depth estimation network that recovers finer details.
Specifically, we propose a decoder based on large kernel attention, which can model long-distance dependencies.
Our method achieves competitive results on the KITTI dataset.
arXiv Detail & Related papers (2024-09-26T14:44:41Z)
- RGM: A Robust Generalizable Matching Model [49.60975442871967]
We propose a deep model for sparse and dense matching, termed RGM (Robust Generalist Matching).
To narrow the gap between synthetic training samples and real-world scenarios, we build a new, large-scale dataset with sparse correspondence ground truth.
We are able to mix up various dense and sparse matching datasets, significantly improving the training diversity.
arXiv Detail & Related papers (2023-10-18T07:30:08Z)
- IEBins: Iterative Elastic Bins for Monocular Depth Estimation [25.71386321706134]
We propose a novel concept of iterative elastic bins (IEBins) for the classification-regression-based MDE.
The proposed IEBins aims to search for high-quality depth by progressively optimizing the search range.
We develop a dedicated framework composed of a feature extractor and an iterative optimization module that benefits from a GRU-based architecture.
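The range-narrowing idea behind iterative elastic bins can be illustrated with a greedy toy sketch: split the current depth range into bins, select the most promising bin, and recurse into it. This is a hypothetical stand-in, not the paper's method; the scorer function, bin counts, iteration count, and greedy argmax all replace the learned GRU-based components.

```python
import numpy as np

def iterative_bins(score_fn, d_min=0.1, d_max=10.0, n_bins=8, n_iters=3):
    """Toy sketch of iterative range narrowing: at each step, partition the
    current depth range into n_bins, score the bin centers, and shrink the
    range to the highest-scoring bin."""
    for _ in range(n_iters):
        edges = np.linspace(d_min, d_max, n_bins + 1)
        centers = 0.5 * (edges[:-1] + edges[1:])
        scores = score_fn(centers)       # stand-in for learned per-bin scores
        k = int(np.argmax(scores))
        d_min, d_max = edges[k], edges[k + 1]
    return 0.5 * (d_min + d_max)

# A dummy scorer that prefers centers near a "true" depth of 3.3; each
# iteration shrinks the search range by a factor of n_bins.
est = iterative_bins(lambda c: -np.abs(c - 3.3))
```

Three iterations with 8 bins already localize the depth to within roughly (10 - 0.1) / 8^3 of the target, which is the appeal of refining the search range instead of predicting over the full range at once.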
arXiv Detail & Related papers (2023-09-25T13:48:39Z)
- Robust Geometry-Preserving Depth Estimation Using Differentiable Rendering [93.94371335579321]
We propose a learning framework that trains models to predict geometry-preserving depth without requiring extra data or annotations.
Comprehensive experiments underscore our framework's superior generalization capabilities.
Our innovative loss functions empower the model to autonomously recover domain-specific scale-and-shift coefficients.
arXiv Detail & Related papers (2023-09-18T12:36:39Z)
- DepthFormer: Exploiting Long-Range Correlation and Local Information for Accurate Monocular Depth Estimation [50.08080424613603]
Long-range correlation is essential for accurate monocular depth estimation.
We propose to leverage the Transformer to model this global context with an effective attention mechanism.
Our proposed model, termed DepthFormer, surpasses state-of-the-art monocular depth estimation methods with prominent margins.
arXiv Detail & Related papers (2022-03-27T05:03:56Z)
- Unsupervised Scale-consistent Depth Learning from Video [131.3074342883371]
We propose a monocular depth estimator SC-Depth, which requires only unlabelled videos for training.
Thanks to the capability of scale-consistent prediction, we show that our monocular-trained deep networks are readily integrated into the ORB-SLAM2 system.
The proposed hybrid Pseudo-RGBD SLAM shows compelling results in KITTI, and it generalizes well to the KAIST dataset without additional training.
arXiv Detail & Related papers (2021-05-25T02:17:56Z)
- AdaBins: Depth Estimation using Adaptive Bins [43.07310038858445]
We propose a transformer-based architecture block that divides the depth range into bins whose center value is estimated adaptively per image.
Our results show a decisive improvement over the state-of-the-art on several popular depth datasets.
arXiv Detail & Related papers (2020-11-28T14:40:45Z)
- DESC: Domain Adaptation for Depth Estimation via Semantic Consistency [24.13837264978472]
We propose a domain adaptation approach to train a monocular depth estimation model.
We bridge the domain gap by leveraging semantic predictions and low-level edge features.
Our approach is evaluated on standard domain adaptation benchmarks for monocular depth estimation.
arXiv Detail & Related papers (2020-09-03T10:54:05Z)
- Diversity inducing Information Bottleneck in Model Ensembles [73.80615604822435]
In this paper, we target the problem of generating effective ensembles of neural networks by encouraging diversity in prediction.
We explicitly optimize a diversity inducing adversarial loss for learning latent variables and thereby obtain diversity in the output predictions necessary for modeling multi-modal data.
Compared to the most competitive baselines, we show significant improvements in classification accuracy, under a shift in the data distribution.
arXiv Detail & Related papers (2020-03-10T03:10:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.