IEBins: Iterative Elastic Bins for Monocular Depth Estimation
- URL: http://arxiv.org/abs/2309.14137v1
- Date: Mon, 25 Sep 2023 13:48:39 GMT
- Title: IEBins: Iterative Elastic Bins for Monocular Depth Estimation
- Authors: Shuwei Shao, Zhongcai Pei, Xingming Wu, Zhong Liu, Weihai Chen, Zhengguo Li
- Abstract summary: We propose a novel concept of iterative elastic bins (IEBins) for classification-regression-based monocular depth estimation (MDE).
The proposed IEBins aims to search for high-quality depth by progressively optimizing the search range.
We develop a dedicated framework composed of a feature extractor and an iterative optimizer that benefits from a GRU-based architecture.
- Score: 25.71386321706134
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Monocular depth estimation (MDE) is a fundamental topic of geometric computer
vision and a core technique for many downstream applications. Recently, several
methods reframe the MDE as a classification-regression problem where a linear
combination of probabilistic distribution and bin centers is used to predict
depth. In this paper, we propose a novel concept of iterative elastic bins
(IEBins) for the classification-regression-based MDE. The proposed IEBins aims
to search for high-quality depth by progressively optimizing the search range,
which involves multiple stages and each stage performs a finer-grained depth
search in the target bin on top of its previous stage. To alleviate the
possible error accumulation during the iterative process, we utilize a novel
elastic target bin to replace the original target bin, the width of which is
adjusted elastically based on the depth uncertainty. Furthermore, we develop a
dedicated framework composed of a feature extractor and an iterative optimizer
that has powerful temporal context modeling capabilities benefiting from the
GRU-based architecture. Extensive experiments on the KITTI, NYU-Depth-v2 and
SUN RGB-D datasets demonstrate that the proposed method surpasses prior
state-of-the-art competitors. The source code is publicly available at
https://github.com/ShuweiShao/IEBins.
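The classification-regression formulation and the stage-wise narrowing of the search range described in the abstract can be illustrated with a small, self-contained sketch for a single pixel. This is not the authors' implementation: `toy_scores` is a hypothetical stand-in for the network's per-bin scores (the paper uses a GRU-based optimizer), the standard deviation of the bin distribution is assumed as the uncertainty measure, and the widening rule for the elastic target bin is a simplification chosen for illustration.

```python
import math

def softmax(scores):
    """Turn per-bin scores into a probability distribution."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def uniform_bins(lo, hi, n_bins):
    """Split [lo, hi] into n_bins equal bins; return (edges, centers)."""
    width = (hi - lo) / n_bins
    edges = [lo + i * width for i in range(n_bins + 1)]
    centers = [(edges[i] + edges[i + 1]) / 2 for i in range(n_bins)]
    return edges, centers

def expected_depth(probs, centers):
    """Depth as a linear combination of probabilities and bin centers."""
    return sum(p * c for p, c in zip(probs, centers))

def refine_depth(score_fn, d_min, d_max, n_bins=16, n_stages=4):
    """Repeatedly re-bin a shrinking search range; return (lo, hi, depth) per stage."""
    lo, hi = d_min, d_max
    history = []
    for _ in range(n_stages):
        edges, centers = uniform_bins(lo, hi, n_bins)
        probs = softmax(score_fn(centers))
        depth = expected_depth(probs, centers)
        # Assumed uncertainty measure: standard deviation of the bin distribution.
        sigma = math.sqrt(sum(p * (c - depth) ** 2 for p, c in zip(probs, centers)))
        history.append((lo, hi, depth))
        # Target bin: the bin that contains the current depth estimate.
        width = (hi - lo) / n_bins
        k = min(max(int((depth - lo) / width), 0), n_bins - 1)
        # Elastic target bin (illustrative rule): widen the target bin so it
        # also covers depth +/- sigma, clamped to the global depth range.
        lo, hi = (max(d_min, min(edges[k], depth - sigma)),
                  min(d_max, max(edges[k + 1], depth + sigma)))
    return history

def toy_scores(true_depth):
    """Hypothetical scorer: favors bins whose center is near true_depth."""
    def score_fn(centers):
        spacing = centers[1] - centers[0]
        return [-((c - true_depth) / spacing) ** 2 for c in centers]
    return score_fn

# Example: search for a depth of 3.7 m within [0.1, 10.0] m.
stages = refine_depth(toy_scores(3.7), d_min=0.1, d_max=10.0)
```

Each entry of `stages` holds the search range and depth estimate of one stage. In this toy setup the range narrows from stage to stage, while the uncertainty-based widening keeps the next range from collapsing onto a single bin that might already exclude the true depth.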
Related papers
- Relative Pose Estimation through Affine Corrections of Monocular Depth Priors [69.59216331861437]
We develop three solvers for relative pose estimation that explicitly account for independent affine (scale and shift) ambiguities.
We propose a hybrid estimation pipeline that combines our proposed solvers with classic point-based solvers and epipolar constraints.
arXiv Detail & Related papers (2025-01-09T18:58:30Z)
- Amodal Depth Anything: Amodal Depth Estimation in the Wild [39.27552294431748]
Amodal depth estimation aims to predict the depth of occluded (invisible) parts of objects in a scene.
We propose a novel formulation of amodal depth estimation in the wild, focusing on relative depth prediction to improve model generalization across diverse natural images.
We present two complementary frameworks: Amodal-DAV2, a deterministic model based on Depth Anything V2, and Amodal-DepthFM, a generative model that integrates conditional flow matching principles.
arXiv Detail & Related papers (2024-12-03T09:56:38Z)
- Energy-Guided Continuous Entropic Barycenter Estimation for General Costs [95.33926437521046]
We propose a novel algorithm for approximating the continuous Entropic OT (EOT) barycenter for arbitrary OT cost functions.
Our approach is built upon the dual reformulation of the EOT problem based on weak OT.
arXiv Detail & Related papers (2023-10-02T11:24:36Z)
- Non-parametric Depth Distribution Modelling based Depth Inference for Multi-view Stereo [43.415242967722804]
Recent cost volume pyramid based deep neural networks have unlocked the potential of efficiently leveraging high-resolution images for depth inference from multi-view stereo.
In general, those approaches assume that the depth of each pixel follows a unimodal distribution.
We propose constructing the cost volume by non-parametric depth distribution modeling to handle pixels with unimodal and multi-modal distributions.
arXiv Detail & Related papers (2022-05-08T05:13:04Z)
- BinsFormer: Revisiting Adaptive Bins for Monocular Depth Estimation [46.678016537618845]
We present a novel framework called BinsFormer, tailored for the classification-regression-based depth estimation.
It mainly focuses on two crucial components in the specific task: 1) proper generation of adaptive bins and 2) sufficient interaction between probability distribution and bins predictions.
Experiments on the KITTI, NYU, and SUN RGB-D datasets demonstrate that BinsFormer surpasses state-of-the-art monocular depth estimation methods with prominent margins.
arXiv Detail & Related papers (2022-04-03T04:38:02Z)
- Accelerated replica exchange stochastic gradient Langevin diffusion enhanced Bayesian DeepONet for solving noisy parametric PDEs [7.337247167823921]
We propose a training framework for replica-exchange Langevin diffusion that exploits the neural network architecture of DeepONets.
We show that the proposed framework's exploration and exploitation capabilities enable improved training convergence for DeepONets in noisy scenarios.
We also show that replica-exchange Langevin diffusion improves the DeepONet's mean prediction accuracy in noisy scenarios.
arXiv Detail & Related papers (2021-11-03T19:23:59Z)
- Manifold Topology Divergence: a Framework for Comparing Data Manifolds [109.0784952256104]
We develop a framework for comparing data manifolds, aimed at the evaluation of deep generative models.
Based on the Cross-Barcode, we introduce the Manifold Topology Divergence score (MTop-Divergence).
We demonstrate that the MTop-Divergence accurately detects various degrees of mode-dropping, intra-mode collapse, mode invention, and image disturbance.
arXiv Detail & Related papers (2021-06-08T00:30:43Z)
- Depth-conditioned Dynamic Message Propagation for Monocular 3D Object Detection [86.25022248968908]
We learn context- and depth-aware feature representation to solve the problem of monocular 3D object detection.
We show state-of-the-art results among the monocular-based approaches on the KITTI benchmark dataset.
arXiv Detail & Related papers (2021-03-30T16:20:24Z)
- MSE-Optimal Neural Network Initialization via Layer Fusion [68.72356718879428]
Deep neural networks achieve state-of-the-art performance for a range of classification and inference tasks.
The use of stochastic gradient descent combined with nonconvexity renders learning susceptible to initialization.
We propose fusing neighboring layers of deeper networks that are trained with random variables.
arXiv Detail & Related papers (2020-01-28T18:25:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.