Related papers: IEBins: Iterative Elastic Bins for Monocular Depth Estimation

URL: http://arxiv.org/abs/2309.14137v1
Date: Mon, 25 Sep 2023 13:48:39 GMT
Title: IEBins: Iterative Elastic Bins for Monocular Depth Estimation
Authors: Shuwei Shao, Zhongcai Pei, Xingming Wu, Zhong Liu, Weihai Chen, Zhengguo Li
Abstract summary: We propose a novel concept of iterative elastic bins (IEBins) for classification-regression-based MDE. IEBins searches for high-quality depth by progressively optimizing the search range. We develop a dedicated framework composed of a feature extractor and a GRU-based iterative optimizer.
Score: 25.71386321706134
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Monocular depth estimation (MDE) is a fundamental topic of geometric computer vision and a core technique for many downstream applications. Recently, several methods reframe the MDE as a classification-regression problem where a linear combination of probabilistic distribution and bin centers is used to predict depth. In this paper, we propose a novel concept of iterative elastic bins (IEBins) for the classification-regression-based MDE. The proposed IEBins aims to search for high-quality depth by progressively optimizing the search range, which involves multiple stages and each stage performs a finer-grained depth search in the target bin on top of its previous stage. To alleviate the possible error accumulation during the iterative process, we utilize a novel elastic target bin to replace the original target bin, the width of which is adjusted elastically based on the depth uncertainty. Furthermore, we develop a dedicated framework composed of a feature extractor and an iterative optimizer that has powerful temporal context modeling capabilities benefiting from the GRU-based architecture. Extensive experiments on the KITTI, NYU-Depth-v2 and SUN RGB-D datasets demonstrate that the proposed method surpasses prior state-of-the-art competitors. The source code is publicly available at https://github.com/ShuweiShao/IEBins.
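The search procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: toy_logits is a hypothetical stand-in for the learned GRU-based optimizer, the 0 to 80 depth range and 16 bins are illustrative values, and widening the target bin by one standard deviation of the predicted distribution is an assumed form of the "elastic" rule. Only the depth readout (a probability-weighted sum of bin centers) and the stage-by-stage narrowing of the search range are taken from the abstract.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def bin_centers(lo, hi, n_bins):
    """Centers of n_bins uniform bins covering [lo, hi]."""
    width = (hi - lo) / n_bins
    return [lo + (i + 0.5) * width for i in range(n_bins)]


def expected_depth(probs, centers):
    """Classification-regression readout: sum_i p_i * c_i."""
    return sum(p * c for p, c in zip(probs, centers))


def toy_logits(centers, true_depth, width):
    """HYPOTHETICAL stand-in for the network: score bins by closeness to a known depth."""
    return [-((c - true_depth) / width) ** 2 for c in centers]


def elastic_range(probs, centers, lo, hi):
    """Next search range: the target bin, widened by one std (ASSUMED elastic rule)."""
    n_bins = len(centers)
    width = (hi - lo) / n_bins
    depth = expected_depth(probs, centers)
    # Target bin = the bin that contains the current depth estimate.
    idx = min(int((depth - lo) / width), n_bins - 1)
    bin_lo, bin_hi = lo + idx * width, lo + (idx + 1) * width
    # Depth uncertainty = standard deviation of the predicted distribution.
    sigma = math.sqrt(sum(p * (c - depth) ** 2 for p, c in zip(probs, centers)))
    # Never shrink below the target bin, never grow beyond the current range.
    return max(lo, min(bin_lo, depth - sigma)), min(hi, max(bin_hi, depth + sigma))


def iterative_search(true_depth, lo=0.0, hi=80.0, n_bins=16, n_stages=3):
    """Run n_stages of search; return a list of (lo, hi, depth) per stage."""
    history = []
    for _ in range(n_stages):
        centers = bin_centers(lo, hi, n_bins)
        probs = softmax(toy_logits(centers, true_depth, (hi - lo) / n_bins))
        depth = expected_depth(probs, centers)
        history.append((lo, hi, depth))
        lo, hi = elastic_range(probs, centers, lo, hi)
    return history


if __name__ == "__main__":
    for stage, (lo, hi, depth) in enumerate(iterative_search(23.7), start=1):
        print(f"stage {stage}: range=[{lo:.3f}, {hi:.3f}] depth={depth:.3f}")
```

Each stage re-bins a narrower range with the same number of bins, so the bin width, and with it the resolution of the depth estimate, becomes finer without increasing the number of classes. Widening the target bin when the distribution is uncertain is one simple way to keep the correct depth inside the next search range.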

Related papers

Occlusion Boundary and Depth: Mutual Enhancement via Multi-Task Learning [3.4174356345935393]
We propose MoDOT, a novel method that jointly estimates depth and OBs from a single image. MoDOT incorporates a new module, CASM, which combines cross-attention and multi-scale strip convolutions to leverage mid-level OB features. Experiments demonstrate the mutual benefits of jointly estimating depth and OBs, and validate the effectiveness of MoDOT's design.
arXiv Detail & Related papers (2025-05-27T14:15:19Z)
Depth Anything with Any Prior [64.39991799606146]
Prior Depth Anything is a framework that combines incomplete but precise metric information in depth measurement with relative but complete geometric structures in depth prediction. We develop a conditioned monocular depth estimation (MDE) model to refine the inherent noise of depth priors. Our model showcases impressive zero-shot generalization across depth completion, super-resolution, and inpainting over 7 real-world datasets.
arXiv Detail & Related papers (2025-05-15T17:59:50Z)
Relative Pose Estimation through Affine Corrections of Monocular Depth Priors [69.59216331861437]
We develop three solvers for relative pose estimation that explicitly account for independent affine (scale and shift) ambiguities. We propose a hybrid estimation pipeline that combines our proposed solvers with classic point-based solvers and epipolar constraints.
arXiv Detail & Related papers (2025-01-09T18:58:30Z)
Amodal Depth Anything: Amodal Depth Estimation in the Wild [39.27552294431748]
Amodal depth estimation aims to predict the depth of occluded (invisible) parts of objects in a scene. We propose a novel formulation of amodal depth estimation in the wild, focusing on relative depth prediction to improve model generalization across diverse natural images. We present two complementary frameworks: Amodal-DAV2, a deterministic model based on Depth Anything V2, and Amodal-DepthFM, a generative model that integrates conditional flow matching principles.
arXiv Detail & Related papers (2024-12-03T09:56:38Z)
Hyperboloid GPLVM for Discovering Continuous Hierarchies via Nonparametric Estimation [41.13597666007784]
Dimensionality reduction (DR) offers a useful representation of complex high-dimensional data. Recent DR methods focus on hyperbolic geometry to derive a faithful low-dimensional representation of hierarchical data. This paper presents hGP-LVMs to embed high-dimensional hierarchical data with implicit continuity via nonparametric estimation.
arXiv Detail & Related papers (2024-10-22T05:07:30Z)
Energy-Guided Continuous Entropic Barycenter Estimation for General Costs [95.33926437521046]
We propose a novel algorithm for approximating the continuous Entropic OT (EOT) barycenter for arbitrary OT cost functions. Our approach is built upon the dual reformulation of the EOT problem based on weak OT.
arXiv Detail & Related papers (2023-10-02T11:24:36Z)
Single Image Depth Prediction Made Better: A Multivariate Gaussian Take [163.14849753700682]
We introduce an approach that performs continuous modeling of per-pixel depth. Our method (named MG) ranks among the most accurate on the KITTI depth-prediction benchmark leaderboard.
arXiv Detail & Related papers (2023-03-31T16:01:03Z)
Probabilistic partition of unity networks for high-dimensional regression problems [1.0227479910430863]
We explore the probabilistic partition of unity network (PPOU-Net) model in the context of high-dimensional regression problems. We propose a general framework focusing on adaptive dimensionality reduction. The PPOU-Nets consistently outperform baseline fully-connected neural networks of comparable sizes in numerical experiments.
arXiv Detail & Related papers (2022-10-06T06:01:36Z)
Non-parametric Depth Distribution Modelling based Depth Inference for Multi-view Stereo [43.415242967722804]
Recent cost volume pyramid based deep neural networks have unlocked the potential of efficiently leveraging high-resolution images for depth inference from multi-view stereo. In general, those approaches assume that the depth of each pixel follows a unimodal distribution. We propose constructing the cost volume by non-parametric depth distribution modeling to handle pixels with unimodal and multi-modal distributions.
arXiv Detail & Related papers (2022-05-08T05:13:04Z)
BinsFormer: Revisiting Adaptive Bins for Monocular Depth Estimation [46.678016537618845]
We present a novel framework called BinsFormer, tailored for the classification-regression-based depth estimation. It mainly focuses on two crucial components in the specific task: 1) proper generation of adaptive bins and 2) sufficient interaction between probability distribution and bins predictions. Experiments on the KITTI, NYU, and SUN RGB-D datasets demonstrate that BinsFormer surpasses state-of-the-art monocular depth estimation methods with prominent margins.
arXiv Detail & Related papers (2022-04-03T04:38:02Z)
Accelerated replica exchange stochastic gradient Langevin diffusion enhanced Bayesian DeepONet for solving noisy parametric PDEs [7.337247167823921]
We propose a training framework for replica-exchange Langevin diffusion that exploits the neural network architecture of DeepONets. We show that the proposed framework's exploration and exploitation capabilities enable improved training convergence for DeepONets in noisy scenarios. We also show that replica-exchange Langevin diffusion improves the DeepONet's mean prediction accuracy in noisy scenarios.
arXiv Detail & Related papers (2021-11-03T19:23:59Z)
Manifold Topology Divergence: a Framework for Comparing Data Manifolds [109.0784952256104]
We develop a framework for comparing data manifolds, aimed at the evaluation of deep generative models. Based on the Cross-Barcode, we introduce the Manifold Topology Divergence score (MTop-Divergence). We demonstrate that the MTop-Divergence accurately detects various degrees of mode-dropping, intra-mode collapse, mode invention, and image disturbance.
arXiv Detail & Related papers (2021-06-08T00:30:43Z)
Depth-conditioned Dynamic Message Propagation for Monocular 3D Object Detection [86.25022248968908]
We learn context- and depth-aware feature representation to solve the problem of monocular 3D object detection. We show state-of-the-art results among the monocular-based approaches on the KITTI benchmark dataset.
arXiv Detail & Related papers (2021-03-30T16:20:24Z)
CodeVIO: Visual-Inertial Odometry with Learned Optimizable Dense Depth [83.77839773394106]
We present a lightweight, tightly-coupled deep depth network and visual-inertial odometry system. We provide the network with previously marginalized sparse features from VIO to increase the accuracy of initial depth prediction. We show that it can run in real-time with single-thread execution while utilizing GPU acceleration only for the network and code Jacobian.
arXiv Detail & Related papers (2020-12-18T09:42:54Z)
MSE-Optimal Neural Network Initialization via Layer Fusion [68.72356718879428]
Deep neural networks achieve state-of-the-art performance for a range of classification and inference tasks. The use of gradient descent combined with nonconvexity renders learning susceptible to initialization. We propose fusing neighboring layers of deeper networks that are trained with random initialization.
arXiv Detail & Related papers (2020-01-28T18:25:15Z)

This list is automatically generated from the titles and abstracts of the papers on this site.