OmniFusion: 360 Monocular Depth Estimation via Geometry-Aware Fusion
- URL: http://arxiv.org/abs/2203.00838v1
- Date: Wed, 2 Mar 2022 03:19:49 GMT
- Title: OmniFusion: 360 Monocular Depth Estimation via Geometry-Aware Fusion
- Authors: Yuyan Li, Yuliang Guo, Zhixin Yan, Xinyu Huang, Ye Duan, Liu Ren
- Abstract summary: We propose a 360 monocular depth estimation pipeline, OmniFusion, to tackle the spherical distortion issue.
Our pipeline transforms a 360 image into less-distorted perspective patches (i.e., tangent images), obtains patch-wise predictions via CNN, and then merges the patch-wise results into the final output.
Experiments show that our method greatly mitigates the distortion issue and achieves state-of-the-art performance on several 360 monocular depth estimation benchmark datasets.
- Score: 12.058261716065381
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: A well-known challenge in applying deep-learning methods to omnidirectional
images is spherical distortion. In dense regression tasks such as depth
estimation, where structural details are required, using a vanilla CNN layer on
the distorted 360 image results in undesired information loss. In this paper,
we propose a 360 monocular depth estimation pipeline, \textit{OmniFusion}, to
tackle the spherical distortion issue. Our pipeline transforms a 360 image into
less-distorted perspective patches (i.e., tangent images), obtains patch-wise
predictions via CNN, and then merges the patch-wise results into the final output. To
handle the discrepancy between patch-wise predictions, which is a major issue
affecting the merging quality, we propose a new framework with the following
key components. First, we propose a geometry-aware feature fusion mechanism
that combines 3D geometric features with 2D image features to compensate for
the patch-wise discrepancy. Second, we employ a self-attention-based
transformer architecture to conduct a global aggregation of patch-wise
information, which further improves consistency. Last, we introduce an
iterative depth refinement mechanism to further refine the estimated depth
based on the more accurate geometric features. Experiments show that our method
greatly mitigates the distortion issue and achieves state-of-the-art
performance on several 360 monocular depth estimation benchmark datasets.
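The geometric backbone of this pipeline, cutting an equirectangular 360 image into perspective tangent patches, comes down to the gnomonic projection. Below is a minimal sketch of patch extraction under assumed parameters (field of view, patch size, nearest-neighbor sampling); it illustrates the idea only and is not the authors' implementation.

```python
# A minimal sketch of the tangent-image idea: sample a less-distorted
# perspective patch from an equirectangular 360 image via the inverse
# gnomonic projection. The patch center, FOV, and nearest-neighbor
# sampling are illustrative choices, not the paper's exact configuration.
import numpy as np

def tangent_patch(erp, lat0, lon0, fov_deg=80.0, size=256):
    """Extract a perspective (tangent) patch centered at (lat0, lon0) rad."""
    h, w = erp.shape[:2]
    half = np.tan(np.radians(fov_deg) / 2.0)
    # Tangent-plane coordinates for each patch pixel (y points up).
    x = np.linspace(-half, half, size)
    xx, yy = np.meshgrid(x, -x)
    rho = np.sqrt(xx**2 + yy**2)
    c = np.arctan(rho)
    rho = np.where(rho == 0, 1e-9, rho)   # avoid division by zero at center
    # Inverse gnomonic projection back to sphere coordinates.
    lat = np.arcsin(np.cos(c) * np.sin(lat0)
                    + yy * np.sin(c) * np.cos(lat0) / rho)
    lon = lon0 + np.arctan2(xx * np.sin(c),
                            rho * np.cos(lat0) * np.cos(c)
                            - yy * np.sin(lat0) * np.sin(c))
    # Sphere coordinates -> equirectangular pixel indices (nearest neighbor).
    u = ((lon / (2 * np.pi) + 0.5) * w).astype(int) % w
    v = ((0.5 - lat / np.pi) * h).clip(0, h - 1).astype(int)
    return erp[v, u]

erp = np.random.rand(512, 1024, 3)        # stand-in for a 360 RGB image
patch = tangent_patch(erp, lat0=0.0, lon0=np.pi / 4)
print(patch.shape)                        # (256, 256, 3)
```

Merging reverses this mapping: per-patch predictions are projected back onto the sphere and blended where patches overlap, which is exactly where the patch-wise discrepancy the paper addresses becomes visible.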
Related papers
- MonoDGP: Monocular 3D Object Detection with Decoupled-Query and Geometry-Error Priors [24.753860375872215]
This paper presents a Transformer-based monocular 3D object detection method called MonoDGP.
It adopts perspective-invariant geometry errors to modify the projection formula.
Our method demonstrates state-of-the-art performance on the KITTI benchmark without extra data.
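For context, the projection formula in monocular 3D detection typically refers to the pinhole depth-from-height relation. The sketch below shows how a learned geometry-error term could enter it; the additive form and the function itself are illustrative assumptions, not MonoDGP's exact decomposition.

```python
# Hedged sketch of the depth-from-height projection formula that geometry
# error priors modify: under a pinhole camera, an object of 3D height H at
# depth z projects to h = f * H / z pixels, so z = f * H / h. The additive
# correction term below is illustrative only.
def depth_from_height(f_px, H_3d, h_2d, geometry_error=0.0):
    """Recover depth from projected height, with a learned correction term."""
    return f_px * H_3d / h_2d + geometry_error

# Example: a 1.5 m tall car spanning 75 px under a 700 px focal length.
print(depth_from_height(700.0, 1.5, 75.0))  # 14.0 m before correction
```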
arXiv Detail & Related papers (2024-10-25T14:31:43Z)
- Depth Anywhere: Enhancing 360 Monocular Depth Estimation via Perspective Distillation and Unlabeled Data Augmentation [6.832852988957967]
We propose a new depth estimation framework that utilizes unlabeled 360-degree data effectively.
Our approach uses state-of-the-art perspective depth estimation models as teacher models to generate pseudo labels.
We tested our approach on benchmark datasets such as Matterport3D and Stanford2D3D, showing significant improvements in depth estimation accuracy.
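A minimal sketch of the teacher-student loop this describes, assuming a frozen perspective teacher labeling tangent patches cut from unlabeled 360 images; the `to_patches` helper, both models, and the plain L1 loss are placeholders rather than the paper's exact setup.

```python
# Minimal sketch of perspective-to-360 pseudo-label distillation: a frozen
# perspective teacher labels tangent patches from unlabeled 360 images, and
# the 360 student is trained against those pseudo labels. The models, the
# patchification helper, and the loss are placeholders.
import torch
import torch.nn.functional as F

def distill_step(student, teacher, erp_batch, to_patches, optimizer):
    with torch.no_grad():
        patches = to_patches(erp_batch)      # (B*P, C, h, w) tangent views
        pseudo = teacher(patches)            # per-patch pseudo depth labels
    pred = to_patches(student(erp_batch))    # student 360 depth, patchified
    loss = F.l1_loss(pred, pseudo)           # stand-in for a scale-invariant loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```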
arXiv Detail & Related papers (2024-06-18T17:59:31Z)
- GEOcc: Geometrically Enhanced 3D Occupancy Network with Implicit-Explicit Depth Fusion and Contextual Self-Supervision [49.839374549646884]
This paper presents GEOcc, a Geometric-Enhanced Occupancy network tailored for vision-only surround-view perception.
Our approach achieves state-of-the-art performance on the Occ3D-nuScenes dataset with the lowest required image resolution and the lightest image backbone.
arXiv Detail & Related papers (2024-05-17T07:31:20Z)
- AugUndo: Scaling Up Augmentations for Monocular Depth Completion and Estimation [51.143540967290114]
We propose a method that unlocks a wide range of previously infeasible geometric augmentations for unsupervised depth completion and estimation.
This is achieved by reversing, or "undo"-ing, the geometric transformations applied to the coordinates of the output depth, warping the depth map back to the original reference frame.
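A compact sketch of the undo idea, using a horizontal flip as a stand-in for the paper's broader family of invertible geometric augmentations:

```python
# Hedged sketch of the "undo" idea: apply a geometric augmentation to the
# input, predict depth on the augmented image, then invert the same
# transformation on the *output* depth so losses are computed in the
# original reference frame. A flip stands in for resizes, rotations, crops.
import torch

def augment(img):
    """Illustrative augmentation: horizontal flip (invertible by design)."""
    return torch.flip(img, dims=[-1])

def undo(depth):
    """Inverse of `augment`, applied to the predicted depth map."""
    return torch.flip(depth, dims=[-1])

def augundo_prediction(model, img):
    depth_aug = model(augment(img))   # depth predicted in the augmented frame
    return undo(depth_aug)            # warped back to the original frame
```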
arXiv Detail & Related papers (2023-10-15T05:15:45Z)
- FrozenRecon: Pose-free 3D Scene Reconstruction with Frozen Depth Models [67.96827539201071]
We propose a novel test-time optimization approach for 3D scene reconstruction.
Our method achieves state-of-the-art cross-dataset reconstruction on five zero-shot testing datasets.
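One plausible instantiation of such test-time optimization is to freeze the depth network and optimize only a per-frame scale and shift against a geometric consistency objective; the sketch below assumes that parameterization, and `consistency_loss` is a placeholder.

```python
# Hedged sketch of test-time optimization with a frozen depth model: network
# weights stay fixed; only lightweight per-frame scale/shift parameters on
# the predicted depths are optimized. The consistency objective (e.g. a
# multi-view reprojection error) is left abstract.
import torch

def refine_depths(frozen_depths, consistency_loss, steps=100, lr=1e-2):
    n = len(frozen_depths)
    scale = torch.ones(n, requires_grad=True)
    shift = torch.zeros(n, requires_grad=True)
    opt = torch.optim.Adam([scale, shift], lr=lr)
    for _ in range(steps):
        aligned = [s * d + t for s, t, d in zip(scale, shift, frozen_depths)]
        loss = consistency_loss(aligned)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return [(s * d + t).detach() for s, t, d in zip(scale, shift, frozen_depths)]
```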
arXiv Detail & Related papers (2023-08-10T17:55:02Z)
- Parametric Depth Based Feature Representation Learning for Object Detection and Segmentation in Bird's Eye View [44.78243406441798]
This paper focuses on leveraging geometry information, such as depth, to model the 2D-to-BEV feature transformation.
We first lift the 2D image features to the 3D space defined for the ego vehicle via a predicted parametric depth distribution for each pixel in each view.
We then aggregate the 3D feature volume based on the 3D space occupancy derived from depth to the BEV frame.
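A minimal lift-splat style sketch of this lift-and-aggregate step, with a precomputed `bev_index` mapping each (depth bin, pixel) 3D point to a BEV cell; the shapes, uniform depth bins, and sum pooling are illustrative simplifications of the paper's parametric-depth and occupancy-based aggregation.

```python
# Minimal sketch of lifting 2D image features into 3D with a per-pixel
# depth distribution and pooling the result into a BEV grid.
import torch

def lift_to_bev(feat, depth_prob, bev_index, num_cells):
    """feat: (C, H, W) image features; depth_prob: (D, H, W) per-pixel
    depth distribution; bev_index: (D, H, W) LongTensor of BEV cell ids."""
    C = feat.shape[0]
    # Outer product: each pixel feature weighted by its depth probability.
    volume = depth_prob.unsqueeze(0) * feat.unsqueeze(1)   # (C, D, H, W)
    flat = volume.reshape(C, -1)                           # (C, D*H*W)
    idx = bev_index.reshape(-1)                            # (D*H*W,)
    return torch.zeros(C, num_cells).index_add_(1, idx, flat)

C, D, H, W, cells = 8, 16, 4, 6, 100
bev = lift_to_bev(torch.randn(C, H, W),
                  torch.rand(D, H, W).softmax(0),          # normalize over bins
                  torch.randint(cells, (D, H, W)), cells)
print(bev.shape)  # torch.Size([8, 100])
```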
arXiv Detail & Related papers (2023-07-09T06:07:22Z)
- DevNet: Self-supervised Monocular Depth Learning via Density Volume Construction [51.96971077984869]
Self-supervised depth learning from monocular images normally relies on the 2D pixel-wise photometric relation between temporally adjacent image frames.
This work proposes Density Volume Construction Network (DevNet), a novel self-supervised monocular depth learning framework.
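The photometric relation referenced here is conventionally an L1 (often SSIM-weighted) error between a frame and its reconstruction synthesized from a neighboring frame via predicted depth and pose; a minimal sketch, with the view-synthesis `warp` assumed to exist and SSIM omitted for brevity:

```python
# Minimal sketch of the pixel-wise photometric objective behind
# self-supervised monocular depth: synthesize the target frame from a
# temporally adjacent source frame using predicted depth and pose, then
# penalize the reconstruction error.
import torch

def photometric_loss(target, source, depth, pose, intrinsics, warp):
    synthesized = warp(source, depth, pose, intrinsics)  # source -> target view
    return (target - synthesized).abs().mean()           # L1 reconstruction error
```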
arXiv Detail & Related papers (2022-09-14T00:08:44Z)
- Neural Contourlet Network for Monocular 360 Depth Estimation [37.82642960470551]
We provide a new perspective that constructs an interpretable and sparse representation for a 360 image.
We propose a neural contourlet network consisting of a convolutional neural network and a contourlet transform branch.
In the encoder stage, we design a spatial-spectral fusion module to effectively fuse two types of cues.
arXiv Detail & Related papers (2022-08-03T02:25:55Z)
- Graph-Based Depth Denoising & Dequantization for Point Cloud Enhancement [47.61748619439693]
A 3D point cloud is typically constructed from depth measurements acquired by sensors at one or more viewpoints.
Previous works denoise a point cloud a posteriori, after projecting the imperfect depth data onto 3D space.
We enhance depth measurements directly on the sensed images a priori, before synthesizing a 3D point cloud.
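As a generic illustration of graph-based depth processing (not the paper's exact formulation, which also handles dequantization and uses signal-adaptive graph weights), denoising on a 4-neighbor pixel graph can be posed as graph-Laplacian regularized least squares:

```python
# Generic sketch of graph-Laplacian depth denoising: treat depth pixels as
# nodes on a 4-neighbor grid and solve (I + lam * L) x = d, trading fidelity
# to the noisy depth d against smoothness across graph edges.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

def denoise_depth(depth, lam=2.0):
    h, w = depth.shape
    n = h * w
    idx = np.arange(n).reshape(h, w)
    rows, cols = [], []
    # Horizontal and vertical neighbor pairs, both directions.
    for a, b in [(idx[:, :-1], idx[:, 1:]), (idx[:-1, :], idx[1:, :])]:
        rows += [a.ravel(), b.ravel()]
        cols += [b.ravel(), a.ravel()]
    rows, cols = np.concatenate(rows), np.concatenate(cols)
    adj = sp.csr_matrix((np.ones(rows.size), (rows, cols)), shape=(n, n))
    lap = sp.diags(np.asarray(adj.sum(axis=1)).ravel()) - adj
    x = spsolve((sp.eye(n) + lam * lap).tocsc(), depth.ravel())
    return x.reshape(h, w)

noisy = np.random.rand(32, 32)
print(denoise_depth(noisy).shape)  # (32, 32)
```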
arXiv Detail & Related papers (2021-11-09T04:17:35Z)
- Probabilistic and Geometric Depth: Detecting Objects in Perspective [78.00922683083776]
3D object detection is an important capability needed in various practical applications such as driver assistance systems.
Monocular 3D detection, as an economical solution compared to conventional settings relying on binocular vision or LiDAR, has drawn increasing attention recently but still yields unsatisfactory results.
This paper first presents a systematic study on this problem and observes that the current monocular 3D detection problem can be simplified as an instance depth estimation problem.
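The instance depth view admits a simple sketch: predict a distribution over discretized depth bins, take its expectation, and blend it with a geometric cue such as depth from projected height. The uniform bins and fixed blending weight below are assumptions; PGD learns its combination.

```python
# Hedged sketch of probabilistic instance depth: logits over depth bins give
# a distribution whose expectation is the depth estimate, blended with a
# geometric estimate. Bin layout and blending weight are illustrative.
import torch

def instance_depth(bin_logits, bin_centers, geometric_depth, w=0.5):
    prob = torch.softmax(bin_logits, dim=-1)          # distribution over bins
    probabilistic = (prob * bin_centers).sum(dim=-1)  # expected depth
    return w * probabilistic + (1 - w) * geometric_depth

logits = torch.randn(4, 64)                   # 4 instances, 64 depth bins
centers = torch.linspace(1.0, 60.0, 64)       # hypothetical bin centers (m)
geo = torch.tensor([10.0, 22.0, 35.0, 8.0])   # geometric estimates (m)
print(instance_depth(logits, centers, geo))
```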
arXiv Detail & Related papers (2021-07-29T16:30:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.