ACDNet: Adaptively Combined Dilated Convolution for Monocular Panorama
Depth Estimation
- URL: http://arxiv.org/abs/2112.14440v1
- Date: Wed, 29 Dec 2021 08:04:19 GMT
- Title: ACDNet: Adaptively Combined Dilated Convolution for Monocular Panorama
Depth Estimation
- Authors: Chuanqing Zhuang, Zhengda Lu, Yiqun Wang, Jun Xiao, Ying Wang
- Abstract summary: We propose an ACDNet based on the adaptively combined dilated convolution to predict the dense depth map for a monocular panoramic image.
We conduct depth estimation experiments on three datasets (both virtual and real-world) and the experimental results demonstrate that our proposed ACDNet substantially outperforms the current state-of-the-art (SOTA) methods.
- Score: 9.670696363730329
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Depth estimation has become a crucial step for 3D reconstruction
with panorama images in recent years. Panorama images retain complete spatial
information but introduce distortion through equirectangular projection. In
this paper, we propose
an ACDNet based on the adaptively combined dilated convolution to predict the
dense depth map for a monocular panoramic image. Specifically, we combine the
convolution kernels with different dilations to extend the receptive field in
the equirectangular projection. Meanwhile, we introduce an adaptive
channel-wise fusion module that summarizes the feature maps and yields diverse
attention areas in the receptive field along the channels. Because this fusion
module is built on channel-wise attention, the network can capture and
leverage cross-channel contextual information efficiently. Finally, we conduct
depth estimation
experiments on three datasets (both virtual and real-world) and the
experimental results demonstrate that our proposed ACDNet substantially
outperforms the current state-of-the-art (SOTA) methods. Our code and model
parameters are available at https://github.com/zcq15/ACDNet.
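To make the adaptively combined dilated convolution concrete, here is a minimal PyTorch sketch of the idea described above: parallel 3x3 convolutions with different dilation rates enlarge the receptive field, and a channel-wise attention branch weights the parallel outputs per channel before summing them. All names (ACDConvSketch, dilations, reduction) are illustrative assumptions, not the authors' released implementation:

    import torch
    import torch.nn as nn

    class ACDConvSketch(nn.Module):
        # Hypothetical sketch of an adaptively combined dilated convolution:
        # K parallel dilated 3x3 convolutions, fused by channel-wise attention.
        def __init__(self, channels, dilations=(1, 2, 4, 8), reduction=4):
            super().__init__()
            self.branches = nn.ModuleList(
                nn.Conv2d(channels, channels, kernel_size=3,
                          padding=d, dilation=d, bias=False)
                for d in dilations
            )
            hidden = max(channels // reduction, 8)
            # Squeeze-and-excitation style branch that predicts one weight
            # per (branch, channel) pair from globally pooled features.
            self.attn = nn.Sequential(
                nn.AdaptiveAvgPool2d(1),
                nn.Conv2d(channels, hidden, kernel_size=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(hidden, channels * len(dilations), kernel_size=1),
            )

        def forward(self, x):
            feats = torch.stack([b(x) for b in self.branches], dim=1)  # (B, K, C, H, W)
            b, k, c, _, _ = feats.shape
            # Softmax across branches: each channel adaptively selects its
            # preferred receptive field.
            weights = self.attn(x).view(b, k, c, 1, 1).softmax(dim=1)
            return (feats * weights).sum(dim=1)                        # (B, C, H, W)

For example, ACDConvSketch(64)(torch.randn(1, 64, 128, 256)) returns a tensor of the same shape, with every output channel formed as an adaptively weighted mix of the four dilation rates.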
Related papers
- Pixel-Aligned Multi-View Generation with Depth Guided Decoder [86.1813201212539]
We propose a novel method for pixel-level image-to-multi-view generation.
Unlike prior work, we incorporate attention layers across multi-view images in the VAE decoder of a latent video diffusion model.
Our model enables better pixel alignment across multi-view images.
arXiv Detail & Related papers (2024-08-26T04:56:41Z)
- A Simple Baseline for Supervised Surround-view Depth Estimation [25.81521612343612]
We propose S3Depth, a Simple Baseline for Supervised Surround-view Depth Estimation.
We employ a global-to-local feature extraction module which combines CNN with transformer layers for enriched representations.
Our method achieves superior performance over existing state-of-the-art methods on both DDAD and nuScenes datasets.
arXiv Detail & Related papers (2023-03-14T10:06:19Z)
- ${S}^{2}$Net: Accurate Panorama Depth Estimation on Spherical Surface [4.649656275858966]
We propose an end-to-end deep network for monocular panorama depth estimation on a unit spherical surface.
Specifically, we project the feature maps extracted from equirectangular images onto a unit spherical surface sampled by uniformly distributed grids (a generic version of this mapping is sketched below).
We propose a global cross-attention-based fusion module to fuse the feature maps from skip connections and enhance the ability to obtain global context.
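As a rough illustration of the projection step above, the standard mapping from a uniform equirectangular pixel grid to points on the unit sphere looks as follows; this is the generic parameterization, not necessarily the exact sampling used in ${S}^{2}$Net:

    import math
    import torch

    def equirect_to_sphere(height, width):
        # Each pixel center maps to a unit 3D direction: rows index latitude
        # in [-pi/2, pi/2], columns index longitude in [-pi, pi].
        v = (torch.arange(height, dtype=torch.float32) + 0.5) / height
        u = (torch.arange(width, dtype=torch.float32) + 0.5) / width
        lat = (0.5 - v) * math.pi          # +pi/2 at the top row
        lon = (u - 0.5) * 2.0 * math.pi    # -pi at the left column
        lat, lon = torch.meshgrid(lat, lon, indexing="ij")
        x = lat.cos() * lon.sin()
        y = lat.sin()
        z = lat.cos() * lon.cos()
        return torch.stack((x, y, z), dim=-1)  # (H, W, 3) unit vectors

Feature maps sampled at these directions live on the sphere, where the heavy distortion near the equirectangular poles largely disappears.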
arXiv Detail & Related papers (2023-01-14T07:39:15Z)
- GraphCSPN: Geometry-Aware Depth Completion via Dynamic GCNs [49.55919802779889]
We propose a Graph Convolution based Spatial Propagation Network (GraphCSPN) as a general approach for depth completion.
In this work, we leverage convolutional neural networks as well as graph neural networks in a complementary way for geometric representation learning.
Our method achieves state-of-the-art performance, especially when only a few propagation steps are used.
arXiv Detail & Related papers (2022-10-19T17:56:03Z)
- Neural Contourlet Network for Monocular 360 Depth Estimation [37.82642960470551]
We provide a new perspective that constructs an interpretable and sparse representation for a 360 image.
We propose a neural contourlet network consisting of a convolutional neural network and a contourlet transform branch.
In the encoder stage, we design a spatial-spectral fusion module to effectively fuse two types of cues.
arXiv Detail & Related papers (2022-08-03T02:25:55Z)
- Depthformer: Multiscale Vision Transformer For Monocular Depth Estimation With Local Global Information Fusion [6.491470878214977]
This paper benchmarks various transformer-based models for the depth estimation task on an indoor NYUV2 dataset and an outdoor KITTI dataset.
We propose a novel attention-based architecture, Depthformer, for monocular depth estimation.
Our proposed method improves the state-of-the-art by 3.3% and 3.3% on these two benchmarks, respectively, in terms of Root Mean Squared Error (RMSE).
arXiv Detail & Related papers (2022-07-10T20:49:11Z)
- 3DVNet: Multi-View Depth Prediction and Volumetric Refinement [68.68537312256144]
3DVNet is a novel multi-view stereo (MVS) depth-prediction method.
Our key idea is the use of a 3D scene-modeling network that iteratively updates a set of coarse depth predictions.
We show that our method exceeds state-of-the-art accuracy in both depth prediction and 3D reconstruction metrics.
arXiv Detail & Related papers (2021-12-01T00:52:42Z)
- TANDEM: Tracking and Dense Mapping in Real-time using Deep Multi-view Stereo [55.30992853477754]
We present TANDEM, a real-time monocular tracking and dense mapping framework.
For pose estimation, TANDEM performs photometric bundle adjustment based on a sliding window of keyframes (a generic photometric residual is sketched below).
TANDEM shows state-of-the-art real-time 3D reconstruction performance.
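For intuition, photometric bundle adjustment minimizes an intensity residual between a reference keyframe and other frames warped through depth and relative pose. A minimal, generic sketch of that residual (not TANDEM's actual code; names and conventions are assumptions) could look like:

    import torch
    import torch.nn.functional as F

    def photometric_residual(ref_img, cur_img, ref_depth, K, T_cur_ref):
        # ref_img, cur_img: (1, 1, H, W); ref_depth: (1, 1, H, W);
        # K: (3, 3) intrinsics; T_cur_ref: (4, 4) pose mapping ref -> cur.
        _, _, h, w = ref_img.shape
        ys, xs = torch.meshgrid(torch.arange(h, dtype=torch.float32),
                                torch.arange(w, dtype=torch.float32),
                                indexing="ij")
        pix = torch.stack((xs, ys, torch.ones_like(xs)), dim=-1)   # (H, W, 3)
        rays = pix @ torch.linalg.inv(K).T                         # back-project
        pts = rays * ref_depth[0, 0].unsqueeze(-1)                 # 3D points
        pts = pts @ T_cur_ref[:3, :3].T + T_cur_ref[:3, 3]         # ref -> cur
        proj = pts @ K.T
        uv = proj[..., :2] / proj[..., 2:].clamp(min=1e-6)         # reproject
        # Normalize pixel coordinates to [-1, 1] and sample cur_img.
        grid = torch.stack((2 * uv[..., 0] / (w - 1) - 1,
                            2 * uv[..., 1] / (h - 1) - 1), dim=-1)[None]
        warped = F.grid_sample(cur_img, grid, align_corners=True)
        return warped - ref_img                                    # residual

Bundle adjustment then optimizes the poses (and depths) in the sliding window to drive such residuals toward zero.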
arXiv Detail & Related papers (2021-11-14T19:01:02Z)
- VolumeFusion: Deep Depth Fusion for 3D Scene Reconstruction [71.83308989022635]
In this paper, we advocate that replicating the traditional two-stage framework with deep neural networks improves both the interpretability and the accuracy of the results.
Our network operates in two steps: 1) local computation of depth maps with a deep MVS technique, and 2) fusion of the depth maps and image features to build a single TSDF volume (a generic TSDF update step is sketched below).
In order to improve the matching performance between images acquired from very different viewpoints, we introduce a rotation-invariant 3D convolution kernel called PosedConv.
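The second step above is a classic depth-to-TSDF fusion. As a hedged sketch (generic weighted-average TSDF integration, not VolumeFusion's learned variant; names are assumptions):

    import torch

    def integrate_tsdf(tsdf, weights, voxels, depth, K, trunc=0.05):
        # tsdf, weights: (N,) running TSDF values and observation counts.
        # voxels: (N, 3) voxel centers in the camera frame; depth: (H, W).
        h, w = depth.shape
        proj = voxels @ K.T
        uv = (proj[:, :2] / proj[:, 2:].clamp(min=1e-6)).round().long()
        valid = ((uv[:, 0] >= 0) & (uv[:, 0] < w) &
                 (uv[:, 1] >= 0) & (uv[:, 1] < h) & (voxels[:, 2] > 0))
        idx = valid.nonzero(as_tuple=True)[0]
        d = depth[uv[idx, 1], uv[idx, 0]]
        idx, d = idx[d > 0], d[d > 0]            # skip missing depth pixels
        # Signed distance along the ray, truncated to [-1, 1].
        sdf = ((d - voxels[idx, 2]) / trunc).clamp(-1.0, 1.0)
        # Weighted running average across views.
        tsdf[idx] = (tsdf[idx] * weights[idx] + sdf) / (weights[idx] + 1.0)
        weights[idx] += 1.0
        return tsdf, weights

Running this over all depth maps accumulates them into a single volume from which a surface can be extracted (e.g., with marching cubes).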
arXiv Detail & Related papers (2021-08-19T11:33:58Z)
- OmniSLAM: Omnidirectional Localization and Dense Mapping for Wide-baseline Multi-camera Systems [88.41004332322788]
We present an omnidirectional localization and dense mapping system for a wide-baseline multiview stereo setup with ultra-wide field-of-view (FOV) fisheye cameras.
For more practical and accurate reconstruction, we first introduce improved and lightweight deep neural networks for omnidirectional depth estimation.
We integrate our omnidirectional depth estimates into the visual odometry (VO) and add a loop closing module for global consistency.
arXiv Detail & Related papers (2020-03-18T05:52:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.