GLPanoDepth: Global-to-Local Panoramic Depth Estimation
- URL: http://arxiv.org/abs/2202.02796v2
- Date: Tue, 8 Feb 2022 11:51:33 GMT
- Title: GLPanoDepth: Global-to-Local Panoramic Depth Estimation
- Authors: Jiayang Bai, Shuichang Lai, Haoyu Qin, Jie Guo and Yanwen Guo
- Abstract summary: We propose a learning-based method for predicting dense depth values of a scene from a monocular omnidirectional image.
We show that cubemap vision transformers have a global receptive field at every stage and can provide globally coherent predictions for spherical signals.
This global-to-local strategy allows us to fully exploit useful global and local features in the panorama, achieving state-of-the-art performance in panoramic depth estimation.
- Score: 18.06592473599777
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we propose a learning-based method for predicting dense depth
values of a scene from a monocular omnidirectional image. An omnidirectional
image has a full field-of-view, providing much more complete descriptions of
the scene than perspective images. However, fully-convolutional networks that
most current solutions rely on fail to capture rich global contexts from the
panorama. To address this issue, as well as the distortion introduced by
equirectangular projection, we propose Cubemap Vision Transformers (CViT), a
new transformer-based architecture that can model long-range dependencies and
extract distortion-free global features from the panorama. We show that cubemap
vision transformers have a global receptive field at every stage and can
provide globally coherent predictions for spherical signals. To preserve
important local features, we further design a convolution-based branch in our
pipeline (dubbed GLPanoDepth) and fuse global features from cubemap vision
transformers at multiple scales. This global-to-local strategy allows us to
fully exploit useful global and local features in the panorama, achieving
state-of-the-art performance in panoramic depth estimation.
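To make the global-to-local idea concrete, below is a minimal, hypothetical PyTorch sketch of a single fusion step: globally coherent features from a transformer branch (e.g., a cubemap vision transformer re-projected back to the equirectangular domain) are merged with local features from a convolutional branch at one scale. The module name `GlobalToLocalFusion` and all layer choices are assumptions for illustration, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalToLocalFusion(nn.Module):
    # Hypothetical sketch: fuse one scale of global transformer features
    # with local convolutional features. Not the paper's exact module.
    def __init__(self, c_local: int, c_global: int):
        super().__init__()
        self.proj = nn.Conv2d(c_global, c_local, kernel_size=1)
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * c_local, c_local, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, f_local: torch.Tensor, f_global: torch.Tensor) -> torch.Tensor:
        # Match the channel width, then the spatial resolution, of the local branch.
        g = self.proj(f_global)
        g = F.interpolate(g, size=f_local.shape[-2:], mode="bilinear",
                          align_corners=False)
        # Concatenate both feature sets and mix them with a small conv block.
        return self.fuse(torch.cat([f_local, g], dim=1))
```

In a full pipeline, such a block would be applied at several decoder scales before a final depth prediction head.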
Related papers
- SGFormer: Spherical Geometry Transformer for 360 Depth Estimation [54.13459226728249]
Panoramic distortion poses a significant challenge in 360 depth estimation.
We propose a spherical geometry transformer, named SGFormer, to address the above issues.
We also present a query-based global conditional position embedding to compensate for spatial structure at varying resolutions.
arXiv Detail & Related papers (2024-04-23T12:36:24Z)
- Global Latent Neural Rendering [4.826483125482717]
A recent trend among generalizable novel view synthesis methods is to learn a rendering operator acting over single camera rays.
Here, we propose to learn a global rendering operator acting over all camera rays jointly.
We introduce our Convolutional Global Latent Renderer (ConvGLR), an efficient convolutional architecture that performs the rendering operation globally in a low-resolution latent space.
arXiv Detail & Related papers (2023-12-13T18:14:13Z)
- Local-to-Global Panorama Inpainting for Locale-Aware Indoor Lighting Prediction [28.180205012351802]
Predicting panoramic indoor lighting from a single perspective image is a fundamental but highly ill-posed problem in computer vision and graphics.
Recent methods mostly rely on convolutional neural networks (CNNs) to fill the missing contents in the warped panorama.
We propose a local-to-global strategy for large-scale panorama inpainting.
arXiv Detail & Related papers (2023-03-18T06:18:49Z)
- ${S}^{2}$Net: Accurate Panorama Depth Estimation on Spherical Surface [4.649656275858966]
We propose an end-to-end deep network for monocular panorama depth estimation on a unit spherical surface.
Specifically, we project the feature maps extracted from equirectangular images onto the unit spherical surface, sampled by uniformly distributed grids.
We propose a global cross-attention-based fusion module to fuse the feature maps from skip connections and enhance the ability to obtain global context.
arXiv Detail & Related papers (2023-01-14T07:39:15Z)
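As a rough illustration of the projection step described in the entry above, the sketch below resamples an equirectangular feature map onto a near-uniform spherical point set. The Fibonacci lattice and the function name `sphere_resample` are assumptions; the paper's actual grids and its cross-attention fusion module are not reproduced here.

```python
import math
import torch
import torch.nn.functional as F

def sphere_resample(feat: torch.Tensor, n_points: int = 4096) -> torch.Tensor:
    """Resample an equirectangular feature map (B, C, H, W) onto a
    near-uniform spherical point set, returning (B, C, n_points).
    A Fibonacci lattice stands in for the paper's uniform grids."""
    i = torch.arange(n_points, dtype=torch.float32)
    golden = (1.0 + math.sqrt(5.0)) / 2.0
    lat = torch.asin(1.0 - 2.0 * (i + 0.5) / n_points)              # [-pi/2, pi/2]
    lon = (2.0 * math.pi * i / golden) % (2.0 * math.pi) - math.pi  # [-pi, pi)
    # Normalized sampling grid for grid_sample: x <- longitude, y <- latitude.
    gx = lon / math.pi                 # [-1, 1)
    gy = -2.0 * lat / math.pi          # north pole maps to the top row (-1)
    grid = torch.stack([gx, gy], dim=-1).view(1, 1, n_points, 2)
    grid = grid.expand(feat.shape[0], -1, -1, -1)
    out = F.grid_sample(feat, grid, mode="bilinear", align_corners=False)
    return out.squeeze(2)              # (B, C, n_points)
```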
- PanoViT: Vision Transformer for Room Layout Estimation from a Single Panoramic Image [11.053777620735175]
PanoViT is a panorama vision transformer that estimates the room layout from a single panoramic image.
Compared to CNN models, our PanoViT is more proficient in learning global information from the panoramic image.
Our method outperforms state-of-the-art solutions in room layout prediction accuracy.
arXiv Detail & Related papers (2022-12-23T05:37:11Z)
- PanoFormer: Panorama Transformer for Indoor 360° Depth Estimation [35.698249161263966]
Existing panoramic depth estimation methods based on convolutional neural networks (CNNs) focus on removing panoramic distortions.
This paper proposes the panorama transformer (named PanoFormer) to estimate depth in panoramic images.
In particular, we divide patches on the spherical tangent domain into tokens to reduce the negative effect of panoramic distortions.
arXiv Detail & Related papers (2022-03-17T12:19:43Z)
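The tangent-domain tokenization mentioned in the entry above can be illustrated with the inverse gnomonic projection: for a tangent point on the sphere, compute the panorama pixels covered by a small tangent-plane patch, so that the sampled token has a nearly distortion-free footprint. This is a generic sketch with hypothetical names and parameters, not PanoFormer's actual implementation.

```python
import numpy as np

def tangent_patch_pixels(lat0, lon0, patch=7, fov=np.deg2rad(10), H=512, W=1024):
    """Equirectangular pixel coordinates (u, v) of a patch laid out on the
    tangent plane at (lat0, lon0), via the inverse gnomonic projection."""
    half = np.tan(fov / 2.0)
    t = np.linspace(-half, half, patch)
    x, y = np.meshgrid(t, t)                       # tangent-plane coordinates
    rho = np.hypot(x, y)
    c = np.arctan(rho)                             # angular distance from center
    sin_c, cos_c = np.sin(c), np.cos(c)
    rho = np.where(rho == 0.0, 1e-12, rho)         # guard the patch center
    lat = np.arcsin(cos_c * np.sin(lat0) + y * sin_c * np.cos(lat0) / rho)
    lon = lon0 + np.arctan2(
        x * sin_c, rho * np.cos(lat0) * cos_c - y * np.sin(lat0) * sin_c)
    u = np.mod((lon / (2.0 * np.pi) + 0.5) * W, W)  # panorama column
    v = (0.5 - lat / np.pi) * H                     # panorama row
    return u, v
```

Bilinearly sampling the panorama at these (u, v) locations yields one token per tangent point.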
- VoRTX: Volumetric 3D Reconstruction With Transformers for Voxelwise View Selection and Fusion [68.68537312256144]
VoRTX is an end-to-end volumetric 3D reconstruction network using transformers for wide-baseline, multi-view feature fusion.
We train our model on ScanNet and show that it produces better reconstructions than state-of-the-art methods.
arXiv Detail & Related papers (2021-12-01T02:18:11Z)
- Unifying Global-Local Representations in Salient Object Detection with Transformer [55.23033277636774]
We introduce a new attention-based encoder, the vision transformer, into salient object detection.
With a global view even in very shallow layers, the transformer encoder preserves more local representations.
Our method significantly outperforms other FCN-based and transformer-based methods on five benchmarks.
arXiv Detail & Related papers (2021-08-05T17:51:32Z)
- Vision Transformers for Dense Prediction [77.34726150561087]
We introduce dense vision transformers, an architecture that leverages vision transformers in place of convolutional networks as a backbone for dense prediction tasks.
Our experiments show that this architecture yields substantial improvements on dense prediction tasks.
arXiv Detail & Related papers (2021-03-24T18:01:17Z)
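A minimal sketch of the backbone-swap idea in the entry above: ViT patch tokens are reshaped into an image-like grid and decoded by a small convolutional head into a dense per-pixel map. This is illustrative only; the paper's multi-stage token reassembly is more elaborate.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TokensToDense(nn.Module):
    """Decode ViT patch tokens into a dense per-pixel prediction.
    Generic sketch, not the paper's actual decoder."""
    def __init__(self, dim: int = 768, patch: int = 16, out_ch: int = 1):
        super().__init__()
        self.patch = patch
        self.head = nn.Sequential(
            nn.Conv2d(dim, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, out_ch, kernel_size=1),
        )

    def forward(self, tokens: torch.Tensor, hw: tuple) -> torch.Tensor:
        # tokens: (B, N, dim), class token removed; N == (H//patch)*(W//patch).
        B, N, D = tokens.shape
        h, w = hw[0] // self.patch, hw[1] // self.patch
        fmap = tokens.transpose(1, 2).reshape(B, D, h, w)  # image-like grid
        return F.interpolate(self.head(fmap), size=hw, mode="bilinear",
                             align_corners=False)
```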
- Panoramic Panoptic Segmentation: Towards Complete Surrounding Understanding via Unsupervised Contrastive Learning [97.37544023666833]
We introduce panoramic panoptic segmentation as the most holistic form of scene understanding.
A complete surrounding understanding provides a maximum of information to the agent.
We propose a framework which allows model training on standard pinhole images and transfers the learned features to a different domain.
arXiv Detail & Related papers (2021-03-01T09:37:27Z)
- OmniSLAM: Omnidirectional Localization and Dense Mapping for Wide-baseline Multi-camera Systems [88.41004332322788]
We present an omnidirectional localization and dense mapping system for a wide-baseline multiview stereo setup with ultra-wide field-of-view (FOV) fisheye cameras.
For more practical and accurate reconstruction, we first introduce improved and lightweight deep neural networks for omnidirectional depth estimation.
We integrate our omnidirectional depth estimates into the visual odometry (VO) and add a loop closing module for global consistency.
arXiv Detail & Related papers (2020-03-18T05:52:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the generated content (including all information) and is not responsible for any consequences of its use.