UniFuse: Unidirectional Fusion for 360$^{\circ}$ Panorama Depth
Estimation
- URL: http://arxiv.org/abs/2102.03550v1
- Date: Sat, 6 Feb 2021 10:01:09 GMT
- Title: UniFuse: Unidirectional Fusion for 360$^{\circ}$ Panorama Depth
Estimation
- Authors: Hualie Jiang, Zhe Sheng, Siyu Zhu, Zilong Dong, Rui Huang
- Abstract summary: This paper introduces a new framework to fuse features from the two projections, unidirectionally feeding the cubemap features to the equirectangular features only at the decoding stage.
Experiments verify the effectiveness of our proposed fusion strategy and module, and our model achieves state-of-the-art performance on four popular datasets.
- Score: 11.680475784102308
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning depth from spherical panoramas is becoming a popular research topic
because a panorama has a full field-of-view of the environment and provides a
relatively complete description of a scene. However, applying well-studied CNNs
for perspective images to the standard representation of spherical panoramas,
i.e., the equirectangular projection, is suboptimal, as it becomes distorted
towards the poles. Another representation is the cubemap projection, which is
distortion-free but discontinuous at the edges and limited in field-of-view.
This paper introduces a new framework to fuse features from the two
projections, unidirectionally feeding the cubemap features to the
equirectangular features only at the decoding stage. Unlike the recent
bidirectional fusion approach operating at both the encoding and decoding
stages, our fusion scheme is much more efficient. In addition, we design a
more effective fusion module for our fusion scheme. Experiments verify the
effectiveness of our proposed fusion strategy and module, and our model
achieves state-of-the-art performance on four popular datasets. Additional
experiments show that our model also offers advantages in model complexity and
generalization capability.
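As a minimal PyTorch sketch of this scheme (the concat-then-convolve block and the cubemap-to-ERP reprojection helper `c2e` are assumptions; the abstract does not specify the module internals):

```python
import torch
import torch.nn as nn

class UnidirectionalFuse(nn.Module):
    """Hypothetical fusion block: cubemap features flow into the
    equirectangular (ERP) decoder, never the other way around."""

    def __init__(self, channels):
        super().__init__()
        # Fuse by concatenation followed by a 3x3 convolution
        # (an assumption; the paper's actual module may differ).
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, erp_feat, cube_feat_as_erp):
        # erp_feat:         (B, C, H, W) ERP decoder features
        # cube_feat_as_erp: (B, C, H, W) cubemap features already
        #                   reprojected onto the ERP grid (c2e)
        return self.fuse(torch.cat([erp_feat, cube_feat_as_erp], dim=1))
```

At each decoder level only the ERP branch is updated, e.g. `erp_feat = fuse(erp_feat, c2e(cube_feat))`, while the cubemap branch is left untouched; skipping encoder-stage fusion is what makes the scheme cheaper than a bidirectional one.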
Related papers
- Multi-view Aggregation Network for Dichotomous Image Segmentation [76.75904424539543]
Dichotomous Image Segmentation (DIS) has recently emerged, targeting high-precision object segmentation from high-resolution natural images.
Existing methods rely on tedious multiple encoder-decoder streams and stages to gradually complete the global localization and local refinement.
We model DIS as a multi-view object perception problem and propose a parsimonious multi-view aggregation network (MVANet).
Experiments on the popular DIS-5K dataset show that our MVANet significantly outperforms state-of-the-art methods in both accuracy and speed.
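The summary does not say how the views are formed; a minimal sketch, assuming one global downsampled view plus a 2x2 grid of local crops at a shared resolution (both assumptions, not MVANet's documented recipe):

```python
import torch
import torch.nn.functional as F

def make_views(image, grid=2, view_size=512):
    """Build a global view and grid x grid local views from one
    high-resolution image; all views share a common resolution so
    a single shared encoder can process them."""
    b, c, h, w = image.shape
    global_view = F.interpolate(image, size=(view_size, view_size),
                                mode='bilinear', align_corners=False)
    local_views = []
    ph, pw = h // grid, w // grid
    for i in range(grid):
        for j in range(grid):
            crop = image[:, :, i * ph:(i + 1) * ph, j * pw:(j + 1) * pw]
            local_views.append(F.interpolate(crop,
                                             size=(view_size, view_size),
                                             mode='bilinear',
                                             align_corners=False))
    return global_view, torch.stack(local_views, dim=1)  # (B, V, C, H, W)
```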
arXiv Detail & Related papers (2024-04-11T03:00:00Z)
- 360 Layout Estimation via Orthogonal Planes Disentanglement and Multi-view Geometric Consistency Perception [56.84921040837699]
Existing panoramic layout estimation solutions tend to recover room boundaries from a vertically compressed sequence, yielding imprecise results.
We propose an orthogonal plane disentanglement network (termed DOPNet) to distinguish ambiguous semantics.
We also present an unsupervised adaptation technique tailored for horizon-depth and ratio representations.
Our solution outperforms other SoTA models on both monocular layout estimation and multi-view layout estimation tasks.
arXiv Detail & Related papers (2023-12-26T12:16:03Z)
- A Multi-modal Garden Dataset and Hybrid 3D Dense Reconstruction Framework Based on Panoramic Stereo Images for a Trimming Robot [7.248231584821008]
Our proposed solution is based on a newly designed panoramic stereo camera and a novel hybrid software framework consisting of three fusion modules.
In the disparity fusion module, initial disparity maps are produced from rectified stereo images using multiple stereo vision algorithms.
The pose fusion module adopts a two-stage global-coarse-to-local-fine strategy.
In the volumetric fusion module, the global poses of all the nodes are used to integrate the single-view point clouds into the volume.
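A minimal numpy sketch of the step this module builds on, assuming only that each node contributes a point cloud and a 4x4 global pose (the paper fuses into a volume, e.g. a TSDF, rather than the raw point set shown here):

```python
import numpy as np

def integrate_point_clouds(clouds, poses):
    """Bring per-node point clouds into one world frame using their
    global poses, the common frame a volumetric fusion step needs.
    clouds: list of (N_i, 3) arrays in each node's local frame
    poses:  list of (4, 4) world-from-node transforms"""
    world_points = []
    for pts, T in zip(clouds, poses):
        homo = np.hstack([pts, np.ones((pts.shape[0], 1))])  # (N_i, 4)
        world_points.append((homo @ T.T)[:, :3])             # apply pose
    return np.vstack(world_points)
```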
arXiv Detail & Related papers (2023-05-10T16:15:16Z)
- Multi-Projection Fusion and Refinement Network for Salient Object Detection in 360$^{\circ}$ Omnidirectional Image [141.10227079090419]
We propose a Multi-Projection Fusion and Refinement Network (MPFR-Net) to detect salient objects in 360$^{\circ}$ omnidirectional images.
MPFR-Net uses the equirectangular projection image and four corresponding cube-unfolding images as inputs.
Experimental results on two omnidirectional datasets demonstrate that the proposed approach outperforms the state-of-the-art methods both qualitatively and quantitatively.
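A hedged sketch of how one cube face can be sampled from the ERP input; the axis convention and nearest-neighbour lookup are assumptions, and MPFR-Net's exact cube-unfolding may differ:

```python
import numpy as np

def erp_to_cube_face(erp, face_size):
    """Sample the front (+z) cube face from an equirectangular image
    covering longitude [-pi, pi] and latitude [-pi/2, pi/2]."""
    H, W = erp.shape[:2]
    # Pixel grid on the face plane z = 1, with x, y in [-1, 1].
    u = np.linspace(-1.0, 1.0, face_size)
    x, y = np.meshgrid(u, u)
    z = np.ones_like(x)
    lon = np.arctan2(x, z)                            # ray longitude
    lat = np.arcsin(y / np.sqrt(x**2 + y**2 + z**2))  # ray latitude
    # Map angles to ERP pixel coordinates (nearest-neighbour).
    px = ((lon / (2 * np.pi) + 0.5) * (W - 1)).round().astype(int)
    py = ((lat / np.pi + 0.5) * (H - 1)).round().astype(int)
    return erp[py, px]
```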
arXiv Detail & Related papers (2022-12-23T14:50:40Z)
- Cross-View Panorama Image Synthesis [68.35351563852335]
We propose PanoGAN, a novel adversarial feedback GAN framework.
PanoGAN enables high-quality panorama image generation with more convincing details than state-of-the-art approaches.
arXiv Detail & Related papers (2022-03-22T15:59:44Z)
- ACDNet: Adaptively Combined Dilated Convolution for Monocular Panorama Depth Estimation [9.670696363730329]
We propose an ACDNet based on the adaptively combined dilated convolution to predict the dense depth map for a monocular panoramic image.
We conduct depth estimation experiments on three datasets (both virtual and real-world) and the experimental results demonstrate that our proposed ACDNet substantially outperforms the current state-of-the-art (SOTA) methods.
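One plausible reading of "adaptively combined dilated convolution", sketched with assumed dilation rates and a learned softmax mixture (not necessarily ACDNet's exact formulation):

```python
import torch
import torch.nn as nn

class AdaptiveDilatedConv(nn.Module):
    """Parallel dilated convolutions whose outputs are mixed by
    learned weights, so the receptive field is chosen adaptively."""

    def __init__(self, channels, rates=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=r, dilation=r)
            for r in rates
        ])
        self.logits = nn.Parameter(torch.zeros(len(rates)))

    def forward(self, x):
        w = torch.softmax(self.logits, dim=0)
        return sum(wi * conv(x) for wi, conv in zip(w, self.branches))
```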
arXiv Detail & Related papers (2021-12-29T08:04:19Z)
- VoRTX: Volumetric 3D Reconstruction With Transformers for Voxelwise View Selection and Fusion [68.68537312256144]
VoRTX is an end-to-end volumetric 3D reconstruction network using transformers for wide-baseline, multi-view feature fusion.
We train our model on ScanNet and show that it produces better reconstructions than state-of-the-art methods.
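A minimal sketch of voxelwise view fusion with a transformer, assuming each voxel carries a stack of back-projected per-view features and using invented layer sizes:

```python
import torch
import torch.nn as nn

class VoxelViewFusion(nn.Module):
    """For each voxel, attend over its N per-view features and pool
    them into a single fused feature."""

    def __init__(self, dim=64, heads=4, layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)

    def forward(self, view_feats):
        # view_feats: (num_voxels, num_views, dim)
        fused = self.encoder(view_feats)   # views exchange information
        return fused.mean(dim=1)           # (num_voxels, dim)
```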
arXiv Detail & Related papers (2021-12-01T02:18:11Z)
- LATFormer: Locality-Aware Point-View Fusion Transformer for 3D Shape Recognition [38.540048855119004]
We propose a novel Locality-Aware Point-View Fusion Transformer (LATFormer) for 3D shape retrieval and classification.
The core component of LATFormer is a module named Locality-Aware Fusion (LAF) which integrates the local features of correlated regions across the two modalities.
In our LATFormer, we utilize the LAF module to fuse the multi-scale features of the two modalities both bidirectionally and hierarchically to obtain more informative features.
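A hedged cross-attention sketch of bidirectional two-modality fusion; the locality-aware region matching that gives LAF its name is not modelled here:

```python
import torch
import torch.nn as nn

class BidirectionalCrossFusion(nn.Module):
    """Each modality queries the other with cross-attention, so
    information flows in both directions."""

    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.point_from_view = nn.MultiheadAttention(dim, heads,
                                                     batch_first=True)
        self.view_from_point = nn.MultiheadAttention(dim, heads,
                                                     batch_first=True)

    def forward(self, point_tokens, view_tokens):
        p, _ = self.point_from_view(point_tokens, view_tokens, view_tokens)
        v, _ = self.view_from_point(view_tokens, point_tokens, point_tokens)
        return point_tokens + p, view_tokens + v  # residual fusion
```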
arXiv Detail & Related papers (2021-09-03T03:23:27Z)
- RPVNet: A Deep and Efficient Range-Point-Voxel Fusion Network for LiDAR Point Cloud Segmentation [28.494690309193068]
We propose a novel range-point-voxel fusion network, namely RPVNet.
In this network, we devise a deep fusion framework with multiple, mutual information interactions among the range, point, and voxel views.
By leveraging these efficient interactions and a relatively low voxel resolution, our method also proves more efficient.
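A minimal sketch of the point-voxel hop such interactions rely on, using scatter-averaging one way and nearest-voxel gathering the other (the real network also has a range view and uses hash-based indexing and interpolation):

```python
import torch

def point_to_voxel(point_feats, voxel_idx, num_voxels):
    """Scatter-average point features into their voxels.
    point_feats: (N, C); voxel_idx: (N,) flat voxel index per point."""
    C = point_feats.shape[1]
    sums = torch.zeros(num_voxels, C).index_add_(0, voxel_idx, point_feats)
    counts = torch.zeros(num_voxels).index_add_(
        0, voxel_idx, torch.ones(len(voxel_idx)))
    return sums / counts.clamp(min=1).unsqueeze(1)

def voxel_to_point(voxel_feats, voxel_idx):
    """Gather each point's voxel feature back (nearest-voxel lookup
    stands in for the interpolation a real network would use)."""
    return voxel_feats[voxel_idx]
```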
arXiv Detail & Related papers (2021-03-24T04:24:12Z)
- Multi-Scale Boosted Dehazing Network with Dense Feature Fusion [92.92572594942071]
We propose a Multi-Scale Boosted Dehazing Network with Dense Feature Fusion based on the U-Net architecture.
We show that the proposed model performs favorably against the state-of-the-art approaches on the benchmark datasets as well as real-world hazy images.
arXiv Detail & Related papers (2020-04-28T09:34:47Z)