SphereUFormer: A U-Shaped Transformer for Spherical 360 Perception
- URL: http://arxiv.org/abs/2412.06968v1
- Date: Mon, 09 Dec 2024 20:23:10 GMT
- Title: SphereUFormer: A U-Shaped Transformer for Spherical 360 Perception
- Authors: Yaniv Benny, Lior Wolf
- Abstract summary: We introduce a transformer-based architecture that, by incorporating a novel ``Spherical Local Self-Attention'' and other spherically-oriented modules, successfully operates in the spherical domain and outperforms the state-of-the-art in 360$\degree$ perception benchmarks for depth estimation and semantic segmentation.
- Score: 61.7243424157871
- Abstract: This paper proposes a novel method for omnidirectional 360$\degree$ perception. Most common previous methods relied on equirectangular projection. This representation is easily applicable to 2D operation layers but introduces distortions into the image. Other methods attempted to remove the distortions by maintaining a sphere representation but relied on complicated convolution kernels that failed to show competitive results. In this work, we introduce a transformer-based architecture that, by incorporating a novel ``Spherical Local Self-Attention'' and other spherically-oriented modules, successfully operates in the spherical domain and outperforms the state-of-the-art in 360$\degree$ perception benchmarks for depth estimation and semantic segmentation.
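The abstract names "Spherical Local Self-Attention" but the listing includes no implementation details. As a rough illustrative sketch only (not the authors' method: the neighborhood construction, point layout, and single-head untrained projections below are all assumptions), local self-attention over k-nearest-neighbor spherical neighborhoods might look like:

```python
import numpy as np

def fibonacci_sphere(n):
    """Quasi-uniform points on the unit sphere via a Fibonacci lattice
    (one plausible spherical sampling; the paper's layout may differ)."""
    i = np.arange(n)
    phi = np.pi * (3.0 - np.sqrt(5.0)) * i          # golden-angle increment
    z = 1.0 - 2.0 * (i + 0.5) / n                   # evenly spaced heights
    r = np.sqrt(1.0 - z * z)
    return np.stack([r * np.cos(phi), r * np.sin(phi), z], axis=1)

def spherical_local_attention(x, pts, k=8):
    """Toy local self-attention: each vertex attends only to its k nearest
    neighbors on the sphere (neighborhoods by angular distance).
    x: (n, d) features, pts: (n, 3) unit vectors."""
    n, d = x.shape
    cos = np.clip(pts @ pts.T, -1.0, 1.0)           # cosine of angular distance
    idx = np.argsort(-cos, axis=1)[:, :k]           # k nearest, incl. self
    out = np.empty_like(x)
    for i in range(n):
        nb = idx[i]
        scores = x[i] @ x[nb].T / np.sqrt(d)        # untrained shared Q/K/V
        w = np.exp(scores - scores.max())
        w /= w.sum()                                # softmax over neighborhood
        out[i] = w @ x[nb]                          # convex combination
    return out
```

Because every output row is a convex combination of neighboring feature rows, the sketch stays numerically bounded by the input range; a trained version would add learned query/key/value projections per head.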
Related papers
- Revisiting 360 Depth Estimation with PanoGabor: A New Fusion Perspective [89.53522682640337]
We propose an oriented distortion-aware Gabor Fusion framework (PGFuse) to address the above challenges.
To address the reintroduced distortions, we design a linear latitude-aware distortion representation method to generate customized, distortion-aware Gabor filters.
Considering the orientation sensitivity of the Gabor transform, we introduce a spherical gradient constraint to stabilize this sensitivity.
arXiv Detail & Related papers (2024-08-29T02:58:35Z)
- Estimating Depth of Monocular Panoramic Image with Teacher-Student Model Fusing Equirectangular and Spherical Representations [3.8240176158734194]
We propose a method of estimating the depth of a monocular panoramic image with a teacher-student model fusing equirectangular and spherical representations.
In experiments, the proposed method is tested on several well-known 360 monocular depth estimation benchmark datasets.
arXiv Detail & Related papers (2024-05-27T06:11:16Z)
- SGFormer: Spherical Geometry Transformer for 360 Depth Estimation [54.13459226728249]
Panoramic distortion poses a significant challenge in 360 depth estimation.
We propose a spherical geometry transformer, named SGFormer, to address the above issues.
We also present a query-based global conditional position embedding to compensate for spatial structure at varying resolutions.
arXiv Detail & Related papers (2024-04-23T12:36:24Z)
- Neural Contourlet Network for Monocular 360 Depth Estimation [37.82642960470551]
We provide a new perspective that constructs an interpretable and sparse representation for a 360 image.
We propose a neural contourlet network consisting of a convolutional neural network and a contourlet transform branch.
In the encoder stage, we design a spatial-spectral fusion module to effectively fuse two types of cues.
arXiv Detail & Related papers (2022-08-03T02:25:55Z)
- Neural Convolutional Surfaces [59.172308741945336]
This work is concerned with a representation of shapes that disentangles fine, local and possibly repeating geometry, from global, coarse structures.
We show that this approach achieves better neural shape compression than the state of the art, as well as enabling manipulation and transfer of shape details.
arXiv Detail & Related papers (2022-04-05T15:40:11Z)
- Pseudocylindrical Convolutions for Learned Omnidirectional Image Compression [42.15877732557837]
We make one of the first attempts to learn deep neural networks for omnidirectional image compression.
Under reasonable constraints on the parametric representation, the pseudocylindrical convolution can be efficiently implemented by standard convolution.
Experimental results show that our method consistently achieves better rate-distortion performance than competing methods.
arXiv Detail & Related papers (2021-12-25T12:18:32Z)
- Concentric Spherical GNN for 3D Representation Learning [53.45704095146161]
We propose a novel multi-resolution convolutional architecture for learning over concentric spherical feature maps.
Our hierarchical architecture is based on alternately learning to incorporate both intra-sphere and inter-sphere information.
We demonstrate the effectiveness of our approach in improving state-of-the-art performance on 3D classification tasks with rotated data.
arXiv Detail & Related papers (2021-03-18T19:05:04Z)
- Spherical Transformer: Adapting Spherical Signal to CNNs [53.18482213611481]
Spherical Transformer can transform spherical signals into vectors that can be directly processed by standard CNNs.
We evaluate our approach on the tasks of spherical MNIST recognition, 3D object classification and omnidirectional image semantic segmentation.
arXiv Detail & Related papers (2021-01-11T12:33:16Z)
- Rotation-Invariant Autoencoders for Signals on Spheres [10.406659081400354]
We study the problem of unsupervised learning of rotation-invariant representations for spherical images.
In particular, we design an autoencoder architecture consisting of $S^2$ and $SO(3)$ convolutional layers.
Experiments on multiple datasets demonstrate the usefulness of the learned representations on clustering, retrieval and classification applications.
arXiv Detail & Related papers (2020-12-08T15:15:03Z)
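Several of the papers above contrast equirectangular projection with native spherical representations. As a minimal sketch of the standard mapping (function names and the row-area approximation are illustrative, not taken from any of these papers), the latitude-dependent distortion of an equirectangular image follows from the cos(latitude) area factor:

```python
import numpy as np

def equirect_to_sphere(u, v, width, height):
    """Map an equirectangular pixel (u, v) to a unit vector on the sphere."""
    lon = (u / width) * 2.0 * np.pi - np.pi       # longitude in [-pi, pi)
    lat = np.pi / 2.0 - (v / height) * np.pi      # latitude in [-pi/2, pi/2]
    return np.array([np.cos(lat) * np.cos(lon),
                     np.cos(lat) * np.sin(lon),
                     np.sin(lat)])

def pixel_area_ratio(v, height):
    """Approximate relative spherical area covered by a pixel in row v:
    proportional to cos(latitude), so rows near the poles cover far less
    surface than equatorial rows -- the distortion 2D layers must absorb."""
    lat = np.pi / 2.0 - (v + 0.5) / height * np.pi
    return np.cos(lat)
```

An equatorial row has ratio near 1 while rows adjacent to the poles shrink toward 0, which is why distortion-aware kernels or sphere-native operators are the recurring theme in this list.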
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.