SGFormer: Spherical Geometry Transformer for 360 Depth Estimation
- URL: http://arxiv.org/abs/2404.14979v2
- Date: Tue, 08 Oct 2024 03:09:38 GMT
- Title: SGFormer: Spherical Geometry Transformer for 360 Depth Estimation
- Authors: Junsong Zhang, Zisong Chen, Chunyu Lin, Lang Nie, Zhijie Shen, Kang Liao, Yao Zhao
- Abstract summary: Panoramic distortion poses a significant challenge in 360 depth estimation.
We propose a spherical geometry transformer, named SGFormer, to address the above issues.
We also present a query-based global conditional position embedding to compensate for spatial structure at varying resolutions.
- Score: 54.13459226728249
- Abstract: Panoramic distortion poses a significant challenge in 360 depth estimation, particularly pronounced at the north and south poles. Existing methods either adopt a bi-projection fusion strategy to remove distortions or model long-range dependencies to capture global structures, which can result in either unclear structure or insufficient local perception. In this paper, we propose a spherical geometry transformer, named SGFormer, to address the above issues, with an innovative step to integrate spherical geometric priors into vision transformers. To this end, we retarget the transformer decoder to a spherical prior decoder (termed SPDecoder), which endeavors to uphold the integrity of spherical structures during decoding. Concretely, we leverage bipolar re-projection, circular rotation, and curve local embedding to preserve the spherical characteristics of equidistortion, continuity, and surface distance, respectively. Furthermore, we present a query-based global conditional position embedding to compensate for spatial structure at varying resolutions. It not only boosts the global perception of spatial position but also sharpens the depth structure across different patches. Finally, we conduct extensive experiments on popular benchmarks, demonstrating our superiority over state-of-the-art solutions.
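The polar distortion SGFormer targets is a property of the equirectangular projection (ERP) itself: each pixel row spans the full 360° of longitude, so rows near the poles over-sample the sphere. A minimal sketch of this geometry, assuming a standard ERP layout (not the paper's code; the function names are illustrative):

```python
import numpy as np

def erp_to_sphere(u, v, width, height):
    """Map ERP pixel coordinates (u, v) to (longitude, latitude) in radians."""
    lon = (u / width) * 2.0 * np.pi - np.pi   # longitude in [-pi, pi)
    lat = np.pi / 2.0 - (v / height) * np.pi  # latitude in [pi/2, -pi/2]
    return lon, lat

def horizontal_stretch(v, height):
    """Horizontal over-sampling factor of ERP at pixel row v: 1 / cos(latitude).
    Equals 1 at the equator and diverges toward the poles."""
    _, lat = erp_to_sphere(0, v, 1, height)
    return 1.0 / np.cos(lat)
```

The 1/cos(latitude) factor quantifies why distortion is "particularly pronounced at the north and south poles": near-polar rows stretch a tiny spherical cap across the full image width.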
Related papers
- Revisiting 360 Depth Estimation with PanoGabor: A New Fusion Perspective [33.85582959047852]
We propose an oriented distortion-aware Gabor Fusion framework (PGFuse) to address the above challenges.
To address the reintroduced distortions, we design a linear latitude-aware distortion representation method to generate customized, distortion-aware Gabor filters.
Considering the orientation sensitivity of the Gabor transform, we introduce a spherical gradient constraint to stabilize this sensitivity.
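For context, a standard 2D Gabor kernel looks as follows; PGFuse's latitude-aware variant presumably modulates parameters such as the wavelength per image row, which is not reproduced here (illustrative sketch only):

```python
import numpy as np

def gabor_kernel(size, theta, wavelength, sigma=2.0, gamma=0.5, psi=0.0):
    """Textbook 2D Gabor kernel: a Gaussian envelope times an oriented
    cosine carrier. theta is the orientation, wavelength the carrier period."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate coordinates by theta.
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr**2 + (gamma * yr)**2) / (2.0 * sigma**2))
    carrier = np.cos(2.0 * np.pi * xr / wavelength + psi)
    return envelope * carrier
```

The orientation sensitivity mentioned above comes from the theta-dependent carrier, which responds most strongly to edges perpendicular to the filter's orientation.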
arXiv Detail & Related papers (2024-08-29T02:58:35Z) - Boosting Cross-Domain Point Classification via Distilling Relational Priors from 2D Transformers [59.0181939916084]
Traditional 3D networks mainly focus on local geometric details and ignore the topological structure between local geometries.
We propose a novel Priors Distillation (RPD) method to extract priors from the well-trained transformers on massive images.
Experiments on the PointDA-10 and the Sim-to-Real datasets verify that the proposed method consistently achieves the state-of-the-art performance of UDA for point cloud classification.
arXiv Detail & Related papers (2024-07-26T06:29:09Z) - Twin Deformable Point Convolutions for Point Cloud Semantic Segmentation in Remote Sensing Scenes [12.506628755166814]
We propose novel convolution operators, termed Twin Deformable point Convolutions (TDConvs).
These operators aim to achieve adaptive feature learning by learning deformable sampling points in the latitude-longitude plane and altitude direction.
Experiments on existing popular benchmarks demonstrate that our TDConvs achieve the best segmentation performance.
arXiv Detail & Related papers (2024-05-30T06:31:03Z) - PanoNormal: Monocular Indoor 360° Surface Normal Estimation [12.992217830651988]
PanoNormal is a monocular surface normal estimation architecture designed for 360° images.
We employ a multi-level global self-attention scheme with the consideration of the spherical feature distribution.
Our results demonstrate that our approach achieves state-of-the-art performance across multiple popular 360° monocular datasets.
arXiv Detail & Related papers (2024-05-29T04:07:14Z) - CRF360D: Monocular 360 Depth Estimation via Spherical Fully-Connected CRFs [5.854176164327896]
Monocular 360 depth estimation is challenging due to the inherent distortion of the equirectangular projection (ERP) plane.
In this paper, we propose spherical fully-connected CRFs (SF-CRFs).
SF-CRFs comprise two key components. Firstly, to involve sufficient spherical neighbors, we propose a Spherical Window Transform (SWT) module.
This module aims to replicate the equator window's spherical relationships to all other windows, leveraging the rotational invariance of the sphere.
Remarkably, the transformation process is highly efficient, completing the transformation of all windows in a 512
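The rotational invariance that SWT exploits has a simple ERP counterpart: rotating the sphere about its polar axis corresponds exactly to a circular shift of ERP columns. A sketch of this correspondence (illustrative, not the paper's implementation):

```python
import numpy as np

def rotate_erp_longitude(erp, degrees):
    """Rotate an ERP image (H, W) about the polar axis by circularly
    shifting its columns; a 360-degree rotation shifts by W columns,
    i.e. leaves the image unchanged."""
    h, w = erp.shape[:2]
    shift = int(round(degrees / 360.0 * w))
    return np.roll(erp, shift, axis=1)
```

Because this shift is lossless and costs only an index permutation, replicating relationships from the equator window to other windows by rotation can be very cheap, consistent with the efficiency claim above.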
arXiv Detail & Related papers (2024-05-19T14:29:06Z) - T-Pixel2Mesh: Combining Global and Local Transformer for 3D Mesh Generation from a Single Image [84.08705684778666]
We propose a novel Transformer-boosted architecture, named T-Pixel2Mesh, inspired by the coarse-to-fine approach of P2M.
Specifically, we use a global Transformer to control the holistic shape and a local Transformer to refine the local geometry details.
Our experiments on ShapeNet demonstrate state-of-the-art performance, while results on real-world data show the generalization capability.
arXiv Detail & Related papers (2024-03-20T15:14:22Z) - OcTr: Octree-based Transformer for 3D Object Detection [30.335788698814444]
A key challenge for LiDAR-based 3D object detection is to capture sufficient features from large scale 3D scenes.
We propose an Octree-based Transformer, named OcTr, to address this issue.
For enhanced foreground perception, we propose a hybrid positional embedding, composed of the semantic-aware positional embedding and attention mask.
arXiv Detail & Related papers (2023-03-22T15:01:20Z) - Neural Convolutional Surfaces [59.172308741945336]
This work is concerned with a representation of shapes that disentangles fine, local, and possibly repeating geometry from global, coarse structures.
We show that this approach achieves better neural shape compression than the state of the art, as well as enabling manipulation and transfer of shape details.
arXiv Detail & Related papers (2022-04-05T15:40:11Z) - DepthFormer: Exploiting Long-Range Correlation and Local Information for Accurate Monocular Depth Estimation [50.08080424613603]
Long-range correlation is essential for accurate monocular depth estimation.
We propose to leverage the Transformer to model this global context with an effective attention mechanism.
Our proposed model, termed DepthFormer, surpasses state-of-the-art monocular depth estimation methods with prominent margins.
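The global-context mechanism referred to here is, at its core, standard scaled dot-product attention, in which every position attends to every other. A generic sketch (not DepthFormer's exact module):

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """Standard attention: softmax(Q K^T / sqrt(d)) V. Each query row
    produces a convex combination of the value rows, so every output
    position can draw on all input positions (long-range correlation)."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```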
arXiv Detail & Related papers (2022-03-27T05:03:56Z) - A Rotation-Invariant Framework for Deep Point Cloud Analysis [132.91915346157018]
We introduce a new low-level purely rotation-invariant representation to replace common 3D Cartesian coordinates as the network inputs.
Also, we present a network architecture to embed these representations into features, encoding local relations between points and their neighbors, and the global shape structure.
We evaluate our method on multiple point cloud analysis tasks, including shape classification, part segmentation, and shape retrieval.
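The simplest example of a purely rotation-invariant low-level representation is the matrix of pairwise distances, which is unchanged by any rigid rotation of the input. A sketch in that spirit (the paper's actual representation is richer and is not reproduced here):

```python
import numpy as np

def rotation_invariant_distances(points):
    """Pairwise Euclidean distance matrix of an (N, 3) point set.
    Rotating the points by any orthogonal matrix leaves it unchanged,
    so it can replace raw Cartesian coordinates as a network input."""
    diff = points[:, None, :] - points[None, :, :]
    return np.linalg.norm(diff, axis=-1)
```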
arXiv Detail & Related papers (2020-03-16T14:04:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences arising from its use.