SGFormer: Spherical Geometry Transformer for 360 Depth Estimation
- URL: http://arxiv.org/abs/2404.14979v2
- Date: Tue, 08 Oct 2024 03:09:38 GMT
- Title: SGFormer: Spherical Geometry Transformer for 360 Depth Estimation
- Authors: Junsong Zhang, Zisong Chen, Chunyu Lin, Lang Nie, Zhijie Shen, Kang Liao, Yao Zhao
- Abstract summary: Panoramic distortion poses a significant challenge in 360 depth estimation.
We propose a spherical geometry transformer, named SGFormer, to address the above issues.
We also present a query-based global conditional position embedding to compensate for spatial structure at varying resolutions.
- Score: 54.13459226728249
- Abstract: Panoramic distortion poses a significant challenge in 360 depth estimation, particularly pronounced at the north and south poles. Existing methods either adopt a bi-projection fusion strategy to remove distortions or model long-range dependencies to capture global structures, which can result in either unclear structure or insufficient local perception. In this paper, we propose a spherical geometry transformer, named SGFormer, to address the above issues, with an innovative step to integrate spherical geometric priors into vision transformers. To this end, we retarget the transformer decoder to a spherical prior decoder (termed SPDecoder), which endeavors to uphold the integrity of spherical structures during decoding. Concretely, we leverage bipolar re-projection, circular rotation, and curve local embedding to preserve the spherical characteristics of equidistortion, continuity, and surface distance, respectively. Furthermore, we present a query-based global conditional position embedding to compensate for spatial structure at varying resolutions. It not only boosts the global perception of spatial position but also sharpens the depth structure across different patches. Finally, we conduct extensive experiments on popular benchmarks, demonstrating our superiority over state-of-the-art solutions.
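The geometric facts behind the abstract (stronger distortion near the poles, left-right continuity, and distance measured on the sphere surface) can be illustrated with a minimal sketch. This is not SGFormer code: the function names are hypothetical, and it assumes a standard equirectangular (ERP) layout with pixel centres at half-integer offsets.

```python
import math


def erp_pixel_to_lonlat(u, v, width, height):
    """Map the centre of ERP pixel (u, v) to (longitude, latitude) in radians.

    Assumed convention: longitude in (-pi, pi), latitude in (-pi/2, pi/2),
    with row 0 nearest the north pole.
    """
    lon = ((u + 0.5) / width) * 2.0 * math.pi - math.pi
    lat = math.pi / 2.0 - ((v + 0.5) / height) * math.pi
    return lon, lat


def horizontal_stretch(v, height):
    """Horizontal stretch of ERP row v relative to the equator: 1 / cos(latitude)."""
    _, lat = erp_pixel_to_lonlat(0, v, 1, height)
    return 1.0 / math.cos(lat)


def great_circle_distance(p, q):
    """Surface distance on the unit sphere between two (lon, lat) points (haversine)."""
    lon1, lat1 = p
    lon2, lat2 = q
    a = (math.sin((lat2 - lat1) / 2.0) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2.0) ** 2)
    return 2.0 * math.asin(min(1.0, math.sqrt(a)))


def circular_shift(row, k):
    """Rotate one ERP row by k pixels with wrap-around (a yaw rotation of the panorama)."""
    k %= len(row)
    return row[-k:] + row[:-k] if k else list(row)
```

In this sketch the stretch factor grows toward the top and bottom rows, and the first and last pixels of a row are as close on the sphere as any two horizontally adjacent pixels, which is why wrap-around operations are natural on panoramas.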
Related papers
- SphereUFormer: A U-Shaped Transformer for Spherical 360 Perception [61.7243424157871]
We introduce a transformer-based architecture that, by incorporating a novel "Spherical Local Self-Attention" and other spherically-oriented modules, successfully operates in the spherical domain and outperforms the state-of-the-art in 360° perception benchmarks for depth estimation and semantic segmentation.
arXiv Detail & Related papers (2024-12-09T20:23:10Z)
- ESCAPE: Equivariant Shape Completion via Anchor Point Encoding [79.59829525431238]
We introduce ESCAPE, a framework designed to achieve rotation-equivariant shape completion.
ESCAPE employs a distinctive encoding strategy: it selects anchor points from a shape and represents every point by its distances to all anchor points.
ESCAPE achieves robust, high-quality reconstructions across arbitrary rotations and translations.
arXiv Detail & Related papers (2024-12-01T20:05:14Z)
- Revisiting 360 Depth Estimation with PanoGabor: A New Fusion Perspective [89.53522682640337]
We propose an oriented distortion-aware Gabor Fusion framework (PGFuse) to address the above challenges.
To address the reintroduced distortions, we design a linear latitude-aware distortion representation method to generate customized, distortion-aware Gabor filters.
Considering the orientation sensitivity of the Gabor transform, we introduce a spherical gradient constraint to stabilize this sensitivity.
arXiv Detail & Related papers (2024-08-29T02:58:35Z)
- Twin Deformable Point Convolutions for Point Cloud Semantic Segmentation in Remote Sensing Scenes [12.506628755166814]
We propose novel convolution operators, termed Twin Deformable point Convolutions (TDConvs).
These operators aim to achieve adaptive feature learning by learning deformable sampling points in the latitude-longitude plane and altitude direction.
Experiments on existing popular benchmarks show that our TDConvs achieve the best segmentation performance.
arXiv Detail & Related papers (2024-05-30T06:31:03Z)
- PanoNormal: Monocular Indoor 360° Surface Normal Estimation [12.992217830651988]
PanoNormal is a monocular surface normal estimation architecture designed for 360° images.
We employ a multi-level global self-attention scheme with the consideration of the spherical feature distribution.
Our results demonstrate that our approach achieves state-of-the-art performance across multiple popular 360° monocular datasets.
arXiv Detail & Related papers (2024-05-29T04:07:14Z)
- CRF360D: Monocular 360 Depth Estimation via Spherical Fully-Connected CRFs [5.854176164327896]
Monocular 360 depth estimation is challenging due to the inherent distortion of the equirectangular projection (ERP) plane.
In this paper, we propose spherical fully-connected CRFs (SF-CRFs).
SF-CRFs enjoy two key components. Firstly, to involve sufficient spherical neighbors, we propose a Spherical Window Transform (SWT) module.
This module aims to replicate the equator window's spherical relationships to all other windows, leveraging the rotational invariance of the sphere.
Remarkably, the transformation process is highly efficient, completing the transformation of all windows in a 512
arXiv Detail & Related papers (2024-05-19T14:29:06Z)
- DepthFormer: Exploiting Long-Range Correlation and Local Information for Accurate Monocular Depth Estimation [50.08080424613603]
Long-range correlation is essential for accurate monocular depth estimation.
We propose to leverage the Transformer to model this global context with an effective attention mechanism.
Our proposed model, termed DepthFormer, surpasses state-of-the-art monocular depth estimation methods with prominent margins.
arXiv Detail & Related papers (2022-03-27T05:03:56Z)
- A Rotation-Invariant Framework for Deep Point Cloud Analysis [132.91915346157018]
We introduce a new low-level purely rotation-invariant representation to replace common 3D Cartesian coordinates as the network inputs.
Also, we present a network architecture to embed these representations into features, encoding local relations between points and their neighbors, and the global shape structure.
We evaluate our method on multiple point cloud analysis tasks, including shape classification, part segmentation, and shape retrieval.
arXiv Detail & Related papers (2020-03-16T14:04:45Z)