EGformer: Equirectangular Geometry-biased Transformer for 360 Depth Estimation
- URL: http://arxiv.org/abs/2304.07803v2
- Date: Thu, 7 Sep 2023 05:51:15 GMT
- Title: EGformer: Equirectangular Geometry-biased Transformer for 360 Depth Estimation
- Authors: Ilwi Yun, Chanyong Shin, Hyunku Lee, Hyuk-Jae Lee and Chae Eun Rhee
- Abstract summary: Estimating the depths of equirectangular (i.e., 360°) images (EIs) is challenging given the distorted 180°×360° field-of-view.
We propose an equirectangular geometry-biased transformer termed EGformer.
- Score: 20.42460078279734
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Estimating the depths of equirectangular (i.e., 360°) images
(EIs) is challenging given the distorted 180°×360° field-of-view, which is
hard to address with a convolutional neural network (CNN). Although a
transformer with global attention achieves significant improvements over
CNNs on the EI depth estimation task, it is computationally inefficient,
which motivates a transformer with local attention. However, to apply local
attention successfully to EIs, a strategy that simultaneously addresses the
distorted equirectangular geometry and the limited receptive field is
required. Prior works have addressed only one of the two, occasionally
resulting in unsatisfactory depths. In this paper, we propose an
equirectangular geometry-biased transformer termed EGformer. While limiting
the computational cost and the number of network parameters, EGformer
extracts equirectangular geometry-aware local attention with a large
receptive field. To achieve this, we actively utilize the equirectangular
geometry as a bias for the local attention instead of struggling to reduce
the distortion of EIs. Compared with the most recent EI depth estimation
studies, the proposed approach yields the best depth outcomes overall with
the lowest computational cost and the fewest parameters, demonstrating the
effectiveness of the proposed method.
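The abstract's core idea is to inject equirectangular geometry as an additive bias on local attention scores rather than undistorting the image first. The sketch below is an illustrative reconstruction of that pattern, not the authors' implementation: the `latitude_bias` function, its cosine-of-latitude weighting, and all names are assumptions made for the example.

```python
import numpy as np

def latitude_bias(height):
    """Hypothetical equirectangular bias: rows near the poles of an
    equirectangular image are increasingly stretched, so we down-weight
    them with the cosine of latitude (a common spherical-area factor),
    returned in log space so it can be added to attention logits."""
    lat = (np.arange(height) + 0.5) / height * np.pi - np.pi / 2  # [-pi/2, pi/2]
    return np.log(np.clip(np.cos(lat), 1e-6, None))

def biased_local_attention(q, k, v, bias_per_key):
    """Single-head attention over a local window with an additive
    geometry bias (one bias value per key position)."""
    scores = q @ k.T / np.sqrt(q.shape[-1]) + bias_per_key[None, :]
    scores -= scores.max(axis=-1, keepdims=True)  # stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

Because the bias is additive in logit space, it reshapes the attention distribution toward geometrically reliable rows without any extra parameters, which is consistent with the abstract's claim of low parameter count.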
Related papers
- OrientedFormer: An End-to-End Transformer-Based Oriented Object Detector in Remote Sensing Images [26.37802649901314]
Oriented object detection in remote sensing images is a challenging task because objects are distributed at arbitrary orientations.
We propose an end-to-end transformer-based oriented object detector consisting of three dedicated modules to address these issues.
Compared with previous end-to-end detectors, OrientedFormer gains 1.16 and 1.21 AP$_{50}$ on DIOR-R and DOTA-v1.0 respectively, while reducing training epochs from 3$\times$ to 1$\times$.
arXiv Detail & Related papers (2024-09-29T10:36:33Z)
- TraIL-Det: Transformation-Invariant Local Feature Networks for 3D LiDAR Object Detection with Unsupervised Pre-Training [21.56675189346088]
We introduce Transformation-Invariant Local (TraIL) features and the associated TraIL-Det architecture.
TraIL features exhibit rigid transformation invariance and effectively adapt to variations in point density.
They utilize the inherent isotropic radiation of LiDAR to enhance local representation.
Our method outperforms contemporary self-supervised 3D object detection approaches in terms of mAP on KITTI.
arXiv Detail & Related papers (2024-08-25T17:59:17Z)
- Boosting Cross-Domain Point Classification via Distilling Relational Priors from 2D Transformers [59.0181939916084]
Traditional 3D networks mainly focus on local geometric details and ignore the topological structure between local geometries.
We propose a novel Relational Priors Distillation (RPD) method to extract relational priors from 2D transformers well-trained on massive images.
Experiments on the PointDA-10 and the Sim-to-Real datasets verify that the proposed method consistently achieves the state-of-the-art performance of UDA for point cloud classification.
arXiv Detail & Related papers (2024-07-26T06:29:09Z)
- SGFormer: Spherical Geometry Transformer for 360 Depth Estimation [54.13459226728249]
Panoramic distortion poses a significant challenge in 360 depth estimation.
We propose a spherical geometry transformer, named SGFormer, to address the above issues.
We also present a query-based global conditional position embedding to compensate for spatial structure at varying resolutions.
arXiv Detail & Related papers (2024-04-23T12:36:24Z)
- ConDaFormer: Disassembled Transformer with Local Structure Enhancement for 3D Point Cloud Understanding [105.98609765389895]
Transformers have been recently explored for 3D point cloud understanding.
Point clouds typically contain over 0.1 million points, making global self-attention infeasible.
In this paper, we develop a new transformer block, named ConDaFormer.
arXiv Detail & Related papers (2023-12-18T11:19:45Z)
- OcTr: Octree-based Transformer for 3D Object Detection [30.335788698814444]
A key challenge for LiDAR-based 3D object detection is to capture sufficient features from large scale 3D scenes.
We propose an Octree-based Transformer, named OcTr, to address this issue.
For enhanced foreground perception, we propose a hybrid positional embedding composed of a semantic-aware positional embedding and an attention mask.
arXiv Detail & Related papers (2023-03-22T15:01:20Z)
- URCDC-Depth: Uncertainty Rectified Cross-Distillation with CutFlip for Monocular Depth Estimation [24.03121823263355]
We introduce an uncertainty rectified cross-distillation between Transformer and convolutional neural network (CNN) to learn a unified depth estimator.
Specifically, we use the depth estimates from the Transformer branch and the CNN branch as pseudo labels to teach each other.
We propose a surprisingly simple yet highly effective data augmentation technique, CutFlip, which forces the model to exploit valuable clues beyond the vertical image position for depth inference.
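As summarized, CutFlip weakens the vertical-position shortcut by cutting an image (and its depth map) along a horizontal line and exchanging the two parts. The following is a minimal sketch of that idea; the function name, signature, and exact swap rule are my assumptions, not the paper's code.

```python
import numpy as np

def cutflip(image, depth, cut=None, rng=None):
    """CutFlip-style augmentation (sketch): cut image and depth map at a
    horizontal line and swap the upper and lower parts, so depth can no
    longer be inferred from vertical position alone."""
    h = image.shape[0]
    if cut is None:
        rng = rng or np.random.default_rng()
        cut = int(rng.integers(1, h))  # cut row in [1, h-1]
    swap = lambda a: np.concatenate([a[cut:], a[:cut]], axis=0)
    return swap(image), swap(depth)
```

Applying the identical swap to both the image and its depth label keeps the pair geometrically consistent, which is what lets the augmented sample still serve as valid supervision.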
arXiv Detail & Related papers (2023-02-16T08:53:08Z)
- Geometry-Contrastive Transformer for Generalized 3D Pose Transfer [95.56457218144983]
The intuition of this work is to perceive the geometric inconsistency between the given meshes via the powerful self-attention mechanism.
We propose a novel geometry-contrastive Transformer that efficiently perceives global geometric inconsistencies in 3D structure.
We present a latent isometric regularization module together with a novel semi-synthesized dataset for the cross-dataset 3D pose transfer task.
arXiv Detail & Related papers (2021-12-14T13:14:24Z)
- Adaptive Surface Normal Constraint for Depth Estimation [102.7466374038784]
We introduce a simple yet effective method, named Adaptive Surface Normal (ASN) constraint, to correlate the depth estimation with geometric consistency.
Our method can faithfully reconstruct the 3D geometry and is robust to local shape variations such as boundaries, sharp corners, and noise.
arXiv Detail & Related papers (2021-03-29T10:36:25Z)
- PUGeo-Net: A Geometry-centric Network for 3D Point Cloud Upsampling [103.09504572409449]
We propose a novel deep neural network based method, called PUGeo-Net, to generate uniform dense point clouds.
Thanks to its geometry-centric nature, PUGeo-Net works well for both CAD models with sharp features and scanned models with rich geometric details.
arXiv Detail & Related papers (2020-02-24T14:13:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.