SGAT4PASS: Spherical Geometry-Aware Transformer for PAnoramic Semantic
Segmentation
- URL: http://arxiv.org/abs/2306.03403v2
- Date: Tue, 12 Mar 2024 12:41:04 GMT
- Title: SGAT4PASS: Spherical Geometry-Aware Transformer for PAnoramic Semantic
Segmentation
- Authors: Xuewei Li, Tao Wu, Zhongang Qi, Gaoang Wang, Ying Shan, Xi Li
- Abstract summary: PAnoramic Semantic Segmentation (PASS) gives complete scene perception based on an ultra-wide angle of view.
Usually, prevalent PASS methods with 2D panoramic image input focus on solving image distortions but lack consideration of the 3D properties of the original $360^{\circ}$ data.
We propose the Spherical Geometry-Aware Transformer for PAnoramic Semantic Segmentation (SGAT4PASS) to be more robust to 3D disturbance.
- Score: 53.5256153325136
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As an important and challenging problem in computer vision, PAnoramic
Semantic Segmentation (PASS) gives complete scene perception based on an
ultra-wide angle of view. Usually, prevalent PASS methods with 2D panoramic
image input focus on solving image distortions but lack consideration of the 3D
properties of the original $360^{\circ}$ data. Consequently, their performance drops
sharply when the input panoramic images carry 3D disturbance. To be more
robust to such disturbance, we propose the Spherical Geometry-Aware Transformer
for PAnoramic Semantic Segmentation (SGAT4PASS), considering 3D spherical
geometry knowledge. Specifically, a spherical geometry-aware framework is
proposed for PASS. It comprises three modules: spherical geometry-aware
image projection, which takes input images with 3D disturbance into account;
spherical deformable patch embedding, which adds a spherical geometry-aware
constraint to the existing deformable patch embedding; and a panorama-aware
loss, which reflects the pixel density of the original $360^{\circ}$ data.
Experimental results on the Stanford2D3D Panoramic dataset show that SGAT4PASS
significantly improves performance and robustness, with an mIoU gain of
approximately 2%; when small 3D disturbances occur in the data, the stability
of our performance improves by an order of magnitude. Our code and
supplementary material are available at
https://github.com/TencentARC/SGAT4PASS.
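The abstract does not spell out the panorama-aware loss, but a common way to encode the pixel density of equirectangular $360^{\circ}$ images is to weight each image row by the cosine of its latitude, since rows near the poles are heavily oversampled on the sphere. Below is a minimal PyTorch sketch of such a latitude-weighted segmentation loss; the function name and the exact weighting are assumptions for illustration, not the authors' formulation.

```python
import torch
import torch.nn.functional as F

def latitude_weighted_ce(logits, target, ignore_index=255):
    """Cross-entropy over an equirectangular image, with each row
    weighted by cos(latitude): pole rows are oversampled on the
    sphere, so they should contribute proportionally less.
    (Illustrative sketch, not the SGAT4PASS panorama-aware loss.)

    logits: (B, C, H, W) raw class scores
    target: (B, H, W) integer labels
    """
    B, C, H, W = logits.shape
    # Row v sits at latitude ((v + 0.5) / H - 0.5) * pi; spherical
    # pixel density is proportional to cos(latitude).
    v = torch.arange(H, device=logits.device, dtype=logits.dtype)
    lat = ((v + 0.5) / H - 0.5) * torch.pi
    w = torch.cos(lat).clamp(min=0.0)                    # (H,)
    per_pixel = F.cross_entropy(logits, target,
                                ignore_index=ignore_index,
                                reduction="none")        # (B, H, W)
    mask = (target != ignore_index).to(logits.dtype)
    w_map = w.view(1, H, 1) * mask                       # broadcast over B, W
    return (per_pixel * w_map).sum() / w_map.sum().clamp(min=1e-8)
```

With uniform weights this reduces to ordinary masked cross-entropy; the cosine term simply down-weights the duplicated pole rows.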
Related papers
- PointAD: Comprehending 3D Anomalies from Points and Pixels for Zero-shot 3D Anomaly Detection [13.60524473223155]
This paper introduces PointAD, a novel approach that transfers the strong generalization capabilities of CLIP to recognizing 3D anomalies on unseen objects.
PointAD renders 3D anomalies into multiple 2D renderings and projects them back into 3D space.
Our model can directly integrate RGB information, further enhancing the understanding of 3D anomalies in a plug-and-play manner.
arXiv Detail & Related papers (2024-10-01T01:40:22Z)
- SpatialTracker: Tracking Any 2D Pixels in 3D Space [71.58016288648447]
We propose to estimate point trajectories in 3D space to mitigate the issues caused by image projection.
Our method, named SpatialTracker, lifts 2D pixels to 3D using monocular depth estimators.
Tracking in 3D allows us to leverage as-rigid-as-possible (ARAP) constraints while simultaneously learning a rigidity embedding that clusters pixels into different rigid parts.
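The lifting step here is the standard pinhole unprojection: a pixel $(u, v)$ with estimated depth $d$ maps to $d \cdot K^{-1}[u, v, 1]^{\top}$ in camera coordinates. A small NumPy sketch under that assumption (SpatialTracker's full pipeline is, of course, more involved):

```python
import numpy as np

def unproject(depth, K):
    """Lift every pixel of a depth map to a 3D point in camera
    coordinates (illustrative sketch of the standard unprojection).

    depth: (H, W) per-pixel depth, e.g. from a monocular estimator
    K:     (3, 3) pinhole intrinsics
    returns: (H, W, 3) points, X = depth * K^{-1} [u, v, 1]^T
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))       # pixel grid
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)     # (H, W, 3)
    rays = pix @ np.linalg.inv(K).T                      # back-projected rays
    return rays * depth[..., None]                       # scale rays by depth
```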
arXiv Detail & Related papers (2024-04-05T17:59:25Z)
- Text2Control3D: Controllable 3D Avatar Generation in Neural Radiance Fields using Geometry-Guided Text-to-Image Diffusion Model [39.64952340472541]
We propose a text-to-3D avatar generation method with controllable facial expression.
Our main strategy is to construct the 3D avatar in Neural Radiance Fields (NeRF) optimized with a set of controlled viewpoint-aware images.
We demonstrate the empirical results and discuss the effectiveness of our method.
arXiv Detail & Related papers (2023-09-07T08:14:46Z)
- Magic123: One Image to High-Quality 3D Object Generation Using Both 2D and 3D Diffusion Priors [104.79392615848109]
We present Magic123, a two-stage coarse-to-fine approach for generating high-quality, textured 3D meshes from a single unposed image.
In the first stage, we optimize a neural radiance field to produce a coarse geometry.
In the second stage, we adopt a memory-efficient differentiable mesh representation to yield a high-resolution mesh with a visually appealing texture.
arXiv Detail & Related papers (2023-06-30T17:59:08Z)
- Shape-Net: Room Layout Estimation from Panoramic Images Robust to Occlusion using Knowledge Distillation with 3D Shapes as Additional Inputs [0.0]
We propose a method for distilling knowledge from a model trained with both images and 3D information to a model that takes only images as input.
The proposed model, which is called Shape-Net, achieves state-of-the-art (SOTA) performance on benchmark datasets.
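The summary gives no loss details; the sketch below is the generic knowledge-distillation recipe (hard-label cross-entropy plus a KL term between temperature-softened teacher and student logits), not Shape-Net's actual objective. In this setting the teacher would see images plus 3D shapes and the student images only.

```python
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, target, T=4.0, alpha=0.5):
    """Standard knowledge distillation (Hinton et al.): hard-label CE
    plus KL between temperature-softened teacher and student outputs."""
    hard = F.cross_entropy(student_logits, target)
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * (T * T)
    return alpha * hard + (1.0 - alpha) * soft
```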
arXiv Detail & Related papers (2023-04-25T07:45:43Z)
- Beyond 3DMM: Learning to Capture High-fidelity 3D Face Shape [77.95154911528365]
3D Morphable Model (3DMM) fitting has widely benefited face analysis due to its strong 3D prior.
Previous reconstructed 3D faces suffer from degraded visual verisimilitude due to the loss of fine-grained geometry.
This paper proposes a complete solution to capture the personalized shape so that the reconstructed shape looks identical to the corresponding person.
arXiv Detail & Related papers (2022-04-09T03:46:18Z)
- Unsupervised Learning of Fine Structure Generation for 3D Point Clouds by 2D Projection Matching [66.98712589559028]
We propose an unsupervised approach for 3D point cloud generation with fine structures.
Our method can recover fine 3D structures from 2D silhouette images at different resolutions.
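Projection matching of this kind typically projects the generated point cloud into a view and measures a 2D Chamfer-style distance against points sampled from the target silhouette. A rough sketch under those assumptions (the helper below is hypothetical, not the paper's exact loss):

```python
import torch

def projection_matching_loss(points3d, K, silhouette_xy):
    """Project a point cloud with a pinhole camera and compare it to
    silhouette samples with a symmetric 2D Chamfer distance.
    (Illustrative sketch of projection matching.)

    points3d:      (N, 3) points in camera coordinates, z > 0
    K:             (3, 3) camera intrinsics
    silhouette_xy: (M, 2) 2D points sampled from the target silhouette
    """
    proj = points3d @ K.T                      # (N, 3) homogeneous image coords
    xy = proj[:, :2] / proj[:, 2:3]            # perspective divide -> (N, 2)
    d = torch.cdist(xy, silhouette_xy)         # (N, M) pairwise distances
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()
```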
arXiv Detail & Related papers (2021-08-08T22:15:31Z)
- Learning geometry-image representation for 3D point cloud generation [5.3485743892868545]
We propose a novel geometry image based generator (GIG) to convert the 3D point cloud generation problem to a 2D geometry image generation problem.
Experiments on both rigid and non-rigid 3D object datasets have demonstrated the promising performance of our method.
arXiv Detail & Related papers (2020-11-29T05:21:10Z)
- Hard Example Generation by Texture Synthesis for Cross-domain Shape Similarity Learning [97.56893524594703]
Image-based 3D shape retrieval (IBSR) aims to find the corresponding 3D shape of a given 2D image from a large 3D shape database.
Metric learning with some adaptation techniques is a natural solution to shape similarity learning.
We develop a geometry-focused multi-view metric learning framework empowered by texture synthesis.
arXiv Detail & Related papers (2020-10-23T08:52:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.