SO3UFormer: Learning Intrinsic Spherical Features for Rotation-Robust Panoramic Segmentation
- URL: http://arxiv.org/abs/2602.22867v1
- Date: Thu, 26 Feb 2026 11:07:51 GMT
- Title: SO3UFormer: Learning Intrinsic Spherical Features for Rotation-Robust Panoramic Segmentation
- Authors: Qinfeng Zhu, Yunxi Jiang, Lei Fan
- Abstract summary: Panoramic semantic segmentation models are typically trained under a strict gravity-aligned assumption. Real-world captures often deviate from this canonical orientation due to unconstrained camera motions. This discrepancy causes standard spherical Transformers to overfit global latitude cues, leading to performance collapse under 3D reorientations. We introduce SO3UFormer, a rotation-robust architecture designed to learn intrinsic spherical features that are less sensitive to the underlying coordinate frame.
- Score: 1.6571781613404601
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Panoramic semantic segmentation models are typically trained under a strict gravity-aligned assumption. However, real-world captures often deviate from this canonical orientation due to unconstrained camera motions, such as the rotational jitter of handheld devices or the dynamic attitude shifts of aerial platforms. This discrepancy causes standard spherical Transformers to overfit global latitude cues, leading to performance collapse under 3D reorientations. To address this, we introduce SO3UFormer, a rotation-robust architecture designed to learn intrinsic spherical features that are less sensitive to the underlying coordinate frame. Our approach rests on three geometric pillars: (1) an intrinsic feature formulation that decouples the representation from the gravity vector by removing absolute latitude encoding; (2) quadrature-consistent spherical attention that accounts for non-uniform sampling densities; and (3) a gauge-aware relative positional mechanism that encodes local angular geometry using tangent-plane projected angles and discrete gauge pooling, avoiding reliance on global axes. We further use index-based spherical resampling together with a logit-level SO(3)-consistency regularizer during training. To rigorously benchmark robustness, we introduce Pose35, a dataset variant of Stanford2D3D perturbed by random rotations within $\pm 35^\circ$. Under the extreme test of arbitrary full SO(3) rotations, existing SOTAs fail catastrophically: the baseline SphereUFormer drops from 67.53 mIoU to 25.26 mIoU. In contrast, SO3UFormer demonstrates remarkable stability, achieving 72.03 mIoU on Pose35 and retaining 70.67 mIoU under full SO(3) rotations.
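As a rough illustration of what attention that "accounts for non-uniform sampling densities" can mean, the following is a minimal NumPy sketch, not the paper's implementation. It assumes an equirectangular grid, where each pixel covers a solid angle proportional to the cosine of its latitude, and folds those area weights into the softmax so that densely sampled polar rows do not dominate. The function names `equirect_quadrature_weights` and `quadrature_attention` are hypothetical, and the actual SO3UFormer sampling scheme and weighting may differ.

```python
import numpy as np


def equirect_quadrature_weights(height, width):
    """Normalized area weights for an equirectangular grid (assumed layout).

    Rows run from the north pole to the south pole; weights are evaluated
    at pixel centers and flattened in row-major order.
    """
    # Latitude of each row center, from just below +pi/2 to just above -pi/2.
    lat = (0.5 - (np.arange(height) + 0.5) / height) * np.pi
    # Solid angle per pixel is proportional to cos(latitude).
    w = np.repeat(np.cos(lat), width)
    return w / w.sum()


def quadrature_attention(q, k, v, w):
    """Softmax attention in which key j is scaled by its area weight w[j].

    attn[i, j] = w[j] * exp(q_i . k_j / sqrt(d)) / sum_l w[l] * exp(q_i . k_l / sqrt(d))
    """
    d = q.shape[-1]
    # Adding log(w) to the logits is equivalent to multiplying exp(logits) by w.
    logits = q @ k.T / np.sqrt(d) + np.log(w)[None, :]
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(logits)
    attn = attn / attn.sum(axis=-1, keepdims=True)
    return attn @ v, attn
```

With all-zero queries the content term vanishes, so each attention row reduces to the area weights themselves and the output becomes the area-weighted mean of the values. That is the behaviour a discretized spherical integral should have, whereas an unweighted softmax would over-count the many small pixels near the poles.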
Related papers
- Enhancing Rotation-Invariant 3D Learning with Global Pose Awareness and Attention Mechanisms [30.575822636142956]
We introduce the Shadow-informed Pose Feature (SiPF), which augments local RI descriptors with a globally consistent reference point (referred to as the 'shadow') derived from a learned shared rotation. We also propose Rotation-invariant Attention Convolution (RIAttnConv), an attention-based operator that integrates SiPFs into the feature aggregation process.
arXiv Detail & Related papers (2025-11-11T23:01:28Z)
- Correspondence-Free Fast and Robust Spherical Point Pattern Registration [0.8287206589886879]
We introduce three novel algorithms for estimation between two spherical patterns. Our algorithms are over 10x faster and over 10x more accurate than current state-of-the-art methods for the Wahba problem with outliers.
arXiv Detail & Related papers (2025-08-04T12:21:05Z)
- 3D Equivariant Pose Regression via Direct Wigner-D Harmonics Prediction [50.07071392673984]
Existing methods learn 3D rotations parametrized in the spatial domain using angles or quaternions.
We propose a frequency-domain approach that directly predicts Wigner-D coefficients for 3D rotation regression.
Our method achieves state-of-the-art results on benchmarks such as ModelNet10-SO(3) and PASCAL3D+.
arXiv Detail & Related papers (2024-11-01T12:50:38Z)
- RIDE: Boosting 3D Object Detection for LiDAR Point Clouds via Rotation-Invariant Analysis [15.42293045246587]
RIDE is a pioneering exploration of Rotation-Invariance for the 3D LiDAR-point-based object DEtector.
We design a bi-feature extractor that extracts (i) object-aware features, which are sensitive to rotation but preserve geometry well, and (ii) rotation-invariant features, which lose geometric information to a certain extent but are robust to rotation.
Our RIDE is compatible and easy to plug into the existing one-stage and two-stage 3D detectors, and boosts both detection performance and rotation robustness.
arXiv Detail & Related papers (2024-08-28T08:53:33Z)
- SGFormer: Spherical Geometry Transformer for 360 Depth Estimation [52.23806040289676]
Panoramic distortion poses a significant challenge in 360 depth estimation. We propose a spherical geometry transformer, named SGFormer, to address the above issues. We also present a query-based global conditional position embedding to compensate for spatial structure at varying resolutions.
arXiv Detail & Related papers (2024-04-23T12:36:24Z)
- VI-Net: Boosting Category-level 6D Object Pose Estimation via Learning Decoupled Rotations on the Spherical Representations [55.25238503204253]
We propose a novel rotation estimation network, termed VI-Net, to make the task easier.
To process the spherical signals, a Spherical Feature Pyramid Network is constructed based on a novel design of SPAtial Spherical Convolution.
Experiments on the benchmarking datasets confirm the efficacy of our method, which outperforms the existing ones by a large margin in the regime of high precision.
arXiv Detail & Related papers (2023-08-19T05:47:53Z)
- Rotation-Invariant Transformer for Point Cloud Matching [42.5714375149213]
We introduce RoITr, a Rotation-Invariant Transformer to cope with the pose variations in the point cloud matching task.
We propose a global transformer with rotation-invariant cross-frame spatial awareness learned by the self-attention mechanism.
RoITr surpasses the existing methods by at least 13 and 5 percentage points in terms of Inlier Ratio and Registration Recall.
arXiv Detail & Related papers (2023-03-14T20:55:27Z)
- Spherical Convolutional Neural Networks: Stability to Perturbations in SO(3) [175.96910854433574]
Spherical convolutional neural networks (Spherical CNNs) learn nonlinear representations from 3D data by exploiting the data structure.
This paper investigates the properties that Spherical CNNs exhibit as they pertain to the rotational structure inherent in spherical signals.
arXiv Detail & Related papers (2020-10-12T17:16:07Z)
- A Rotation-Invariant Framework for Deep Point Cloud Analysis [132.91915346157018]
We introduce a new low-level purely rotation-invariant representation to replace common 3D Cartesian coordinates as the network inputs.
Also, we present a network architecture to embed these representations into features, encoding local relations between points and their neighbors, and the global shape structure.
We evaluate our method on multiple point cloud analysis tasks, including shape classification, part segmentation, and shape retrieval.
arXiv Detail & Related papers (2020-03-16T14:04:45Z)
- Quaternion Equivariant Capsule Networks for 3D Point Clouds [58.566467950463306]
We present a 3D capsule module for processing point clouds that is equivariant to 3D rotations and translations.
We connect dynamic routing between capsules to the well-known Weiszfeld algorithm.
Based on our operator, we build a capsule network that disentangles geometry from pose.
arXiv Detail & Related papers (2019-12-27T13:51:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.