Unified Spherical Frontend: Learning Rotation-Equivariant Representations of Spherical Images from Any Camera
- URL: http://arxiv.org/abs/2511.18174v1
- Date: Sat, 22 Nov 2025 19:57:46 GMT
- Title: Unified Spherical Frontend: Learning Rotation-Equivariant Representations of Spherical Images from Any Camera
- Authors: Mukai Yu, Mosam Dabhi, Liuyue Xie, Sebastian Scherer, László A. Jeni
- Abstract summary: Unified Spherical Frontend (USF) is a lens-agnostic framework that transforms images from any camera into a unit-sphere representation via ray-direction correspondences. USF processes high-resolution spherical imagery efficiently and maintains less than 1% performance drop under random test-time rotations.
- Score: 12.448357304482668
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Modern perception increasingly relies on fisheye, panoramic, and other wide field-of-view (FoV) cameras, yet most pipelines still apply planar CNNs designed for pinhole imagery on 2D grids, where image-space neighborhoods misrepresent physical adjacency and models are sensitive to global rotations. Frequency-domain spherical CNNs partially address this mismatch but require costly spherical harmonic transforms that constrain resolution and efficiency. We introduce the Unified Spherical Frontend (USF), a lens-agnostic framework that transforms images from any calibrated camera into a unit-sphere representation via ray-direction correspondences, and performs spherical resampling, convolution, and pooling directly in the spatial domain. USF is modular: projection, location sampling, interpolation, and resolution control are fully decoupled. Its distance-only spherical kernels offer configurable rotation-equivariance (mirroring translation-equivariance in planar CNNs) while avoiding harmonic transforms entirely. We compare standard planar backbones with their spherical counterparts across classification, detection, and segmentation tasks on synthetic (Spherical MNIST) and real-world datasets (PANDORA, Stanford 2D-3D-S), and stress-test robustness to extreme lens distortions, varying FoV, and arbitrary rotations. USF processes high-resolution spherical imagery efficiently and maintains less than 1% performance drop under random test-time rotations, even without rotational augmentation, and even enables zero-shot generalization from one lens type to unseen wide-FoV lenses with minimal performance degradation.
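The two geometric ideas in the abstract, lifting pixels to ray directions on the unit sphere and weighting neighbors purely by angular distance, can be illustrated with a minimal NumPy sketch. Everything here is an assumption for illustration and is not taken from the USF code: the ideal pinhole model with intrinsics `fx`, `fy`, `cx`, `cy`, the Gaussian profile in `distance_only_weight`, and all function names are hypothetical. The point is only that a weight depending on great-circle distance alone is unchanged when the same rotation is applied to the kernel center and to the sample points.

```python
import numpy as np


def pinhole_rays(width, height, fx, fy, cx, cy):
    """Unit ray direction for each pixel of an ideal pinhole camera (assumed model)."""
    u, v = np.meshgrid(np.arange(width), np.arange(height))
    d = np.stack(
        [(u - cx) / fx, (v - cy) / fy, np.ones_like(u, dtype=float)], axis=-1
    )
    # Normalizing places every ray on the unit sphere; result has shape (H, W, 3).
    return d / np.linalg.norm(d, axis=-1, keepdims=True)


def geodesic_distance(a, b):
    """Great-circle angle in radians between unit vectors a and b."""
    return np.arccos(np.clip(np.sum(a * b, axis=-1), -1.0, 1.0))


def distance_only_weight(center, points, sigma=0.1):
    """Hypothetical kernel whose weight depends only on angular distance to center."""
    return np.exp(-0.5 * (geodesic_distance(center, points) / sigma) ** 2)


def rotation_about_axis(axis, angle):
    """Rotation matrix from Rodrigues' formula for a given axis and angle."""
    x, y, z = np.asarray(axis, dtype=float) / np.linalg.norm(axis)
    k = np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])
    return np.eye(3) + np.sin(angle) * k + (1.0 - np.cos(angle)) * (k @ k)
```

For example, with `points = pinhole_rays(8, 6, 4.0, 4.0, 3.5, 2.5).reshape(-1, 3)` and `R = rotation_about_axis([1, 2, 3], 0.7)`, the weights `distance_only_weight(points[0], points)` and `distance_only_weight(R @ points[0], points @ R.T)` agree up to floating-point error, because rotations preserve dot products between unit vectors.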
Related papers
- FLIGHT: Fibonacci Lattice-based Inference for Geometric Heading in real-Time [7.517221623631364]
Estimating camera motion from monocular video is a fundamental problem in computer vision.
Existing methods that recover the camera's heading under known rotation tend to perform well in low-noise, low-outlier conditions.
We propose a novel generalization of the Hough transform on the unit sphere to estimate the camera's heading.
arXiv Detail & Related papers (2026-02-26T15:27:49Z)
- HoliGS: Holistic Gaussian Splatting for Embodied View Synthesis [59.25751939710903]
We propose a novel deformable Gaussian splatting framework that addresses embodied view synthesis from long monocular RGB videos.
Our method leverages invertible Gaussian Splatting deformation networks to reconstruct large-scale, dynamic environments accurately.
Results highlight a practical and scalable solution for EVS in real-world scenarios.
arXiv Detail & Related papers (2025-06-24T03:54:40Z)
- AlignDiff: Learning Physically-Grounded Camera Alignment via Diffusion [0.5277756703318045]
We introduce a novel framework that addresses camera intrinsic and extrinsic parameters using a generic ray camera model.
Unlike previous approaches, AlignDiff shifts focus from semantic to geometric features, enabling more accurate modeling of local distortions.
Our experiments demonstrate that the proposed method reduces the angular error of estimated ray bundles by 8.2 degrees and improves overall calibration accuracy, outperforming existing approaches on challenging, real-world datasets.
arXiv Detail & Related papers (2025-03-27T14:59:59Z)
- S-R2D2: a spherical extension of the R2D2 deep neural network series paradigm for wide-field radio-interferometric imaging [0.0]
Recently, the R2D2 paradigm, standing for "Residual-to-Residual DNN series for high-Dynamic-range imaging", was introduced for image formation in Radio Interferometry (RI).
We propose the spherical-imaging extension S-R2D2 to meet the spherical-imaging requirement of modern telescopes observing wide fields.
arXiv Detail & Related papers (2025-03-03T12:18:23Z)
- 3D Equivariant Pose Regression via Direct Wigner-D Harmonics Prediction [50.07071392673984]
Existing methods learn 3D rotations parametrized in the spatial domain using angles or quaternions.
We propose a frequency-domain approach that directly predicts Wigner-D coefficients for 3D rotation regression.
Our method achieves state-of-the-art results on benchmarks such as ModelNet10-SO(3) and PASCAL3D+.
arXiv Detail & Related papers (2024-11-01T12:50:38Z)
- SGFormer: Spherical Geometry Transformer for 360 Depth Estimation [52.23806040289676]
Panoramic distortion poses a significant challenge in 360 depth estimation.
We propose a spherical geometry transformer, named SGFormer, to address the above issues.
We also present a query-based global conditional position embedding to compensate for spatial structure at varying resolutions.
arXiv Detail & Related papers (2024-04-23T12:36:24Z)
- SphereDiffusion: Spherical Geometry-Aware Distortion Resilient Diffusion Model [63.685132323224124]
Controllable spherical panoramic image generation holds substantial applicative potential across a variety of domains.
In this paper, we introduce a novel framework of SphereDiffusion to address these unique challenges.
Experiments on the Structured3D dataset show that SphereDiffusion significantly improves the quality of controllable spherical image generation, with a relative FID reduction of around 35% on average.
arXiv Detail & Related papers (2024-03-15T06:26:46Z)
- Local-to-Global Registration for Bundle-Adjusting Neural Radiance Fields [36.09829614806658]
We propose L2G-NeRF, a Local-to-Global registration method for Neural Radiance Fields.
Pixel-wise local alignment is learned in an unsupervised way via a deep network.
Our method outperforms the current state-of-the-art in terms of high-fidelity reconstruction and resolving large camera pose misalignment.
arXiv Detail & Related papers (2022-11-21T14:43:16Z)
- OSLO: On-the-Sphere Learning for Omnidirectional images and its application to 360-degree image compression [59.58879331876508]
We study the learning of representation models for omnidirectional images and propose to use the properties of HEALPix uniform sampling of the sphere to redefine the mathematical tools used in deep learning models for omnidirectional images.
Our proposed on-the-sphere solution leads to a better compression gain that can save 13.7% of the bit rate compared to similar learned models applied to equirectangular images.
arXiv Detail & Related papers (2021-07-19T22:14:30Z)
- Leveraging Spatial and Photometric Context for Calibrated Non-Lambertian Photometric Stereo [61.6260594326246]
We introduce an efficient fully-convolutional architecture that can leverage both spatial and photometric context simultaneously.
Using separable 4D convolutions and 2D heat-maps reduces the model size and makes it more efficient.
arXiv Detail & Related papers (2021-03-22T18:06:58Z)
- Scattering Networks on the Sphere for Scalable and Rotationally Equivariant Spherical CNNs [2.453627017761322]
We develop scattering networks constructed on the sphere that provide a powerful representational space for spherical data.
By integrating scattering networks as an additional type of layer in the generalized spherical CNN framework, we show how they can be leveraged to scale spherical CNNs to the high resolution data typical of many practical applications.
arXiv Detail & Related papers (2021-02-04T19:00:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.