Dual-Projection Fusion for Accurate Upright Panorama Generation in Robotic Vision
- URL: http://arxiv.org/abs/2512.00911v1
- Date: Sun, 30 Nov 2025 14:28:21 GMT
- Title: Dual-Projection Fusion for Accurate Upright Panorama Generation in Robotic Vision
- Authors: Yuhao Shan, Qianyi Yuan, Jingguo Liu, Shigang Li, Jianfeng Li, Tong Chen
- Abstract summary: This study presents a dual-stream angle-aware generation network that jointly estimates camera inclination angles and reconstructs upright panoramic images. Experiments on the SUN360 and M3D datasets demonstrate that our method outperforms existing approaches in both inclination estimation and upright panorama generation.
- Score: 9.05196155518077
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Panoramic cameras, capable of capturing a 360-degree field of view, are crucial in robotic vision, particularly in environments with sparse features. However, non-upright panoramas due to unstable robot postures hinder downstream tasks. Traditional IMU-based correction methods suffer from drift and external disturbances, while vision-based approaches offer a promising alternative. This study presents a dual-stream angle-aware generation network that jointly estimates camera inclination angles and reconstructs upright panoramic images. The network comprises a CNN branch that extracts local geometric structures from equirectangular projections and a ViT branch that captures global contextual cues from cubemap projections. These are integrated through a dual-projection adaptive fusion module that aligns spatial features across both domains. To further enhance performance, we introduce a high-frequency enhancement block, circular padding, and channel attention mechanisms to preserve 360° continuity and improve geometric sensitivity. Experiments on the SUN360 and M3D datasets demonstrate that our method outperforms existing approaches in both inclination estimation and upright panorama generation. Ablation studies further validate the contribution of each module and highlight the synergy between the two tasks. The code and related datasets can be found at: https://github.com/YuhaoShine/DualProjectionFusion.
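The abstract names circular padding as one of the mechanisms used to preserve 360° continuity across the left/right seam of an equirectangular image. A minimal NumPy sketch of that idea follows; the function name, pad layout, and zero padding at the poles are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def circular_pad(pano: np.ndarray, pad: int) -> np.ndarray:
    """Pad an equirectangular image (H, W, C) horizontally by wrapping,
    so convolution windows see the 360-degree seam as continuous.
    The top and bottom edges use ordinary zero padding (the poles do not wrap)."""
    left = pano[:, -pad:]    # columns taken from the right edge
    right = pano[:, :pad]    # columns taken from the left edge
    wrapped = np.concatenate([left, pano, right], axis=1)
    return np.pad(wrapped, ((pad, pad), (0, 0), (0, 0)))

img = np.arange(16).reshape(2, 8, 1)  # tiny 2x8 single-channel "panorama"
out = circular_pad(img, 2)            # -> shape (6, 12, 1)
```

A convolution applied to `out` then treats the 0° and 360° columns as neighbors, which is the continuity property the paper's padding is meant to preserve.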
Related papers
- DiT360: High-Fidelity Panoramic Image Generation via Hybrid Training [76.82789568988557]
DiT360 is a DiT-based framework that performs hybrid training on perspective and panoramic data for panoramic image generation. Our method achieves better boundary consistency and image fidelity across eleven quantitative metrics.
arXiv Detail & Related papers (2025-10-13T17:59:15Z) - PanoWorld-X: Generating Explorable Panoramic Worlds via Sphere-Aware Video Diffusion [87.13016347332943]
PanoWorld-X is a novel framework for high-fidelity and controllable panoramic video generation with diverse camera trajectories. Our experiments demonstrate superior performance in various aspects, including motion range, control precision, and visual quality.
arXiv Detail & Related papers (2025-09-29T16:22:00Z) - RangeSAM: Leveraging Visual Foundation Models for Range-View represented LiDAR segmentation [6.513648249086729]
We present the first range-view framework that adapts SAM2 to 3D segmentation, coupling efficient 2D feature extraction with standard projection/back-projection to operate on point clouds. Our approach achieves competitive performance on Semantic KITTI while benefiting from the speed, scalability, and deployment simplicity of 2D-centric pipelines.
arXiv Detail & Related papers (2025-09-19T11:33:10Z) - CDI3D: Cross-guided Dense-view Interpolation for 3D Reconstruction [25.468907201804093]
Large Reconstruction Models (LRMs) have shown great promise in leveraging multi-view images generated by 2D diffusion models to extract 3D content. However, 2D diffusion models often struggle to produce dense images with strong multi-view consistency. We present CDI3D, a feed-forward framework designed for efficient, high-quality image-to-3D generation with view.
arXiv Detail & Related papers (2025-03-11T03:08:43Z) - SphereFusion: Efficient Panorama Depth Estimation via Gated Fusion [21.97835451388508]
We present SphereFusion, an end-to-end framework that combines the strengths of various projection methods. Specifically, SphereFusion employs 2D image convolution and mesh operations to extract two types of features from the panorama image in both equirectangular and spherical projection domains. We show that SphereFusion achieves competitive results with other state-of-the-art methods, while presenting the fastest inference speed at only 17 ms on a 512$\times$1024 panorama image.
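SphereFusion's gated fusion is described here only at a high level. A tiny NumPy sketch of what an element-wise gate between two projection-domain feature maps could look like; the shapes, the weight `w`, and the sigmoid gate are assumptions for illustration, not SphereFusion's actual module:

```python
import numpy as np

def gated_fusion(feat_equi: np.ndarray, feat_sphere: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Blend features from two projection domains with a per-element gate.

    feat_equi, feat_sphere: (H, W, C) feature maps from the equirectangular
    and spherical branches; w: (2C, C) hypothetical learned gate weights.
    """
    stacked = np.concatenate([feat_equi, feat_sphere], axis=-1)  # (H, W, 2C)
    gate = 1.0 / (1.0 + np.exp(-(stacked @ w)))                  # sigmoid -> (0, 1)
    return gate * feat_equi + (1.0 - gate) * feat_sphere

# With zero weights the gate is exactly 0.5, so fusion reduces to the mean
# of the two branches; training would learn to favor one domain per location.
a = np.ones((4, 8, 3))
b = np.zeros((4, 8, 3))
fused = gated_fusion(a, b, np.zeros((6, 3)))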
arXiv Detail & Related papers (2025-02-09T11:36:45Z) - MCPDepth: Omnidirectional Depth Estimation via Stereo Matching from Multi-Cylindrical Panoramas [49.891712558113845]
We introduce Multi-Cylindrical Panoramic Depth Estimation (MCPDepth), a two-stage framework designed to enhance omnidirectional depth estimation. Our method improves the mean absolute error (MAE) by 18.8% on the outdoor dataset Deep360 and by 19.9% on the real dataset 3D60.
arXiv Detail & Related papers (2024-08-03T03:35:37Z) - Geometric-aware Pretraining for Vision-centric 3D Object Detection [77.7979088689944]
We propose a novel geometric-aware pretraining framework called GAPretrain.
GAPretrain serves as a plug-and-play solution that can be flexibly applied to multiple state-of-the-art detectors.
We achieve 46.2 mAP and 55.5 NDS on the nuScenes val set using the BEVFormer method, with a gain of 2.7 and 2.1 points, respectively.
arXiv Detail & Related papers (2023-04-06T14:33:05Z) - Multi-Projection Fusion and Refinement Network for Salient Object Detection in 360° Omnidirectional Image [141.10227079090419]
We propose a Multi-Projection Fusion and Refinement Network (MPFR-Net) to detect the salient objects in 360° omnidirectional images.
MPFR-Net uses the equirectangular projection image and four corresponding cube-unfolding images as inputs.
Experimental results on two omnidirectional datasets demonstrate that the proposed approach outperforms the state-of-the-art methods both qualitatively and quantitatively.
arXiv Detail & Related papers (2022-12-23T14:50:40Z) - UniFuse: Unidirectional Fusion for 360° Panorama Depth Estimation [11.680475784102308]
This paper introduces a new framework to fuse features from the two projections, unidirectionally feeding the cubemap features to the equirectangular features only at the decoding stage.
Experiments verify the effectiveness of our proposed fusion strategy and module, and our model achieves state-of-the-art performance on four popular datasets.
arXiv Detail & Related papers (2021-02-06T10:01:09Z) - Cross-Modality 3D Object Detection [63.29935886648709]
We present a novel two-stage multi-modal fusion network for 3D object detection.
The whole architecture facilitates two-stage fusion.
Our experiments on the KITTI dataset show that the proposed multi-stage fusion helps the network to learn better representations.
arXiv Detail & Related papers (2020-08-16T11:01:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.