REL-SF4PASS: Panoramic Semantic Segmentation with REL Depth Representation and Spherical Fusion
- URL: http://arxiv.org/abs/2601.16788v1
- Date: Fri, 23 Jan 2026 14:33:49 GMT
- Title: REL-SF4PASS: Panoramic Semantic Segmentation with REL Depth Representation and Spherical Fusion
- Authors: Xuewei Li, Xinghan Bao, Zhimin Chen, Xi Li
- Abstract summary: REL-SF4PASS considerably improves performance and robustness on the popular Stanford2D3D Panoramic benchmark. It gains a 2.35% average mIoU improvement across all 3 folds and reduces performance variance by approximately 70% under 3D disturbance.
- Score: 9.487755927754952
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As an important and challenging problem in computer vision, Panoramic Semantic Segmentation (PASS) aims to provide complete scene perception from an ultra-wide field of view. Most PASS methods focus on spherical geometry with RGB input, or use depth information in its raw or HHA format, neither of which fully exploits panoramic image geometry. To address these shortcomings, we propose REL-SF4PASS, built on our REL depth representation in cylindrical coordinates and Spherical-dynamic Multi-Modal Fusion (SMMF). REL comprises Rectified Depth, an Elevation-Gained Vertical Inclination Angle, and a Lateral Orientation Angle, which together represent 3D space in cylindrical-coordinate style along with the surface normal direction. SMMF ensures diverse fusion across different panoramic image regions and reduces the breakage caused by unrolling the cylinder's side surface in the ERP projection, applying fusion strategies matched to the different regions of a panoramic image. Experimental results show that REL-SF4PASS considerably improves performance and robustness on the popular Stanford2D3D Panoramic benchmark: it gains a 2.35% average mIoU improvement across all 3 folds and reduces performance variance by approximately 70% under 3D disturbance.
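The abstract does not give REL's exact formulas, so the sketch below is only a plausible reconstruction from the named ingredients (cylindrical coordinates plus surface-normal direction): from an ERP depth map it computes a rectified depth (the cylindrical radius) and two normal-direction angles. The function names `erp_rays` and `rel_channels`, the finite-difference normal estimation, and the exact angle definitions are all assumptions, not the paper's implementation.

```python
import numpy as np

def erp_rays(h, w):
    """Unit ray directions for each pixel of an equirectangular (ERP) image."""
    lon = (np.arange(w) + 0.5) / w * 2 * np.pi - np.pi   # azimuth in [-pi, pi)
    lat = np.pi / 2 - (np.arange(h) + 0.5) / h * np.pi   # elevation in (-pi/2, pi/2)
    lon, lat = np.meshgrid(lon, lat)
    x = np.cos(lat) * np.cos(lon)
    y = np.cos(lat) * np.sin(lon)
    z = np.sin(lat)
    return np.stack([x, y, z], axis=-1), lat

def rel_channels(depth):
    """Hypothetical REL-style channels from an ERP depth map of shape (H, W).

    R: rectified depth -> cylindrical radius rho = d * cos(lat)
    E: vertical inclination angle of the estimated surface normal
    L: lateral orientation angle of the estimated surface normal
    """
    h, w = depth.shape
    rays, lat = erp_rays(h, w)
    pts = rays * depth[..., None]                  # per-pixel 3D points
    rho = depth * np.cos(lat)                      # R: horizontal (rectified) distance
    # crude surface normals from cross products of image-space gradients
    du = np.gradient(pts, axis=1)
    dv = np.gradient(pts, axis=0)
    n = np.cross(du, dv)
    n /= np.linalg.norm(n, axis=-1, keepdims=True) + 1e-8
    e = np.arcsin(np.clip(n[..., 2], -1, 1))       # E: inclination vs. the horizon
    l = np.arctan2(n[..., 1], n[..., 0])           # L: lateral orientation
    return np.stack([rho, e, l], axis=0)           # (3, H, W) input for a segmenter

# toy usage: a constant-depth panorama
rel = rel_channels(np.full((256, 512), 2.0))
print(rel.shape)  # (3, 256, 512)
```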
Related papers
- SoPE: Spherical Coordinate-Based Positional Embedding for Enhancing Spatial Perception of 3D LVLMs [21.891285551179365]
We introduce Spherical Coordinate-based Positional Embedding (SoPE). Our method maps point-cloud token indices into a 3D spherical coordinate space, enabling unified modeling of spatial locations and directional angles. This formulation preserves the inherent geometric structure of point-cloud data, enhances spatial awareness, and yields more consistent and expressive geometric representations for multimodal learning.
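The blurb does not spell out SoPE's embedding function, so the PyTorch sketch below only illustrates the general idea: Fourier features over (r, theta, phi) instead of raw Cartesian coordinates. The function `spherical_pe` and its frequency schedule are illustrative assumptions, not the paper's formulation.

```python
import torch

def spherical_pe(xyz: torch.Tensor, num_freqs: int = 8) -> torch.Tensor:
    """Toy spherical-coordinate positional embedding for point tokens.

    xyz: (N, 3) point coordinates. Returns (N, 3 * 2 * num_freqs) features
    built from radius, polar angle, and azimuth rather than raw x/y/z.
    """
    x, y, z = xyz.unbind(-1)
    r = xyz.norm(dim=-1).clamp_min(1e-8)
    theta = torch.acos((z / r).clamp(-1, 1))       # polar angle in [0, pi]
    phi = torch.atan2(y, x)                        # azimuth in (-pi, pi]
    coords = torch.stack([r, theta, phi], dim=-1)  # (N, 3)
    freqs = 2.0 ** torch.arange(num_freqs)         # standard Fourier frequencies
    angles = coords[..., None] * freqs             # (N, 3, F)
    pe = torch.cat([angles.sin(), angles.cos()], dim=-1)
    return pe.flatten(-2)                          # (N, 6F)

pe = spherical_pe(torch.randn(1024, 3))
print(pe.shape)  # torch.Size([1024, 48])
```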
arXiv Detail & Related papers (2026-02-26T07:42:15Z)
- PFDepth: Heterogeneous Pinhole-Fisheye Joint Depth Estimation via Distortion-aware Gaussian-Splatted Volumetric Fusion [61.6340987158734]
We present PFDepth, the first pinhole-fisheye framework for heterogeneous multi-view depth estimation. PFDepth employs a unified architecture capable of processing arbitrary combinations of pinhole and fisheye cameras with varied intrinsics and extrinsics. We show that PFDepth achieves state-of-the-art performance on the KITTI-360 and RealHet datasets over current mainstream depth networks.
arXiv Detail & Related papers (2025-09-30T09:38:59Z)
- MASH: Masked Anchored SpHerical Distances for 3D Shape Representation and Generation [55.88474970190769]
Masked Anchored SpHerical Distances (MASH) is a novel multi-view and parametrized representation of 3D shapes. MASH is versatile for multiple applications including surface reconstruction, shape generation, completion, and blending.
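As a toy illustration of anchored spherical distances, the sketch below casts rays from one anchor along sampled spherical directions against an analytic unit sphere and records hit distances plus a visibility mask. MASH's actual parametrization and masking scheme are richer, so treat every name here as hypothetical.

```python
import numpy as np

def ray_sphere_distance(origins, dirs, radius=1.0):
    """Smallest positive ray/unit-sphere hit distance; NaN where the ray misses."""
    b = np.einsum('nd,nd->n', origins, dirs)
    c = np.einsum('nd,nd->n', origins, origins) - radius ** 2
    disc = b ** 2 - c
    t = -b - np.sqrt(np.maximum(disc, 0.0))
    t[(disc < 0) | (t <= 0)] = np.nan
    return t

def anchored_spherical_distances(anchor, n_dirs=256, seed=0):
    """Toy MASH-style descriptor for one anchor: distances to the surface
    along randomly sampled spherical directions, plus a visibility mask.
    The target shape is an analytic unit sphere so no mesh library is needed."""
    rng = np.random.default_rng(seed)
    dirs = rng.normal(size=(n_dirs, 3))
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)
    origins = np.tile(anchor, (n_dirs, 1)).astype(float)
    dist = ray_sphere_distance(origins, dirs)
    mask = ~np.isnan(dist)                 # the "masked" part: directions that hit
    return dirs, dist, mask

dirs, dist, mask = anchored_spherical_distances(np.array([0.0, 0.0, 2.0]))
print(f"{mask.mean():.2f} of directions see the surface")
```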
arXiv Detail & Related papers (2025-04-12T09:28:12Z)
- SphereFusion: Efficient Panorama Depth Estimation via Gated Fusion [21.97835451388508]
We present SphereFusion, an end-to-end framework that combines the strengths of various projection methods. Specifically, SphereFusion employs 2D image convolution and mesh operations to extract two types of features from the panorama, in the equirectangular and spherical projection domains respectively. SphereFusion achieves results competitive with other state-of-the-art methods while offering the fastest inference speed, at only 17 ms on a 512×1024 panorama.
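The abstract names gated fusion without detailing the module, so the PyTorch sketch below shows the generic gating pattern: a 1x1 convolution predicts a per-pixel gate that blends the equirectangular and spherical feature branches. The class `GatedFusion` is an assumption, not the paper's exact module.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Minimal gated fusion of equirectangular- and spherical-branch features.

    A 1x1 conv predicts a per-pixel gate g in (0, 1); the output is
    g * f_erp + (1 - g) * f_sph.
    """
    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, f_erp: torch.Tensor, f_sph: torch.Tensor) -> torch.Tensor:
        g = self.gate(torch.cat([f_erp, f_sph], dim=1))
        return g * f_erp + (1 - g) * f_sph

fuse = GatedFusion(64)
out = fuse(torch.randn(1, 64, 128, 256), torch.randn(1, 64, 128, 256))
print(out.shape)  # torch.Size([1, 64, 128, 256])
```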
arXiv Detail & Related papers (2025-02-09T11:36:45Z)
- MCPDepth: Omnidirectional Depth Estimation via Stereo Matching from Multi-Cylindrical Panoramas [49.891712558113845]
We introduce Multi-Cylindrical Panoramic Depth Estimation (MCPDepth), a two-stage framework designed to enhance omnidirectional depth estimation. Our method improves the mean absolute error (MAE) by 18.8% on the outdoor dataset Deep360 and by 19.9% on the real dataset 3D60.
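A key ingredient of MCPDepth is matching on cylindrical rather than spherical panoramas. The NumPy sketch below shows one standard way to resample an ERP image onto a cylindrical projection; the function `erp_to_cylindrical`, the latitude cutoff, and the nearest-neighbor sampling are simplifying assumptions, not the paper's pipeline.

```python
import numpy as np

def erp_to_cylindrical(erp, out_h=None, max_lat=np.deg2rad(60)):
    """Resample an ERP panorama of shape (H, W, C) onto a cylindrical projection.

    Cylindrical rows are spaced linearly in height y = tan(latitude) on a
    unit cylinder, which removes the polar stretching that hurts stereo
    matching in ERP. Latitudes beyond +/-max_lat are cropped, since a full
    sphere cannot map onto a finite cylinder. Nearest-neighbor sampling
    keeps the sketch short; a real pipeline would interpolate bilinearly.
    """
    h, w = erp.shape[:2]
    out_h = out_h or h
    y_max = np.tan(max_lat)
    y = np.linspace(y_max, -y_max, out_h)          # cylinder height per row
    lat = np.arctan(y)                             # back to latitude
    v = (0.5 - lat / np.pi) * h                    # ERP row (lat=+pi/2 -> v=0)
    v = np.clip(v.astype(int), 0, h - 1)
    return erp[v]                                  # azimuth columns are shared

cyl = erp_to_cylindrical(np.random.rand(512, 1024, 3))
print(cyl.shape)  # (512, 1024, 3)
```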
arXiv Detail & Related papers (2024-08-03T03:35:37Z)
- PanoGRF: Generalizable Spherical Radiance Fields for Wide-baseline Panoramas [54.4948540627471]
We propose PanoGRF, Generalizable Spherical Radiance Fields for Wide-baseline Panoramas.
Unlike generalizable radiance fields trained on perspective images, PanoGRF avoids the information loss from panorama-to-perspective conversion.
Results on multiple panoramic datasets demonstrate that PanoGRF significantly outperforms state-of-the-art generalizable view synthesis methods.
arXiv Detail & Related papers (2023-06-02T13:35:07Z)
- Neural Radiance Fields Approach to Deep Multi-View Photometric Stereo [103.08512487830669]
We present a modern solution to the multi-view photometric stereo (MVPS) problem.
We procure the surface orientation using a photometric stereo (PS) image formation model and blend it with a multi-view neural radiance field representation to recover the object's surface geometry.
Our method performs neural rendering of multi-view images while utilizing surface normals estimated by a deep photometric stereo network.
arXiv Detail & Related papers (2021-10-11T20:20:03Z)
- OmniSLAM: Omnidirectional Localization and Dense Mapping for Wide-baseline Multi-camera Systems [88.41004332322788]
We present an omnidirectional localization and dense mapping system for a wide-baseline multiview stereo setup with ultra-wide field-of-view (FOV) fisheye cameras.
For more practical and accurate reconstruction, we first introduce improved, lightweight deep neural networks for omnidirectional depth estimation.
We integrate our omnidirectional depth estimates into the visual odometry (VO) and add a loop closing module for global consistency.
arXiv Detail & Related papers (2020-03-18T05:52:10Z)