FisheyeGaussianLift: BEV Feature Lifting for Surround-View Fisheye Camera Perception
- URL: http://arxiv.org/abs/2511.17210v1
- Date: Fri, 21 Nov 2025 12:42:07 GMT
- Title: FisheyeGaussianLift: BEV Feature Lifting for Surround-View Fisheye Camera Perception
- Authors: Shubham Sonarghare, Prasad Deshpande, Ciaran Hogan, Deepika-Rani Kaliappan-Mahalingam, Ganesh Sistu,
- Abstract summary: We present a distortion-aware BEV segmentation framework that processes multi-camera high-resolution fisheye images. Each image pixel is lifted into 3D space via Gaussian parameterization, predicting spatial means and anisotropic covariances to explicitly model geometric uncertainty. Experiments demonstrate strong segmentation performance on complex parking and urban driving scenarios, achieving IoU scores of 87.75% for drivable regions and 57.26% for vehicles under severe fisheye distortion.
- Score: 0.8374040635931297
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Accurate BEV semantic segmentation from fisheye imagery remains challenging due to extreme non-linear distortion, occlusion, and depth ambiguity inherent to wide-angle projections. We present a distortion-aware BEV segmentation framework that directly processes multi-camera high-resolution fisheye images, utilizing calibrated geometric unprojection and per-pixel depth distribution estimation. Each image pixel is lifted into 3D space via Gaussian parameterization, predicting spatial means and anisotropic covariances to explicitly model geometric uncertainty. The projected 3D Gaussians are fused into a BEV representation via differentiable splatting, producing continuous, uncertainty-aware semantic maps without requiring undistortion or perspective rectification. Extensive experiments demonstrate strong segmentation performance on complex parking and urban driving scenarios, achieving IoU scores of 87.75% for drivable regions and 57.26% for vehicles under severe fisheye distortion and diverse environmental conditions.
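The lift-and-splat pipeline described in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration, not the paper's implementation: the calibrated ray unprojection and network-predicted depth distributions are stand-ins, the splatting here is a simple inverse-sigma-weighted scatter onto a ground-plane grid rather than true anisotropic Gaussian splatting, and every function and parameter name is hypothetical.

```python
import numpy as np

def lift_and_splat(rays, depth_mu, depth_sigma, feats,
                   bev_shape=(200, 200), bev_range=50.0):
    """Lift per-pixel features into a BEV grid via uncertainty-weighted splatting.

    rays:        (N, 3) view rays from calibrated fisheye unprojection
    depth_mu:    (N,) predicted mean depth per pixel
    depth_sigma: (N,) predicted depth standard deviation (uncertainty)
    feats:       (N, C) per-pixel feature vectors
    """
    # 3D Gaussian means: push each ray out to its expected depth
    means = rays * depth_mu[:, None]                       # (N, 3)

    H, W = bev_shape
    bev = np.zeros((H, W, feats.shape[1]))
    weight = np.zeros((H, W))

    # Map ground-plane (x, y) coordinates to BEV grid cells
    xs = ((means[:, 0] + bev_range) / (2 * bev_range) * W).astype(int)
    ys = ((means[:, 1] + bev_range) / (2 * bev_range) * H).astype(int)
    valid = (xs >= 0) & (xs < W) & (ys >= 0) & (ys < H)

    # Confidence weight: sharper depth estimate -> larger splat weight
    w = 1.0 / (depth_sigma + 1e-6)
    for i in np.flatnonzero(valid):
        bev[ys[i], xs[i]] += w[i] * feats[i]
        weight[ys[i], xs[i]] += w[i]

    # Normalize each occupied cell by its accumulated weight
    nz = weight > 0
    bev[nz] /= weight[nz, None]
    return bev
```

In the actual method the splatting is differentiable and each Gaussian's anisotropic covariance spreads its contribution over multiple BEV cells; the hard cell assignment above is only a readable approximation of that idea.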
Related papers
- CylinderDepth: Cylindrical Spatial Attention for Multi-View Consistent Self-Supervised Surround Depth Estimation [0.9558392439655014]
Self-supervised surround-view depth estimation enables dense, low-cost 3D perception with a 360° field of view from multiple minimally overlapping images. Yet, most existing methods suffer from depth estimates that are inconsistent between overlapping images. We propose a novel geometry-guided method for calibrated, time-synchronized multi-camera rigs that predicts dense, metric, and cross-view-consistent depth.
arXiv Detail & Related papers (2025-11-20T14:55:28Z) - PFDepth: Heterogeneous Pinhole-Fisheye Joint Depth Estimation via Distortion-aware Gaussian-Splatted Volumetric Fusion [61.6340987158734]
We present PFDepth, the first pinhole-fisheye framework for heterogeneous multi-view depth estimation. PFDepth employs a unified architecture capable of processing arbitrary combinations of pinhole and fisheye cameras with varied intrinsics and extrinsics. We show that PFDepth sets state-of-the-art performance on the KITTI-360 and RealHet datasets over current mainstream depth networks.
arXiv Detail & Related papers (2025-09-30T09:38:59Z) - PIS3R: Very Large Parallax Image Stitching via Deep 3D Reconstruction [5.816094524098354]
Image stitching aims to align two images taken from different viewpoints into one seamless, wider image. Most existing stitching methods struggle to handle images with large parallax effectively. We propose PIS3R, a method robust to very large parallax based on the novel concept of deep 3D reconstruction.
arXiv Detail & Related papers (2025-08-06T09:18:45Z) - AlignDiff: Learning Physically-Grounded Camera Alignment via Diffusion [0.5277756703318045]
We introduce AlignDiff, a novel framework that addresses camera intrinsic and extrinsic parameters using a generic ray camera model. Unlike previous approaches, AlignDiff shifts focus from semantic to geometric features, enabling more accurate modeling of local distortions. Our experiments demonstrate that the proposed method reduces the angular error of estimated ray bundles by 8.2 degrees and improves overall calibration accuracy, outperforming existing approaches on challenging, real-world datasets.
arXiv Detail & Related papers (2025-03-27T14:59:59Z) - HV-BEV: Decoupling Horizontal and Vertical Feature Sampling for Multi-View 3D Object Detection [34.72603963887331]
The application of vision-based multi-view environmental perception systems has been increasingly recognized in autonomous driving technology. Current state-of-the-art solutions primarily encode image features from each camera view into the BEV space through explicit or implicit depth prediction. We propose a novel approach that decouples feature sampling in the BEV grid-query paradigm into horizontal feature aggregation and vertical feature sampling.
arXiv Detail & Related papers (2024-12-25T11:49:14Z) - RoFIR: Robust Fisheye Image Rectification Framework Impervious to Optical Center Deviation [88.54817424560056]
We propose a distortion vector map (DVM) that measures the degree and direction of local distortion.
By learning the DVM, the model can independently identify local distortions at each pixel without relying on global distortion patterns.
In the pre-training stage, it predicts the distortion vector map and perceives the local distortion features of each pixel.
In the fine-tuning stage, it predicts a pixel-wise flow map for deviated fisheye image rectification.
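As a concrete illustration of what a distortion vector map measures, here is a hedged closed-form sketch for an ideal equidistant fisheye model (r = f·θ): the DVM at each pixel is taken as the displacement from the fisheye projection of a ray to the corresponding pinhole projection (r = f·tan θ). RoFIR learns its DVM from data; this analytic version, and all names in it, are assumptions for illustration only.

```python
import numpy as np

def distortion_vector_map(h, w, f):
    """Per-pixel distortion vectors for an ideal equidistant fisheye.

    For each pixel, compute the 2D displacement between where a ray lands
    under the equidistant model (r = f * theta) and where the same ray
    would land under a pinhole model (r = f * tan(theta)).
    Returns an (h, w, 2) array of (dx, dy) vectors.
    """
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    cx, cy = (w - 1) / 2, (h - 1) / 2
    dx, dy = xs - cx, ys - cy
    r_fish = np.hypot(dx, dy)                 # radius in the fisheye image

    theta = r_fish / f                        # incidence angle (equidistant)
    theta = np.clip(theta, 0.0, np.pi / 2 - 1e-3)
    r_pin = f * np.tan(theta)                 # pinhole radius for same ray

    # Radial displacement, expressed as a scale on the pixel offset
    scale = np.where(r_fish > 0,
                     (r_pin - r_fish) / np.maximum(r_fish, 1e-9), 0.0)
    return np.stack([dx * scale, dy * scale], axis=-1)
```

The vectors vanish at the optical center and grow with radius, which matches the intuition that fisheye distortion is mildest near the center; a learned DVM additionally handles deviated optical centers, which this idealized version does not.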
arXiv Detail & Related papers (2024-06-27T06:38:56Z) - SDGE: Stereo Guided Depth Estimation for 360$^\circ$ Camera Sets [65.64958606221069]
Multi-camera systems are often used in autonomous driving to achieve a 360$^\circ$ perception.
These 360$^\circ$ camera sets often have limited or low-quality overlap regions, making multi-view stereo methods infeasible for the entire image.
We propose the Stereo Guided Depth Estimation (SGDE) method, which enhances depth estimation of the full image by explicitly utilizing multi-view stereo results on the overlap.
arXiv Detail & Related papers (2024-02-19T02:41:37Z) - FrozenRecon: Pose-free 3D Scene Reconstruction with Frozen Depth Models [67.96827539201071]
We propose a novel test-time optimization approach for 3D scene reconstruction.
Our method achieves state-of-the-art cross-dataset reconstruction on five zero-shot testing datasets.
arXiv Detail & Related papers (2023-08-10T17:55:02Z) - Towards Model Generalization for Monocular 3D Object Detection [57.25828870799331]
We present an effective unified camera-generalized paradigm (CGP) for Mono3D object detection.
We also propose the 2D-3D geometry-consistent object scaling strategy (GCOS) to bridge the gap via an instance-level augment.
Our method called DGMono3D achieves remarkable performance on all evaluated datasets and surpasses the SoTA unsupervised domain adaptation scheme.
arXiv Detail & Related papers (2022-05-23T23:05:07Z) - Image-to-Lidar Self-Supervised Distillation for Autonomous Driving Data [80.14669385741202]
We propose a self-supervised pre-training method for 3D perception models tailored to autonomous driving data.
We leverage the availability of synchronized and calibrated image and Lidar sensors in autonomous driving setups.
Our method does not require any point cloud nor image annotations.
arXiv Detail & Related papers (2022-03-30T12:40:30Z) - Wide-angle Image Rectification: A Survey [86.36118799330802]
Wide-angle images contain distortions that violate the assumptions underlying pinhole camera models.
Image rectification, which aims to correct these distortions, can solve these problems.
We present a detailed description and discussion of the camera models used in different approaches.
Next, we review both traditional geometry-based image rectification methods and deep learning-based methods.
arXiv Detail & Related papers (2020-10-30T17:28:40Z) - Fisheye Distortion Rectification from Deep Straight Lines [34.61402494687801]
We present a novel line-aware rectification network (LaRecNet) to address the problem of fisheye distortion rectification.
Our model achieves state-of-the-art performance in terms of both geometric accuracy and image quality.
In particular, the images rectified by LaRecNet achieve the highest peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) compared with the ground truth.
arXiv Detail & Related papers (2020-03-25T13:20:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.