FishBEV: Distortion-Resilient Bird's Eye View Segmentation with Surround-View Fisheye Cameras
- URL: http://arxiv.org/abs/2509.13681v1
- Date: Wed, 17 Sep 2025 04:26:36 GMT
- Title: FishBEV: Distortion-Resilient Bird's Eye View Segmentation with Surround-View Fisheye Cameras
- Authors: Hang Li, Dianmo Sheng, Qiankun Dong, Zichun Wang, Zhiwei Xu, Tao Li,
- Abstract summary: We propose FishBEV, a novel BEV segmentation framework specifically tailored for fisheye cameras. The framework introduces three complementary innovations, including a Distortion-Resilient Multi-scale Extraction (DRME) backbone that learns robust features under distortion while preserving scale consistency. Experiments on the Synwoodscapes dataset demonstrate that FishBEV consistently outperforms SOTA baselines on surround-view fisheye BEV segmentation tasks.
- Score: 12.001699443894504
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: As a cornerstone technique for autonomous driving, Bird's Eye View (BEV) segmentation has recently achieved remarkable progress with pinhole cameras. However, it is non-trivial to extend existing methods to fisheye cameras, whose severe geometric distortion, ambiguous multi-view correspondences, and unstable temporal dynamics all significantly degrade BEV performance. To address these challenges, we propose FishBEV, a novel BEV segmentation framework specifically tailored for fisheye cameras. The framework introduces three complementary innovations: a Distortion-Resilient Multi-scale Extraction (DRME) backbone that learns robust features under distortion while preserving scale consistency; an Uncertainty-aware Spatial Cross-Attention (U-SCA) mechanism that leverages uncertainty estimation for reliable cross-view alignment; and a Distance-aware Temporal Self-Attention (D-TSA) module that adaptively balances near-field details and far-field context to ensure temporal coherence. Extensive experiments on the Synwoodscapes dataset demonstrate that FishBEV consistently outperforms SOTA baselines on surround-view fisheye BEV segmentation tasks.
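The uncertainty-weighted cross-view alignment idea behind U-SCA can be illustrated with a minimal sketch. The function name, array shapes, and the log-variance penalty on the attention logits below are assumptions for illustration, not the paper's actual implementation:

```python
import numpy as np

def uncertainty_weighted_attention(queries, keys, values, log_var):
    """Toy uncertainty-aware cross-attention (hypothetical sketch).

    queries: (Q, d) BEV queries; keys/values: (K, d) image features;
    log_var: (K,) predicted log-variance per image feature -- a higher
    value marks that feature as uncertain and suppresses its weight.
    """
    d = queries.shape[-1]
    logits = queries @ keys.T / np.sqrt(d)         # (Q, K) scaled similarity
    logits = logits - log_var[None, :]             # penalize uncertain features
    weights = np.exp(logits - logits.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)  # softmax over image features
    return weights @ values                        # (Q, d) fused BEV features
```

With equal uncertainties this reduces to ordinary scaled dot-product attention; as one feature's log-variance grows, its contribution smoothly vanishes.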
Related papers
- DAGE: Dual-Stream Architecture for Efficient and Fine-Grained Geometry Estimation [72.89376712495464]
DAGE is a dual-stream transformer that disentangles global coherence from fine detail. A low-resolution stream operates on aggressively downsampled frames with alternating frame/global attention to build a view-consistent representation. A high-resolution stream processes the original images per-frame to preserve sharp boundaries and small structures. This design scales resolution and clip length independently, supports inputs up to 2K, and maintains practical inference cost.
arXiv Detail & Related papers (2026-03-04T05:29:29Z) - FisheyeGaussianLift: BEV Feature Lifting for Surround-View Fisheye Camera Perception [0.8374040635931297]
We present a distortion-aware BEV segmentation framework that processes multi-camera high-resolution fisheye images. Each image pixel is lifted into 3D space via Gaussian parameterization, predicting spatial means and anisotropic covariances to explicitly model geometric uncertainty. Experiments demonstrate strong segmentation performance on complex parking and urban driving scenarios, achieving IoU scores of 87.75% for drivable regions and 57.26% for vehicles under severe fisheye distortion.
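The Gaussian parameterization described above can be sketched as lifting each pixel ray to a 3D mean and an anisotropic covariance, elongated along the ray where depth is uncertain. The function and its parameters are hypothetical, assuming depth uncertainty along the viewing ray and a smaller lateral uncertainty:

```python
import numpy as np

def lift_pixel_to_gaussian(ray_dir, depth_mu, depth_sigma, lateral_sigma):
    """Hypothetical sketch of Gaussian pixel lifting.

    A pixel's viewing ray with a predicted depth mean/std is lifted to a
    3D Gaussian: the mean sits on the ray at depth_mu; the covariance is
    depth_sigma^2 along the ray and lateral_sigma^2 orthogonal to it.
    """
    d = ray_dir / np.linalg.norm(ray_dir)           # unit ray direction
    mean = depth_mu * d                             # 3D point at predicted depth
    along = depth_sigma**2 * np.outer(d, d)         # variance along the ray
    ortho = lateral_sigma**2 * (np.eye(3) - np.outer(d, d))  # lateral variance
    return mean, along + ortho
```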
arXiv Detail & Related papers (2025-11-21T12:42:07Z) - RESAR-BEV: An Explainable Progressive Residual Autoregressive Approach for Camera-Radar Fusion in BEV Segmentation [4.043972974168962]
Bird's-Eye-View (BEV) semantic segmentation provides comprehensive environmental perception for autonomous driving. We propose RESAR-BEV, a progressive refinement framework that advances beyond single-step end-to-end approaches. Experiments on nuScenes demonstrate RESAR-BEV's state-of-the-art performance, with 54.0% mIoU across 7 essential driving-scene categories.
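The progressive residual idea can be sketched generically: each stage predicts a correction added to the running estimate rather than regressing the full map in one step. The function and residual predictors below are illustrative assumptions, not RESAR-BEV's architecture:

```python
import numpy as np

def progressive_residual_refine(coarse, residual_fns):
    """Toy progressive residual refinement (illustrative only).

    coarse: initial estimate (e.g. a coarse BEV map);
    residual_fns: per-stage predictors, each mapping the current
    estimate to an additive correction.
    Returns the final estimate and all intermediate stages.
    """
    est = coarse
    stages = [est]
    for fn in residual_fns:
        est = est + fn(est)   # each stage only corrects the previous one
        stages.append(est)
    return est, stages
```

Keeping the intermediate stages is what makes such a pipeline inspectable: each residual shows what the stage "disagreed" with in the previous estimate.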
arXiv Detail & Related papers (2025-05-10T05:10:07Z) - LSSInst: Improving Geometric Modeling in LSS-Based BEV Perception with Instance Representation [10.434754671492723]
We propose LSSInst, a two-stage object detector incorporating BEV and instance representations in tandem.
The proposed detector exploits fine-grained pixel-level features that can be flexibly integrated into existing LSS-based BEV networks.
Our proposed framework is of excellent generalization ability and performance, which boosts the performances of modern LSS-based BEV perception methods without bells and whistles.
arXiv Detail & Related papers (2024-11-09T13:03:54Z) - Benchmarking and Improving Bird's Eye View Perception Robustness in Autonomous Driving [55.93813178692077]
We present RoboBEV, an extensive benchmark suite designed to evaluate the resilience of BEV algorithms. We assess 33 state-of-the-art BEV-based perception models spanning tasks like detection, map segmentation, depth estimation, and occupancy prediction. Our experimental results also underline the efficacy of strategies like pre-training and depth-free BEV transformations in enhancing robustness against out-of-distribution data.
arXiv Detail & Related papers (2024-05-27T17:59:39Z) - DaF-BEVSeg: Distortion-aware Fisheye Camera based Bird's Eye View Segmentation with Occlusion Reasoning [7.012508171229966]
There is limited work on BEV segmentation for surround-view fisheye cameras, commonly used in commercial vehicles.
We create a synthetic dataset using the Cognata simulator comprising diverse road types, weather, and lighting conditions.
We generalize the BEV segmentation to work with any camera model; this is useful for mixing diverse cameras.
arXiv Detail & Related papers (2024-04-09T14:43:19Z) - CLIP-BEVFormer: Enhancing Multi-View Image-Based BEV Detector with Ground Truth Flow [20.550935390111686]
We introduce CLIP-BEVFormer, a novel approach to enhance the multi-view image-derived BEV backbones with ground truth information flow.
We conduct extensive experiments on the challenging nuScenes dataset and showcase significant and consistent improvements over the SOTA.
arXiv Detail & Related papers (2024-03-13T19:21:03Z) - DA-BEV: Unsupervised Domain Adaptation for Bird's Eye View Perception [104.87876441265593]
Camera-only Bird's Eye View (BEV) has demonstrated great potential in environment perception in a 3D space.
Unsupervised domain adaptive BEV, which learns effectively from varied unlabelled target data, remains far under-explored.
We design DA-BEV, the first domain adaptive camera-only BEV framework that addresses domain adaptive BEV challenges by exploiting the complementary nature of image-view features and BEV features.
arXiv Detail & Related papers (2024-01-13T04:21:24Z) - LEAP-VO: Long-term Effective Any Point Tracking for Visual Odometry [53.5449912019877]
We present the Long-term Effective Any Point Tracking (LEAP) module. LEAP innovatively combines visual, inter-track, and temporal cues with mindfully selected anchors for dynamic track estimation. Based on these traits, we develop LEAP-VO, a robust visual odometry system adept at handling occlusions and dynamic scenes.
arXiv Detail & Related papers (2024-01-03T18:57:27Z) - Instance-aware Multi-Camera 3D Object Detection with Structural Priors
Mining and Self-Boosting Learning [93.71280187657831]
Camera-based bird-eye-view (BEV) perception paradigm has made significant progress in the autonomous driving field.
We propose IA-BEV, which integrates image-plane instance awareness into the depth estimation process within a BEV-based detector.
arXiv Detail & Related papers (2023-12-13T09:24:42Z) - BEVNeXt: Reviving Dense BEV Frameworks for 3D Object Detection [47.7933708173225]
Recently, the rise of query-based Transformer decoders is reshaping camera-based 3D object detection.
This paper introduces a "modernized" dense BEV framework dubbed BEVNeXt.
On the nuScenes benchmark, BEVNeXt outperforms both BEV-based and query-based frameworks.
arXiv Detail & Related papers (2023-12-04T07:35:02Z) - Self-Supervised Multi-Frame Monocular Scene Flow [61.588808225321735]
We introduce a multi-frame monocular scene flow network based on self-supervised learning.
We observe state-of-the-art accuracy among monocular scene flow methods based on self-supervised learning.
arXiv Detail & Related papers (2021-05-05T17:49:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.