DaF-BEVSeg: Distortion-aware Fisheye Camera based Bird's Eye View Segmentation with Occlusion Reasoning
- URL: http://arxiv.org/abs/2404.06352v1
- Date: Tue, 9 Apr 2024 14:43:19 GMT
- Title: DaF-BEVSeg: Distortion-aware Fisheye Camera based Bird's Eye View Segmentation with Occlusion Reasoning
- Authors: Senthil Yogamani, David Unger, Venkatraman Narayanan, Varun Ravi Kumar,
- Abstract summary: There is limited work on BEV segmentation for surround-view fisheye cameras, commonly used in commercial vehicles.
We create a synthetic dataset using the Cognata simulator comprising diverse road types, weather, and lighting conditions.
We generalize the BEV segmentation to work with any camera model; this is useful for mixing diverse cameras.
- Score: 7.012508171229966
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Semantic segmentation is an effective way to perform scene understanding. Recently, segmentation in 3D Bird's Eye View (BEV) space has become popular as its directly used by drive policy. However, there is limited work on BEV segmentation for surround-view fisheye cameras, commonly used in commercial vehicles. As this task has no real-world public dataset and existing synthetic datasets do not handle amodal regions due to occlusion, we create a synthetic dataset using the Cognata simulator comprising diverse road types, weather, and lighting conditions. We generalize the BEV segmentation to work with any camera model; this is useful for mixing diverse cameras. We implement a baseline by applying cylindrical rectification on the fisheye images and using a standard LSS-based BEV segmentation model. We demonstrate that we can achieve better performance without undistortion, which has the adverse effects of increased runtime due to pre-processing, reduced field-of-view, and resampling artifacts. Further, we introduce a distortion-aware learnable BEV pooling strategy that is more effective for the fisheye cameras. We extend the model with an occlusion reasoning module, which is critical for estimating in BEV space. Qualitative performance of DaF-BEVSeg is showcased in the video at https://streamable.com/ge4v51.
Related papers
- View-Consistent Hierarchical 3D Segmentation Using Ultrametric Feature Fields [52.08335264414515]
We learn a novel feature field within a Neural Radiance Field (NeRF) representing a 3D scene.
Our method takes view-inconsistent multi-granularity 2D segmentations as input and produces a hierarchy of 3D-consistent segmentations as output.
We evaluate our method and several baselines on synthetic datasets with multi-view images and multi-granular segmentation, showcasing improved accuracy and viewpoint-consistency.
arXiv Detail & Related papers (2024-05-30T04:14:58Z) - Dual-Camera Smooth Zoom on Mobile Phones [55.4114152554769]
We introduce a new task, ie, dual-camera smooth zoom (DCSZ) to achieve a smooth zoom preview.
The frame models (FI) technique is a potential solution but struggles with ground-truth collection.
We suggest a data factory solution where continuous virtual cameras are assembled to generate DCSZ data by rendering reconstructed 3D models of the scene.
arXiv Detail & Related papers (2024-04-07T10:28:01Z) - Towards Viewpoint Robustness in Bird's Eye View Segmentation [85.99907496019972]
We study how AV perception models are affected by changes in camera viewpoint.
Small changes to pitch, yaw, depth, or height of the camera at inference time lead to large drops in performance.
We introduce a technique for novel view synthesis and use it to transform collected data to the viewpoint of target rigs.
arXiv Detail & Related papers (2023-09-11T02:10:07Z) - Semi-Supervised Learning for Visual Bird's Eye View Semantic
Segmentation [16.3996408206659]
We present a novel semi-supervised framework for visual BEV semantic segmentation to boost performance by exploiting unlabeled images during the training.
A consistency loss that makes full use of unlabeled data is then proposed to constrain the model on not only semantic prediction but also the BEV feature.
Experiments on the nuScenes and Argoverse datasets show that our framework can effectively improve prediction accuracy.
arXiv Detail & Related papers (2023-08-28T12:23:36Z) - BEV-IO: Enhancing Bird's-Eye-View 3D Detection with Instance Occupancy [58.92659367605442]
We present BEV-IO, a new 3D detection paradigm to enhance BEV representation with instance occupancy information.
We show that BEV-IO can outperform state-of-the-art methods while only adding a negligible increase in parameters and computational overhead.
arXiv Detail & Related papers (2023-05-26T11:16:12Z) - F2BEV: Bird's Eye View Generation from Surround-View Fisheye Camera
Images for Automated Driving [3.286961611175469]
We introduce a baseline, F2BEV, to generate BEV height maps and BEV semantic segmentation maps from fisheye images.
F2BEV consists of a distortion-aware spatial cross attention module for querying and consolidating spatial information.
We evaluate single-task and multi-task variants of F2BEV on our synthetic FB-SSEM dataset.
arXiv Detail & Related papers (2023-03-07T04:58:57Z) - Monocular BEV Perception of Road Scenes via Front-to-Top View Projection [57.19891435386843]
We present a novel framework that reconstructs a local map formed by road layout and vehicle occupancy in the bird's-eye view.
Our model runs at 25 FPS on a single GPU, which is efficient and applicable for real-time panorama HD map reconstruction.
arXiv Detail & Related papers (2022-11-15T13:52:41Z) - M^2BEV: Multi-Camera Joint 3D Detection and Segmentation with Unified
Birds-Eye View Representation [145.6041893646006]
M$2$BEV is a unified framework that jointly performs 3D object detection and map segmentation.
M$2$BEV infers both tasks with a unified model and improves efficiency.
arXiv Detail & Related papers (2022-04-11T13:43:25Z) - BEVSegFormer: Bird's Eye View Semantic Segmentation From Arbitrary
Camera Rigs [3.5728676902207988]
We present an effective transformer-based method for BEV semantic segmentation from arbitrary camera rigs.
Specifically, our method first encodes image features from arbitrary cameras with a shared backbone.
An efficient multi-camera deformable attention unit is designed to carry out the BEV-to-image view transformation.
arXiv Detail & Related papers (2022-03-08T12:39:51Z) - BEV-MODNet: Monocular Camera based Bird's Eye View Moving Object
Detection for Autonomous Driving [2.9769485817170387]
CNNs can leverage the global context in the scene to project better.
We create an extended KITTI-raw dataset consisting of 12.9k images with annotations of moving object masks in BEV space for five classes.
We observe a significant improvement of 13% in mIoU using the simple baseline implementation.
arXiv Detail & Related papers (2021-07-11T01:11:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.