A Sim2Real Deep Learning Approach for the Transformation of Images from
Multiple Vehicle-Mounted Cameras to a Semantically Segmented Image in Bird's
Eye View
- URL: http://arxiv.org/abs/2005.04078v1
- Date: Fri, 8 May 2020 14:54:13 GMT
- Title: A Sim2Real Deep Learning Approach for the Transformation of Images from
Multiple Vehicle-Mounted Cameras to a Semantically Segmented Image in Bird's
Eye View
- Authors: Lennart Reiher, Bastian Lampe, Lutz Eckstein
- Abstract summary: Distances can be more easily estimated when the camera perspective is transformed to a bird's eye view (BEV).
This paper describes a methodology to obtain a corrected 360° BEV image given images from multiple vehicle-mounted cameras.
The neural network approach does not rely on manually labeled data, but is trained on a synthetic dataset in such a way that it generalizes well to real-world data.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Accurate environment perception is essential for automated driving. When
using monocular cameras, the distance estimation of elements in the environment
poses a major challenge. Distances can be more easily estimated when the camera
perspective is transformed to a bird's eye view (BEV). For flat surfaces,
Inverse Perspective Mapping (IPM) can accurately transform images to a BEV.
Three-dimensional objects such as vehicles and vulnerable road users are
distorted by this transformation, making it difficult to estimate their
position relative to the sensor. This paper describes a methodology to obtain a
corrected 360° BEV image given images from multiple vehicle-mounted
cameras. The corrected BEV image is segmented into semantic classes and
includes a prediction of occluded areas. The neural network approach does not
rely on manually labeled data, but is trained on a synthetic dataset in such a
way that it generalizes well to real-world data. By using semantically
segmented images as input, we reduce the reality gap between simulated and
real-world data and are able to show that our method can be successfully
applied in the real world. Extensive experiments conducted on the synthetic
data demonstrate the superiority of our approach compared to IPM. Source code
and datasets are available at https://github.com/ika-rwth-aachen/Cam2BEV
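The flat-ground IPM baseline that the abstract contrasts against can be sketched as a single homography warp. A minimal sketch, assuming hypothetical point correspondences and file names that are not taken from the paper or the Cam2BEV repository:

```python
# Minimal Inverse Perspective Mapping (IPM) sketch: warp a camera image to a
# bird's eye view under the flat-ground assumption described in the abstract.
# All correspondences and file names are hypothetical placeholders.
import cv2
import numpy as np

# Pixel coordinates of four points on the road surface in the camera image...
src = np.float32([[520, 700], [760, 700], [1180, 1020], [100, 1020]])
# ...and where those same ground points should land in the BEV image,
# e.g. at a scale of 50 px per metre.
dst = np.float32([[400, 0], [600, 0], [600, 500], [400, 500]])

H = cv2.getPerspectiveTransform(src, dst)  # 3x3 homography, valid for the ground plane only
image = cv2.imread("camera_front.png")     # any forward-facing frame
bev = cv2.warpPerspective(image, H, (1000, 500))  # dsize = (width, height)
cv2.imwrite("bev_ipm.png", bev)
```

Anything rising above the ground plane violates the homography's planarity assumption and is smeared outward in the warp, which is exactly the distortion the paper's network is trained to correct.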
Related papers
- Towards Generalizable Multi-Camera 3D Object Detection via Perspective Debiasing [28.874014617259935]
Multi-Camera 3D Object Detection (MC3D-Det) has gained prominence with the advent of bird's-eye view (BEV) approaches.
We propose a novel method that aligns 3D detection with 2D camera plane results, ensuring consistent and accurate detections.
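The alignment idea can be illustrated with basic projective geometry: push a predicted 3D object centre through the camera intrinsics and penalise disagreement with the matched 2D detection. The intrinsics, coordinates, and simple L2 consistency term below are illustrative assumptions, not the authors' formulation:

```python
import numpy as np

# Hedged sketch: project a predicted 3D centre into the image and compare it
# with the matched 2D detection, so 3D predictions stay consistent with what
# the camera actually sees. All values are hypothetical.
K = np.array([[720.0,   0.0, 640.0],
              [  0.0, 720.0, 360.0],
              [  0.0,   0.0,   1.0]])      # camera intrinsics

center_3d = np.array([2.0, 1.0, 25.0])     # predicted 3D centre, camera frame (m)
center_2d = np.array([700.0, 390.0])       # matched 2D detection centre (px)

proj = K @ center_3d
proj = proj[:2] / proj[2]                  # perspective division to pixel coordinates
consistency = np.linalg.norm(proj - center_2d)  # penalty for 3D/2D disagreement
print(proj, consistency)
```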
arXiv Detail & Related papers (2023-10-17T15:31:28Z)
- Multi-camera Bird's Eye View Perception for Autonomous Driving [17.834495597639805]
It is essential to produce perception outputs in 3D to enable the spatial reasoning of other agents and structures.
The most basic approach to achieving the desired BEV representation from a camera image is IPM, assuming a flat ground surface.
More recent approaches use deep neural networks to output directly in BEV space.
arXiv Detail & Related papers (2023-09-16T19:12:05Z)
- Towards Viewpoint Robustness in Bird's Eye View Segmentation [85.99907496019972]
We study how AV perception models are affected by changes in camera viewpoint.
Small changes to pitch, yaw, depth, or height of the camera at inference time lead to large drops in performance.
We introduce a technique for novel view synthesis and use it to transform collected data to the viewpoint of target rigs.
arXiv Detail & Related papers (2023-09-11T02:10:07Z)
- The Change You Want to See (Now in 3D) [65.61789642291636]
The goal of this paper is to detect what has changed, if anything, between two "in the wild" images of the same 3D scene.
We contribute a change detection model that is trained entirely on synthetic data and is class-agnostic.
We release a new evaluation dataset consisting of real-world image pairs with human-annotated differences.
arXiv Detail & Related papers (2023-08-21T01:59:45Z)
- BEV-Locator: An End-to-end Visual Semantic Localization Network Using Multi-View Images [13.258689143949912]
We propose an end-to-end visual semantic localization neural network using multi-view camera images.
The BEV-Locator is capable of estimating vehicle poses under versatile scenarios.
Experiments report satisfactory accuracy, with mean absolute errors of 0.052 m, 0.135 m, and 0.251° in lateral translation, longitudinal translation, and heading angle.
arXiv Detail & Related papers (2022-11-27T20:24:56Z)
- Towards Scale Consistent Monocular Visual Odometry by Learning from the Virtual World [83.36195426897768]
We propose VRVO, a novel framework for retrieving the absolute scale from virtual data.
We first train a scale-aware disparity network using both monocular real images and stereo virtual data.
The resulting scale-consistent disparities are then integrated with a direct VO system.
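The stereo geometry that lets such disparities carry absolute scale is standard: depth = f·B/d for focal length f in pixels and baseline B in metres. A minimal illustration with hypothetical rig values, not the VRVO implementation:

```python
import numpy as np

# A disparity map consistent with a (virtual) stereo rig of known baseline
# encodes metric depth via depth = f * B / d. Rig values are hypothetical.
f_px = 720.0        # focal length in pixels
baseline_m = 0.54   # stereo baseline in metres (KITTI-like)

disparity = np.random.uniform(1.0, 64.0, size=(192, 640))  # stand-in network output
depth_m = f_px * baseline_m / disparity  # absolute depth a direct VO system can consume
print(depth_m.min(), depth_m.max())
```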
arXiv Detail & Related papers (2022-03-11T01:51:54Z)
- Towards Optimal Strategies for Training Self-Driving Perception Models in Simulation [98.51313127382937]
We focus on the use of labels in the synthetic domain alone.
Our approach introduces both a way to learn neural-invariant representations and a theoretically inspired view on how to sample the data from the simulator.
We showcase our approach on the bird's-eye-view vehicle segmentation task with multi-sensor data.
arXiv Detail & Related papers (2021-11-15T18:37:43Z)
- Cycle and Semantic Consistent Adversarial Domain Adaptation for Reducing Simulation-to-Real Domain Shift in LiDAR Bird's Eye View [110.83289076967895]
We present a BEV domain adaptation method based on CycleGAN that uses prior semantic classification in order to preserve the information of small objects of interest during the domain adaptation process.
The quality of the generated BEVs has been evaluated using a state-of-the-art 3D object detection framework on the KITTI 3D Object Detection Benchmark.
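A minimal sketch of the loss structure such a method implies: the usual CycleGAN cycle-consistency term plus a semantic term that keeps a frozen classifier's per-pixel predictions stable across translation, so small objects are not washed out. The function names and KL formulation are assumptions, not the authors' code:

```python
import torch
import torch.nn.functional as F

def cycle_loss(x, x_reconstructed):
    # Standard CycleGAN cycle-consistency: G_BA(G_AB(x)) should recover x.
    return F.l1_loss(x_reconstructed, x)

def semantic_consistency_loss(logits_src, logits_translated):
    # Keep a frozen classifier's per-pixel class posteriors stable across the
    # translation, so classes that cover only a few pixels are preserved.
    p_src = F.softmax(logits_src, dim=1)
    log_p_trans = F.log_softmax(logits_translated, dim=1)
    return F.kl_div(log_p_trans, p_src, reduction="batchmean")

# Stand-in tensors: batch of 2, 5 classes, 64x64 BEV grid.
logits_a = torch.randn(2, 5, 64, 64)
logits_b = torch.randn(2, 5, 64, 64)
bev = torch.rand(2, 3, 64, 64)
print(cycle_loss(bev, bev).item())                          # 0.0 for a perfect reconstruction
print(semantic_consistency_loss(logits_a, logits_b).item())
```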
arXiv Detail & Related papers (2021-04-22T12:47:37Z)
- BEV-Seg: Bird's Eye View Semantic Segmentation Using Geometry and Semantic Point Cloud [21.29622194272066]
We focus on bird's eye view semantic segmentation, a task that predicts pixel-wise semantic segmentation in BEV from side RGB images.
There are two main challenges to this task: the view transformation from side view to bird's eye view, as well as transfer learning to unseen domains.
Our novel two-stage perception pipeline explicitly predicts pixel depths and combines them with pixel semantics in an efficient manner.
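The unprojection such a two-stage pipeline relies on can be sketched directly: back-project each pixel with its predicted depth and rasterise the resulting semantic point cloud into a BEV grid. Intrinsics, grid extents, and cell size below are hypothetical, not the BEV-Seg configuration:

```python
import numpy as np

H, W = 192, 640
fx, cx = 720.0, W / 2                          # hypothetical intrinsics

depth = np.random.uniform(2.0, 50.0, (H, W))   # stage 1: predicted per-pixel depth (m)
semantics = np.random.randint(0, 5, (H, W))    # per-pixel class ids from segmentation

# Back-project every pixel into the camera frame (lateral x, forward z).
u, _ = np.meshgrid(np.arange(W), np.arange(H))
x = (u - cx) / fx * depth
z = depth

# Rasterise into a BEV grid: 0.2 m cells, 50 m forward, +/- 10 m lateral.
cell = 0.2
bev = np.zeros((int(50 / cell), int(20 / cell)), dtype=np.int64)
col = ((x + 10.0) / cell).astype(int)
row = (z / cell).astype(int)
keep = (col >= 0) & (col < bev.shape[1]) & (row >= 0) & (row < bev.shape[0])
bev[row[keep], col[keep]] = semantics[keep]    # last write wins per cell
```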
arXiv Detail & Related papers (2020-06-19T23:30:11Z)
- SimAug: Learning Robust Representations from Simulation for Trajectory Prediction [78.91518036949918]
We propose a novel approach to learn robust representation through augmenting the simulation training data.
We show that SimAug achieves promising results on three real-world benchmarks using zero real training data.
arXiv Detail & Related papers (2020-04-04T21:22:01Z)
- Semantic sensor fusion: from camera to sparse lidar information [7.489722641968593]
This paper presents an approach to fusing different sensory information: Light Detection and Ranging (lidar) scans and camera images.
The transference of semantic information between the labelled image and the lidar point cloud is performed in four steps.
arXiv Detail & Related papers (2020-03-04T03:09:33Z)
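The core geometric step behind such camera-to-lidar label transfer can be sketched as a single projection: map each lidar point into the labelled image and copy the class of the pixel it lands on. The calibration values below are placeholders, and the paper's four-step procedure involves more than this one projection:

```python
import numpy as np

K = np.array([[720.0,   0.0, 320.0],
              [  0.0, 720.0,  96.0],
              [  0.0,   0.0,   1.0]])              # hypothetical camera intrinsics
T_cam_lidar = np.eye(4)                            # lidar -> camera extrinsics (placeholder)

points = np.random.uniform(-20, 20, (1000, 3))     # lidar points in metres
labels_img = np.random.randint(0, 5, (192, 640))   # semantically segmented camera image

# Transform points into the camera frame and keep those in front of the camera.
pts_h = np.hstack([points, np.ones((len(points), 1))])
cam = (T_cam_lidar @ pts_h.T).T[:, :3]
front = cam[:, 2] > 0.5
uv = (K @ cam[front].T).T
uv = (uv[:, :2] / uv[:, 2:3]).astype(int)          # perspective division to pixels

# Keep projections inside the image and read off their semantic labels.
h, w = labels_img.shape
inside = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
point_labels = labels_img[uv[inside, 1], uv[inside, 0]]
```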