Camera Perspective Transformation to Bird's Eye View via Spatial Transformer Model for Road Intersection Monitoring
- URL: http://arxiv.org/abs/2408.05577v2
- Date: Wed, 14 Aug 2024 02:20:50 GMT
- Title: Camera Perspective Transformation to Bird's Eye View via Spatial Transformer Model for Road Intersection Monitoring
- Authors: Rukesh Prajapati, Amr S. El-Wakeel
- Abstract summary: Road intersection monitoring and control research often utilizes bird's eye view (BEV) simulators.
In real traffic settings, achieving a BEV akin to that in a simulator requires the deployment of drones or specific sensor mounting.
We introduce a novel deep-learning model that converts a single camera's perspective of a road intersection into a BEV.
- Score: 0.09208007322096533
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Road intersection monitoring and control research often utilizes bird's eye view (BEV) simulators. In real traffic settings, achieving a BEV akin to that in a simulator necessitates the deployment of drones or specific sensor mounting, which is neither feasible nor practical. Given these constraints, traffic intersection management research remains largely confined to simulation environments. In this paper, we address the gap between simulated environments and real-world implementation by introducing a novel deep-learning model that converts a single camera's perspective of a road intersection into a BEV. We created a simulation environment that closely resembles a real-world traffic junction. The proposed model transforms vehicles from the camera perspective into BEV images, facilitating processing by road intersection monitoring and control models. Inspired by image transformation techniques, we propose a Spatial-Transformer Double Decoder-UNet (SDD-UNet) model that aims to eliminate the distortions introduced by the transformation. In addition, the model accurately estimates vehicle positions, enabling the direct application of simulation-trained models in real-world contexts. The SDD-UNet model achieves an average Dice similarity coefficient (DSC) above 95%, which is 40% better than the original UNet model. The mean absolute error (MAE) is 0.102, and the centroid of the predicted mask is displaced by only 0.14 meters on average, indicating high accuracy.
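The "Spatial-Transformer" in SDD-UNet presumably follows the standard spatial transformer design of Jaderberg et al. (2015): a small localization network regresses warp parameters, and a differentiable sampler applies them to the feature map. Below is a minimal, generic sketch of such a block in PyTorch; the module name and layer sizes are illustrative assumptions, not the authors' SDD-UNet implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialTransformer(nn.Module):
    """Generic spatial transformer block (Jaderberg et al., 2015).

    Layer sizes are illustrative; this is not the authors' SDD-UNet code.
    """

    def __init__(self, in_channels: int):
        super().__init__()
        # Localization network: regresses the 6 parameters of a 2x3 affine warp.
        self.localization = nn.Sequential(
            nn.Conv2d(in_channels, 8, kernel_size=7), nn.MaxPool2d(2), nn.ReLU(inplace=True),
            nn.Conv2d(8, 10, kernel_size=5), nn.MaxPool2d(2), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(10, 6),
        )
        # Start from the identity transform so early training applies no warp.
        self.localization[-1].weight.data.zero_()
        self.localization[-1].bias.data.copy_(
            torch.tensor([1.0, 0.0, 0.0, 0.0, 1.0, 0.0]))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        theta = self.localization(x).view(-1, 2, 3)   # per-sample affine matrix
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        return F.grid_sample(x, grid, align_corners=False)

# Example: warp a batch of 16-channel feature maps.
features = torch.randn(4, 16, 64, 64)
warped = SpatialTransformer(16)(features)             # shape: (4, 16, 64, 64)
```

Because `grid_sample` is differentiable with respect to both the input and the sampling grid, the warp parameters can be trained end-to-end with whatever loss drives the surrounding network.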
Related papers
- Data-Driven Traffic Simulation for an Intersection in a Metropolis [7.264786765085108]
We present a novel data-driven simulation environment for modeling traffic in street intersections.
We train trajectory forecasting models to learn agent interactions and environmental constraints.
The simulation can run either autonomously, or under explicit human control conditioned on the generative distributions.
arXiv Detail & Related papers (2024-08-01T22:25:06Z)
- XLD: A Cross-Lane Dataset for Benchmarking Novel Driving View Synthesis [84.23233209017192]
This paper presents a novel driving view synthesis dataset and benchmark specifically designed for autonomous driving simulations.
The dataset is unique as it includes testing images captured by deviating from the training trajectory by 1-4 meters.
We establish the first realistic benchmark for evaluating existing NVS approaches under front-only and multi-camera settings.
arXiv Detail & Related papers (2024-06-26T14:00:21Z)
- On Transferability of Driver Observation Models from Simulated to Real Environments in Autonomous Cars [23.514129229090987]
This paper investigates the viability of transferring video-based driver observation models from simulation to real-world scenarios in autonomous vehicles.
We record a dataset featuring actual autonomous driving conditions and involving seven participants engaged in highly distracting secondary activities.
Our dataset was designed in accordance with an existing large-scale simulator dataset used as the training source.
arXiv Detail & Related papers (2023-07-31T10:18:49Z)
- Street-View Image Generation from a Bird's-Eye View Layout [95.36869800896335]
Bird's-Eye View (BEV) Perception has received increasing attention in recent years.
Data-driven simulation for autonomous driving has been a focal point of recent research.
We propose BEVGen, a conditional generative model that synthesizes realistic and spatially consistent surrounding images.
arXiv Detail & Related papers (2023-01-11T18:39:34Z)
- Monocular BEV Perception of Road Scenes via Front-to-Top View Projection [57.19891435386843]
We present a novel framework that reconstructs a local map formed by road layout and vehicle occupancy in the bird's-eye view.
Our model runs at 25 FPS on a single GPU, which is efficient and applicable for real-time panorama HD map reconstruction.
arXiv Detail & Related papers (2022-11-15T13:52:41Z)
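For context on entries like the one above: the classical, non-learned way to do front-to-top projection is a fixed homography (inverse perspective mapping), which maps four ground-plane points in the camera image to their known top-down positions. A minimal OpenCV sketch; the pixel coordinates and file names are made-up placeholders for a calibrated setup.

```python
import cv2
import numpy as np

# Four ground-plane points in the camera image (pixel coords; illustrative
# values that would normally come from calibration) ...
src = np.float32([[420, 580], [860, 580], [1180, 940], [120, 940]])
# ... and where those same points should land in the top-down (BEV) image.
dst = np.float32([[300, 100], [500, 100], [500, 700], [300, 700]])

H = cv2.getPerspectiveTransform(src, dst)     # 3x3 homography

frame = cv2.imread("intersection.jpg")        # any camera frame of the scene
bev = cv2.warpPerspective(frame, H, (800, 800))
cv2.imwrite("intersection_bev.jpg", bev)
```

A fixed homography is exact only for points on the ground plane; anything with height (vehicle bodies, poles) gets smeared, which is precisely the kind of distortion that learned models such as SDD-UNet aim to remove.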
- Real-to-Sim: Predicting Residual Errors of Robotic Systems with Sparse Data using a Learning-based Unscented Kalman Filter [65.93205328894608]
We learn the residual errors between a dynamics and/or simulator model and the real robot.
We show that with the learned residual errors, we can further close the reality gap between dynamic models, simulations, and actual hardware.
arXiv Detail & Related papers (2022-09-07T15:15:12Z)
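The residual-learning idea above is simple to state: drive the simulator and the real system through the same state-action pairs and fit a model to the difference. The sketch below uses plain linear least squares in NumPy as a stand-in for the paper's learned Unscented Kalman Filter; all names and shapes are illustrative.

```python
import numpy as np

def fit_residual_model(states, actions, sim_next, real_next):
    """Fit real_next ~ sim_next + [state, action] @ W by least squares.

    A linear stand-in for the paper's learned residual; illustrative only.
    """
    X = np.hstack([states, actions])              # (N, ds + da) features
    R = real_next - sim_next                      # (N, ds) residual targets
    W, *_ = np.linalg.lstsq(X, R, rcond=None)     # (ds + da, ds) weights
    return W

def corrected_step(state, action, sim_step, W):
    """Simulator prediction plus the learned residual correction."""
    x = np.concatenate([state, action])
    return sim_step(state, action) + x @ W

# Toy usage with random logs (N transitions, 4-D state, 2-D action):
N = 200
states, actions = np.random.randn(N, 4), np.random.randn(N, 2)
sim_next, real_next = np.random.randn(N, 4), np.random.randn(N, 4)
W = fit_residual_model(states, actions, sim_next, real_next)
pred = corrected_step(states[0], actions[0], lambda s, a: s, W)
```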
- ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for Image Recognition and Beyond [76.35955924137986]
We propose a Vision Transformer Advanced by Exploring intrinsic inductive bias (IB) from convolutions, i.e., ViTAE.
ViTAE has several spatial pyramid reduction modules to downsample and embed the input image into tokens with rich multi-scale context.
We obtain state-of-the-art classification performance, i.e., 88.5% Top-1 accuracy on the ImageNet validation set and the best 91.2% Top-1 accuracy on the ImageNet real validation set.
arXiv Detail & Related papers (2022-02-21T10:40:05Z)
- Imagining The Road Ahead: Multi-Agent Trajectory Prediction via Differentiable Simulation [17.953880589741438]
We develop a deep generative model built on a fully differentiable simulator for trajectory prediction.
We achieve state-of-the-art results on the INTERACTION dataset, using standard neural architectures and a standard variational training objective.
We name our model ITRA, for "Imagining the Road Ahead".
arXiv Detail & Related papers (2021-04-22T17:48:08Z)
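A "fully differentiable simulator" means the rollout of predicted actions through the vehicle dynamics keeps the autograd graph intact, so trajectory losses backpropagate through the physics. A minimal PyTorch sketch using a kinematic bicycle step, a common choice in this literature; the specific constants and shapes are illustrative, not ITRA's code.

```python
import torch

def bicycle_step(state, action, dt=0.1, lr=1.4):
    """One differentiable kinematic-bicycle update.

    state  = (x, y, heading, speed); action = (acceleration, steering angle).
    lr is the distance from the rear axle to the center of gravity (meters).
    """
    x, y, psi, v = state.unbind(-1)
    accel, steer = action.unbind(-1)
    beta = torch.atan(0.5 * torch.tan(steer))        # slip angle at the CoG
    x = x + v * torch.cos(psi + beta) * dt
    y = y + v * torch.sin(psi + beta) * dt
    psi = psi + (v / lr) * torch.sin(beta) * dt
    v = v + accel * dt
    return torch.stack([x, y, psi, v], dim=-1)

# Gradients flow from a trajectory loss back into the action sequence.
state = torch.tensor([0.0, 0.0, 0.0, 5.0])
actions = torch.zeros(20, 2, requires_grad=True)
for a in actions:                                    # 20-step rollout
    state = bicycle_step(state, a)
loss = state[:2].pow(2).sum()                        # distance to the origin
loss.backward()                                      # d(loss)/d(actions) exists
```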
- Cycle and Semantic Consistent Adversarial Domain Adaptation for Reducing Simulation-to-Real Domain Shift in LiDAR Bird's Eye View [110.83289076967895]
We present a BEV domain adaptation method based on CycleGAN that uses prior semantic classification in order to preserve the information of small objects of interest during the domain adaptation process.
The quality of the generated BEVs has been evaluated using a state-of-the-art 3D object detection framework on the KITTI 3D Object Detection Benchmark.
arXiv Detail & Related papers (2021-04-22T12:47:37Z)
- A Sim2Real Deep Learning Approach for the Transformation of Images from Multiple Vehicle-Mounted Cameras to a Semantically Segmented Image in Bird's Eye View [0.0]
Distances can be more easily estimated when the camera perspective is transformed to a bird's eye view (BEV).
This paper describes a methodology to obtain a corrected 360° BEV image given images from multiple vehicle-mounted cameras.
The neural network approach does not rely on manually labeled data, but is trained on a synthetic dataset in such a way that it generalizes well to real-world data.
arXiv Detail & Related papers (2020-05-08T14:54:13Z)
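Segmentation-style BEV outputs like those above are typically scored with overlap and localization metrics; the DSC and centroid-displacement figures reported in the abstract at the top are instances of both. A minimal NumPy sketch; the meters-per-pixel scale is an assumed calibration constant, not a value from the paper.

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-7):
    """Dice similarity coefficient (DSC) between two binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

def centroid_displacement(pred, target, meters_per_pixel=0.05):
    """Euclidean distance between mask centroids, converted to meters.

    meters_per_pixel is an assumed calibration constant, not from the paper.
    """
    c_pred = np.argwhere(pred).mean(axis=0)      # (row, col) centroid
    c_true = np.argwhere(target).mean(axis=0)
    return np.linalg.norm(c_pred - c_true) * meters_per_pixel

# Toy usage with two overlapping square masks:
pred = np.zeros((256, 256), dtype=np.uint8); pred[100:140, 100:140] = 1
true = np.zeros((256, 256), dtype=np.uint8); true[102:142, 101:141] = 1
print(dice_coefficient(pred, true), centroid_displacement(pred, true))
```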