YOLO-BEV: Generating Bird's-Eye View in the Same Way as 2D Object Detection
- URL: http://arxiv.org/abs/2310.17379v1
- Date: Thu, 26 Oct 2023 13:16:27 GMT
- Title: YOLO-BEV: Generating Bird's-Eye View in the Same Way as 2D Object Detection
- Authors: Chang Liu, Liguo Zhou, Yanliang Huang, Alois Knoll
- Abstract summary: YOLO-BEV is an efficient framework that harnesses a unique surround-camera setup to generate a 2D bird's-eye view of the vehicular environment.
Preliminary results validate the feasibility of YOLO-BEV in real-time vehicular perception tasks.
- Score: 8.082514573754954
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Vehicle perception systems strive to achieve comprehensive and rapid visual
interpretation of their surroundings for improved safety and navigation. We
introduce YOLO-BEV, an efficient framework that harnesses a unique
surround-camera setup to generate a 2D bird's-eye view of the vehicular
environment. By strategically positioning eight cameras at 45-degree
intervals, our system captures and integrates imagery into a coherent 3x3
grid format with the center left blank, providing an enriched spatial
representation that facilitates
efficient processing. In our approach, we employ YOLO's detection mechanism,
favoring its inherent advantages of swift response and compact model structure.
Instead of the conventional YOLO detection head, we attach a custom-designed
detection head that translates the panoramically captured data into a
unified bird's-eye-view map centered on the ego car. Preliminary results validate the
feasibility of YOLO-BEV in real-time vehicular perception tasks. With its
streamlined architecture and potential for rapid deployment due to minimized
parameters, YOLO-BEV stands as a promising tool that may reshape future
perspectives in autonomous driving systems.
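As a rough illustration of the camera-grid input described above, the sketch below composes eight surround images into a 3x3 grid with a blank center cell. The array shapes, placement order, and function names are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the 3x3 camera-grid composition described in the
# abstract; shapes and slot ordering are assumptions.
import numpy as np

def compose_grid(cams: list) -> np.ndarray:
    """Tile eight camera images (each H x W x 3, captured at 45-degree
    intervals) into a 3x3 grid whose center cell is left blank."""
    assert len(cams) == 8
    h, w, c = cams[0].shape
    grid = np.zeros((3 * h, 3 * w, c), dtype=cams[0].dtype)
    # Clockwise placement starting from the front camera (top-center);
    # the actual ordering in the paper may differ.
    slots = [(0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0), (0, 0)]
    for img, (r, col) in zip(cams, slots):
        grid[r * h:(r + 1) * h, col * w:(col + 1) * w] = img
    return grid  # center cell (1, 1) stays zero: the "blank" ego slot

# Usage: feed `grid` to a YOLO-style backbone whose custom head regresses
# BEV boxes around the ego vehicle instead of image-plane boxes.
cams = [np.random.randint(0, 255, (224, 224, 3), dtype=np.uint8) for _ in range(8)]
bev_input = compose_grid(cams)  # (672, 672, 3)
```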
Related papers
- YO-CSA-T: A Real-time Badminton Tracking System Utilizing YOLO Based on Contextual and Spatial Attention [0.0]
YO-CSA is a real-time system for detecting the 3D trajectory of a shuttlecock.
We map the 2D coordinate sequence extracted by YO-CSA into 3D space using stereo vision.
Our system achieves a high accuracy of 90.43% mAP@0.75, surpassing both YOLOv8s and YOLO11s.
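For context, here is a minimal sketch of standard rectified-stereo triangulation, the kind of stereo-vision lifting such a system could use to map 2D detections into 3D. The intrinsics, baseline, and names below are illustrative assumptions, not values from the paper.

```python
# Rectified-stereo triangulation sketch: disparity d = u_l - u_r,
# depth Z = fx * baseline / d. All numbers are made-up examples.
import numpy as np

def triangulate(u_l, v_l, u_r, fx, fy, cx, cy, baseline):
    """Recover camera-frame (X, Y, Z) from a matched point in
    rectified left/right images."""
    d = u_l - u_r                   # disparity in pixels
    Z = fx * baseline / d           # depth in meters
    X = (u_l - cx) * Z / fx
    Y = (v_l - cy) * Z / fy
    return np.array([X, Y, Z])

# Example: shuttlecock detected at (640, 300) left, (610, 300) right.
p3d = triangulate(640, 300, 610, fx=1000.0, fy=1000.0,
                  cx=640.0, cy=360.0, baseline=0.12)  # ~4 m away
```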
arXiv Detail & Related papers (2025-01-11T08:00:25Z)
- YOLO-Vehicle-Pro: A Cloud-Edge Collaborative Framework for Object Detection in Autonomous Driving under Adverse Weather Conditions [8.820126303110545]
This paper proposes two innovative deep learning models: YOLO-Vehicle and YOLO-Vehicle-Pro.
YOLO-Vehicle is an object detection model tailored specifically for autonomous driving scenarios.
YOLO-Vehicle-Pro builds upon this foundation by introducing an improved image dehazing algorithm.
arXiv Detail & Related papers (2024-10-23T10:07:13Z)
- OE-BevSeg: An Object Informed and Environment Aware Multimodal Framework for Bird's-eye-view Vehicle Semantic Segmentation [57.2213693781672]
Bird's-eye-view (BEV) semantic segmentation is becoming crucial in autonomous driving systems.
We propose OE-BevSeg, an end-to-end multimodal framework that enhances BEV segmentation performance.
Our approach outperforms prior methods by a large margin, achieving state-of-the-art vehicle-segmentation results on the nuScenes dataset.
arXiv Detail & Related papers (2024-07-18T03:48:22Z)
- RoadBEV: Road Surface Reconstruction in Bird's Eye View [55.0558717607946]
Road surface conditions, especially geometry profiles, enormously affect the driving performance of autonomous vehicles. Vision-based online road reconstruction promises to capture road information in advance.
Bird's-Eye-View (BEV) perception offers immense potential for more reliable and accurate reconstruction.
This paper proposes two simple yet effective models for road elevation reconstruction in BEV, named RoadBEV-mono and RoadBEV-stereo.
arXiv Detail & Related papers (2024-04-09T20:24:29Z)
- Towards Efficient 3D Object Detection in Bird's-Eye-View Space for Autonomous Driving: A Convolutional-Only Approach [13.513005108086006]
We propose an efficient BEV-based 3D detection framework called BEVENet.
Our experiments show that BEVENet is 3$\times$ faster than contemporary state-of-the-art (SOTA) approaches on the NuScenes challenge.
arXiv Detail & Related papers (2023-12-01T14:52:59Z)
- Multi-camera Bird's Eye View Perception for Autonomous Driving [17.834495597639805]
It is essential to produce perception outputs in 3D to enable the spatial reasoning of other agents and structures.
The most basic approach to achieving the desired BEV representation from a camera image is inverse perspective mapping (IPM), which assumes a flat ground surface.
More recent approaches use deep neural networks to output directly in BEV space.
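A minimal sketch of the flat-ground IPM baseline mentioned above, using OpenCV's plane homography; the four ground-point correspondences and BEV scale are made-up calibration values, not from any cited paper.

```python
# IPM sketch: under the flat-ground assumption, a single homography maps
# road-plane pixels to metric BEV coordinates.
import cv2
import numpy as np

# Image-plane pixels of four known ground points (made-up calibration) ...
src = np.float32([[420, 720], [860, 720], [700, 450], [580, 450]])
# ... and the same points on a BEV canvas (assumed scale: 10 px per meter).
dst = np.float32([[300, 600], [340, 600], [340, 200], [300, 200]])

H = cv2.getPerspectiveTransform(src, dst)        # 3x3 plane homography
image = np.zeros((720, 1280, 3), np.uint8)       # stand-in for a road-facing frame
bev = cv2.warpPerspective(image, H, (640, 640))  # flat-ground BEV image
```

Note the failure mode this baseline implies: anything above the ground plane (vehicles, pedestrians) is smeared outward in the warped view, which is why the learned BEV approaches below took over.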
arXiv Detail & Related papers (2023-09-16T19:12:05Z)
- Understanding the Robustness of 3D Object Detection with Bird's-Eye-View Representations in Autonomous Driving [31.98600806479808]
Bird's-Eye-View (BEV) representations have significantly improved the performance of 3D detectors with camera inputs on popular benchmarks.
We evaluate the natural and adversarial robustness of various representative models under extensive settings.
We propose a 3D-consistent patch attack by applying adversarial patches in the temporal 3D space to guarantee consistency across frames.
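As a rough sketch of what such spatio-temporal consistency can look like in practice (not the paper's actual attack), the patch can be pinned to fixed world-space corners and re-projected with each frame's camera pose before pasting; poses, intrinsics, and helper names below are assumptions.

```python
# Sketch: keep an adversarial patch fixed in 3D by re-projecting its
# world-space corners into every frame. All values are illustrative.
import cv2
import numpy as np

def paste_patch(frame, patch, corners_world, rvec, tvec, K):
    """Project the patch's four world corners into this frame and warp
    the patch texture onto them, so the patch stays fixed in 3D."""
    pts2d, _ = cv2.projectPoints(corners_world, rvec, tvec, K, None)
    pts2d = pts2d.reshape(4, 2).astype(np.float32)
    h, w = patch.shape[:2]
    src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    H = cv2.getPerspectiveTransform(src, pts2d)
    warped = cv2.warpPerspective(patch, H, (frame.shape[1], frame.shape[0]))
    mask = warped.sum(axis=2) > 0
    out = frame.copy()
    out[mask] = warped[mask]
    return out

# Example call: white patch 5 m in front of an identity-pose camera.
frame = np.zeros((720, 1280, 3), np.uint8)
patch = np.full((100, 100, 3), 255, np.uint8)
K = np.array([[1000.0, 0, 640], [0, 1000.0, 360], [0, 0, 1]])
corners = np.float32([[-1, -1, 5], [1, -1, 5], [1, 1, 5], [-1, 1, 5]])
out = paste_patch(frame, patch, corners, np.zeros(3), np.zeros(3), K)
```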
arXiv Detail & Related papers (2023-03-30T11:16:58Z)
- Tri-Perspective View for Vision-Based 3D Semantic Occupancy Prediction [84.94140661523956]
We propose a tri-perspective view (TPV) representation which accompanies BEV with two additional perpendicular planes.
We model each point in the 3D space by summing its projected features on the three planes.
Experiments show that our model trained with sparse supervision effectively predicts the semantic occupancy for all voxels.
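A minimal sketch of the TPV lookup described above: a 3D point's feature is the sum of its projections onto three perpendicular feature planes. Grid sizes and nearest-neighbor indexing are simplifications; the paper uses learned plane features and interpolation.

```python
# TPV sketch: sum a voxel's projected features from three orthogonal
# planes. Plane contents are random stand-ins for learned features.
import torch

C, H, W, Z = 64, 100, 100, 8
tpv_hw = torch.randn(C, H, W)  # top-down plane, indexed by (x, y)
tpv_zh = torch.randn(C, Z, H)  # side plane, indexed by (z, x)
tpv_wz = torch.randn(C, W, Z)  # front plane, indexed by (y, z)

def point_feature(x: int, y: int, z: int) -> torch.Tensor:
    """Feature of voxel (x, y, z) = sum of its three plane projections."""
    return tpv_hw[:, x, y] + tpv_zh[:, z, x] + tpv_wz[:, y, z]

f = point_feature(10, 20, 3)  # (C,) feature for one voxel
```

The appeal over a dense voxel grid is storage: three planes cost O(HW + ZH + WZ) features instead of O(HWZ), while still giving every voxel a distinct summed feature.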
arXiv Detail & Related papers (2023-02-15T17:58:10Z)
- CoBEVT: Cooperative Bird's Eye View Semantic Segmentation with Sparse Transformers [36.838065731893735]
CoBEVT is the first generic multi-agent perception framework that can cooperatively generate BEV map predictions.
CoBEVT achieves state-of-the-art performance for cooperative BEV semantic segmentation.
arXiv Detail & Related papers (2022-07-05T17:59:28Z)
- BEVerse: Unified Perception and Prediction in Birds-Eye-View for Vision-Centric Autonomous Driving [92.05963633802979]
We present BEVerse, a unified framework for 3D perception and prediction based on multi-camera systems.
We show that the multi-task BEVerse outperforms single-task methods on 3D object detection, semantic map construction, and motion prediction.
arXiv Detail & Related papers (2022-05-19T17:55:35Z)
- M^2BEV: Multi-Camera Joint 3D Detection and Segmentation with Unified Birds-Eye View Representation [145.6041893646006]
M$^2$BEV is a unified framework that jointly performs 3D object detection and map segmentation.
M$^2$BEV infers both tasks with a unified model and improves efficiency.
arXiv Detail & Related papers (2022-04-11T13:43:25Z)
- Fine-Grained Vehicle Perception via 3D Part-Guided Visual Data Augmentation [77.60050239225086]
We propose an effective training data generation process by fitting a 3D car model with dynamic parts to vehicles in real images.
Our approach is fully automatic without any human interaction.
We present a multi-task network for VUS parsing and a multi-stream network for VHI parsing.
arXiv Detail & Related papers (2020-12-15T03:03:38Z)