Camera-Only Bird's Eye View Perception: A Neural Approach to LiDAR-Free Environmental Mapping for Autonomous Vehicles
- URL: http://arxiv.org/abs/2505.06113v1
- Date: Fri, 09 May 2025 15:13:04 GMT
- Title: Camera-Only Bird's Eye View Perception: A Neural Approach to LiDAR-Free Environmental Mapping for Autonomous Vehicles
- Authors: Anupkumar Bochare
- Abstract summary: We propose a camera-only perception framework that produces Bird's Eye View (BEV) maps by extending the Lift-Splat-Shoot architecture. Our method combines YOLOv11-based object detection with DepthAnythingV2 monocular depth estimation across multi-camera inputs to achieve comprehensive 360-degree scene understanding.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Autonomous vehicle perception systems have traditionally relied on costly LiDAR sensors to generate precise environmental representations. In this paper, we propose a camera-only perception framework that produces Bird's Eye View (BEV) maps by extending the Lift-Splat-Shoot architecture. Our method combines YOLOv11-based object detection with DepthAnythingV2 monocular depth estimation across multi-camera inputs to achieve comprehensive 360-degree scene understanding. We evaluate our approach on the OpenLane-V2 and NuScenes datasets, achieving up to 85% road segmentation accuracy and 85-90% vehicle detection rates when compared against LiDAR ground truth, with average positional errors limited to 1.2 meters. These results highlight the potential of deep learning to extract rich spatial information using only camera inputs, enabling cost-efficient autonomous navigation without sacrificing accuracy.
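The abstract describes the pipeline only at a high level; the block below is a minimal NumPy sketch of the lift-splat step it builds on, not the paper's implementation. The camera intrinsics, extrinsics, and the constant depth map are placeholders; in the actual system the depth would come from DepthAnythingV2 and detections from YOLOv11, with one grid per camera fused into a single 360-degree BEV map.

```python
import numpy as np

def splat_to_bev(depth, K, cam_to_ego, bev_range=50.0, cell=0.5):
    """Back-project a per-pixel depth map into an ego-centric BEV occupancy grid.

    depth      : (H, W) metric depth in meters (e.g. from a monocular depth net)
    K          : (3, 3) camera intrinsics
    cam_to_ego : (4, 4) camera-to-ego extrinsic transform
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(np.float64)

    # Lift: pixel -> camera-frame 3D point using the predicted depth
    rays = pix @ np.linalg.inv(K).T
    pts_cam = rays * depth.reshape(-1, 1)
    pts_cam_h = np.concatenate([pts_cam, np.ones((pts_cam.shape[0], 1))], axis=1)

    # Move points into the ego (vehicle) frame
    pts_ego = (pts_cam_h @ cam_to_ego.T)[:, :3]

    # Splat: accumulate points into a top-down grid
    n = int(2 * bev_range / cell)
    bev = np.zeros((n, n), dtype=np.int32)
    ix = ((pts_ego[:, 0] + bev_range) / cell).astype(int)
    iy = ((pts_ego[:, 1] + bev_range) / cell).astype(int)
    ok = (ix >= 0) & (ix < n) & (iy >= 0) & (iy < n)
    np.add.at(bev, (ix[ok], iy[ok]), 1)
    return bev

# Toy example with one synthetic camera; in the paper's setup, grids from all
# surround-view cameras (and detected-object masks) would be fused the same way.
K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])
depth = np.full((480, 640), 10.0)          # placeholder depth map
bev = splat_to_bev(depth, K, np.eye(4))
print(bev.shape, bev.sum())
```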
Related papers
- NOVA: Navigation via Object-Centric Visual Autonomy for High-Speed Target Tracking in Unstructured GPS-Denied Environments [56.35569661650558]
We introduce NOVA, a fully onboard, object-centric framework that enables robust target tracking and collision-aware navigation. Rather than constructing a global map, NOVA formulates perception, estimation, and control entirely in the target's reference frame. We validate NOVA across challenging real-world scenarios, including urban mazes, forest trails, and repeated transitions through buildings with intermittent GPS loss.
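As a rough illustration of "formulating perception, estimation, and control in the target's reference frame" (the snippet is not from NOVA; the planar-pose assumption and the numbers are invented), obstacle points sensed in the robot body frame can be re-expressed relative to the tracked target instead of a global map:

```python
import numpy as np

def to_target_frame(points_body, target_pos_body, target_yaw_body):
    """Re-express obstacle points, sensed in the robot body frame, in the
    reference frame of the tracked target (assumed planar pose: x, y, yaw)."""
    c, s = np.cos(target_yaw_body), np.sin(target_yaw_body)
    R = np.array([[c, -s], [s, c]])          # target orientation in the body frame
    # Inverse transform for row vectors: p_target = R^T (p_body - t)
    return (points_body - target_pos_body) @ R

# Hypothetical example: two obstacles and a target 5 m ahead, facing the robot.
obstacles_body = np.array([[4.0, 1.0], [6.0, -2.0]])
print(to_target_frame(obstacles_body, np.array([5.0, 0.0]), np.pi))
```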
arXiv Detail & Related papers (2025-06-23T14:28:30Z)
- Distance Estimation in Outdoor Driving Environments Using Phase-only Correlation Method with Event Cameras [5.690128924544198]
We present a method for distance estimation using a monocular event camera and a roadside LED bar. The proposed approach achieves over 90% success rate with less than 0.5-meter error for distances ranging from 20 to 60 meters. Future work includes extending this method to full position estimation by leveraging infrastructure such as smart poles equipped with LEDs.
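The summary names phase-only correlation (POC) as the core operation. The sketch below shows plain 1-D POC on synthetic signals, which is only the shift-recovery step; mapping that shift to a metric distance requires the LED-bar geometry and calibration described in the paper.

```python
import numpy as np

def phase_only_correlation(a, b):
    """Return the POC surface of two equal-length signals: the inverse FFT of
    the normalized cross-power spectrum. Its peak index is the circular shift
    that maps a onto b."""
    A, B = np.fft.fft(a), np.fft.fft(b)
    cross = B * np.conj(A)                   # cross-power spectrum
    cross /= np.abs(cross) + 1e-12           # keep phase only
    return np.real(np.fft.ifft(cross))

# Synthetic example: b is a copy of a shifted by 7 samples.
rng = np.random.default_rng(0)
a = rng.standard_normal(256)
b = np.roll(a, 7)
shift = int(np.argmax(phase_only_correlation(a, b)))
print("estimated shift:", shift)             # 7
```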
arXiv Detail & Related papers (2025-05-23T07:44:33Z)
- RaCFormer: Towards High-Quality 3D Object Detection via Query-based Radar-Camera Fusion [58.77329237533034]
We propose a Radar-Camera fusion transformer (RaCFormer) to boost the accuracy of 3D object detection. RaCFormer achieves superior results of 64.9% mAP and 70.2% NDS on the nuScenes dataset.
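A minimal PyTorch sketch of the generic query-based fusion pattern the title refers to, not RaCFormer's actual architecture (dimensions, token counts, and the box-parameter head are all illustrative): learned object queries cross-attend first to camera tokens, then to radar tokens.

```python
import torch
import torch.nn as nn

class QueryRadarCameraFusion(nn.Module):
    """Toy query-based fusion head: object queries attend to image features,
    then to radar features, and a small MLP decodes per-query box parameters."""
    def __init__(self, dim=128, num_queries=100, num_heads=4):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, dim))
        self.attn_cam = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.attn_rad = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.head = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 7))

    def forward(self, cam_tokens, radar_tokens):
        # cam_tokens: (B, N_cam, dim), radar_tokens: (B, N_rad, dim)
        q = self.queries.unsqueeze(0).expand(cam_tokens.size(0), -1, -1)
        q, _ = self.attn_cam(q, cam_tokens, cam_tokens)
        q, _ = self.attn_rad(q, radar_tokens, radar_tokens)
        return self.head(q)                   # (B, num_queries, 7) box parameters

model = QueryRadarCameraFusion()
boxes = model(torch.randn(2, 600, 128), torch.randn(2, 64, 128))
print(boxes.shape)                            # torch.Size([2, 100, 7])
```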
arXiv Detail & Related papers (2024-12-17T09:47:48Z)
- Vision-Driven 2D Supervised Fine-Tuning Framework for Bird's Eye View Perception [20.875243604623723]
We propose a fine-tuning method for BEV perception networks based on visual 2D semantic perception.
Considering the maturity and development of 2D perception technologies, our method significantly reduces the dependency on high-cost LiDAR ground truths.
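A hedged sketch of the general idea of 2D-supervised fine-tuning (the paper's concrete projection and loss may differ; the camera matrix, cell coordinates, and labels below are synthetic): BEV cell centers are projected into the image and the BEV semantic prediction at each visible cell is penalized against the 2D semantic label it lands on.

```python
import torch
import torch.nn.functional as F

def bev_2d_supervision_loss(bev_logits, cell_xyz, K, sem_2d, img_hw):
    """bev_logits : (C, N) class logits for N BEV cells
    cell_xyz   : (N, 3) cell-center coordinates in the camera frame
    K          : (3, 3) camera intrinsics
    sem_2d     : (H, W) integer 2D semantic labels from an image-space model
    """
    H, W = img_hw
    uvw = cell_xyz @ K.T                      # project cell centers to the image
    u = (uvw[:, 0] / uvw[:, 2]).round().long()
    v = (uvw[:, 1] / uvw[:, 2]).round().long()
    ok = (uvw[:, 2] > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    target = sem_2d[v[ok], u[ok]]             # 2D label hit by each visible cell
    return F.cross_entropy(bev_logits[:, ok].T, target)

# Synthetic example: 4 classes, 500 BEV cells placed in front of the camera.
K = torch.tensor([[800., 0., 320.], [0., 800., 240.], [0., 0., 1.]])
cells = torch.rand(500, 3) * torch.tensor([10., 5., 1.]) + torch.tensor([-5., -2.5, 5.])
loss = bev_2d_supervision_loss(torch.randn(4, 500), cells, K,
                               torch.randint(0, 4, (480, 640)), (480, 640))
print(loss.item())
```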
arXiv Detail & Related papers (2024-09-09T17:40:30Z)
- RoadRunner -- Learning Traversability Estimation for Autonomous Off-road Driving [13.101416329887755]
We present RoadRunner, a framework capable of predicting terrain traversability and an elevation map directly from camera and LiDAR sensor inputs.
RoadRunner enables reliable autonomous navigation by fusing sensory information, handling uncertainty, and generating contextually informed predictions.
We demonstrate the effectiveness of RoadRunner in enabling safe and reliable off-road navigation at high speeds in multiple real-world driving scenarios through unstructured desert environments.
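As a structural sketch only (RoadRunner's real network is far larger and its fusion more involved; channel counts here are arbitrary), the prediction stage can be pictured as a shared encoder over fused camera/LiDAR BEV features with two heads, one for traversability and one for elevation:

```python
import torch
import torch.nn as nn

class TraversabilityElevationNet(nn.Module):
    """Shared BEV encoder with two heads: traversability probability and
    elevation (meters) per BEV cell."""
    def __init__(self, in_ch=64, hid=96):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, hid, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hid, hid, 3, padding=1), nn.ReLU(),
        )
        self.trav_head = nn.Conv2d(hid, 1, 1)     # sigmoid -> traversability
        self.elev_head = nn.Conv2d(hid, 1, 1)     # regressed elevation in meters

    def forward(self, fused_bev_features):
        h = self.encoder(fused_bev_features)
        return torch.sigmoid(self.trav_head(h)), self.elev_head(h)

# Fused camera+LiDAR BEV features for a 128x128 grid (synthetic).
trav, elev = TraversabilityElevationNet()(torch.randn(1, 64, 128, 128))
print(trav.shape, elev.shape)
```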
arXiv Detail & Related papers (2024-02-29T16:47:54Z)
- SDGE: Stereo Guided Depth Estimation for 360$^\circ$ Camera Sets [65.64958606221069]
Multi-camera systems are often used in autonomous driving to achieve 360$^\circ$ perception.
These 360$^\circ$ camera sets often have limited or low-quality overlap regions, making multi-view stereo methods infeasible for the entire image.
We propose the Stereo Guided Depth Estimation (SGDE) method, which enhances depth estimation of the full image by explicitly utilizing multi-view stereo results on the overlap.
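One plausible reading of "using stereo results on the overlap to enhance the full image" is a scale correction: compare metric stereo depth with monocular depth inside the overlap region and propagate the recovered scale to the whole image. The snippet below implements that interpretation only, not SGDE's actual algorithm:

```python
import numpy as np

def rescale_mono_depth(mono_depth, stereo_depth, overlap_mask):
    """Scale a (relative) monocular depth map so it agrees with metric stereo
    depth inside the overlap region, then apply that scale everywhere."""
    ratio = stereo_depth[overlap_mask] / np.clip(mono_depth[overlap_mask], 1e-6, None)
    scale = np.median(ratio)                  # robust global scale estimate
    return mono_depth * scale, scale

# Synthetic check: the monocular map is the metric depth divided by 4.
rng = np.random.default_rng(1)
metric = rng.uniform(5.0, 60.0, size=(240, 320))
mono = metric / 4.0
overlap = np.zeros_like(metric, dtype=bool)
overlap[:, :40] = True                        # overlap strip with the next camera
fixed, scale = rescale_mono_depth(mono, metric, overlap)
print(round(scale, 3), np.allclose(fixed, metric))
```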
arXiv Detail & Related papers (2024-02-19T02:41:37Z)
- Real-time Full-stack Traffic Scene Perception for Autonomous Driving with Roadside Cameras [20.527834125706526]
We propose a novel framework for traffic scene perception with roadside cameras.
The proposed framework covers the full stack of roadside perception, including object detection, object localization, object tracking, and multi-camera information fusion.
Our framework is deployed at a two-lane roundabout located at Ellsworth Rd. and State St., Ann Arbor, MI, USA, providing 24/7 real-time traffic flow monitoring and high-precision vehicle trajectory extraction.
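For the object-localization stage, a common roadside-camera technique (sketched with a made-up homography; the deployed system's calibration is not given in this summary) maps the bottom-center pixel of each detection onto the road plane through a ground-plane homography:

```python
import numpy as np

def pixel_to_ground(H_img_to_ground, u, v):
    """Map an image pixel (u, v) to ground-plane coordinates (x, y) in meters
    using a pre-calibrated 3x3 homography."""
    p = H_img_to_ground @ np.array([u, v, 1.0])
    return p[:2] / p[2]

# Hypothetical calibration: built from pixel/ground correspondences in practice
# (e.g. with cv2.findHomography); here a fixed matrix stands in for illustration.
H = np.array([[0.02,  0.001, -6.0],
              [0.001, 0.05, -12.0],
              [0.0,   0.001,  1.0]])
bbox = (300, 180, 360, 250)                   # x1, y1, x2, y2 of a detected vehicle
u, v = (bbox[0] + bbox[2]) / 2, bbox[3]       # bottom-center touches the road
print(pixel_to_ground(H, u, v))               # vehicle position on the road plane
```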
arXiv Detail & Related papers (2022-06-20T13:33:52Z)
- SurroundDepth: Entangling Surrounding Views for Self-Supervised Multi-Camera Depth Estimation [101.55622133406446]
We propose SurroundDepth, a method that incorporates information from multiple surrounding views to predict depth maps across cameras.
Specifically, we employ a joint network to process all the surrounding views and propose a cross-view transformer to effectively fuse the information from multiple views.
In experiments, our method achieves state-of-the-art performance on challenging multi-camera depth estimation datasets.
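A toy PyTorch sketch in the spirit of cross-view fusion, not SurroundDepth's implementation (feature sizes and the single attention layer are arbitrary): per-view feature maps are flattened into one token sequence and self-attention exchanges information across all surrounding views.

```python
import torch
import torch.nn as nn

class CrossViewFusion(nn.Module):
    """Flatten features from V camera views into one token sequence and let
    self-attention exchange information across views."""
    def __init__(self, dim=64, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, feats):                 # feats: (B, V, C, H, W)
        B, V, C, H, W = feats.shape
        tokens = feats.flatten(3).permute(0, 1, 3, 2).reshape(B, V * H * W, C)
        fused, _ = self.attn(tokens, tokens, tokens)
        fused = self.norm(tokens + fused)     # residual connection
        return fused.reshape(B, V, H * W, C).permute(0, 1, 3, 2).reshape(B, V, C, H, W)

fused = CrossViewFusion()(torch.randn(2, 6, 64, 8, 16))   # 6 surround cameras
print(fused.shape)                            # torch.Size([2, 6, 64, 8, 16])
```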
arXiv Detail & Related papers (2022-04-07T17:58:47Z)
- Event Guided Depth Sensing [50.997474285910734]
We present an efficient bio-inspired event-camera-driven depth estimation algorithm.
In our approach, we illuminate areas of interest densely, depending on the scene activity detected by the event camera.
We show the feasibility of our approach on simulated autonomous driving sequences and in real indoor environments.
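A simple sketch of the "illuminate where the events are" policy (the event counts are synthetic and the paper's controller is more involved): divide the frame into tiles, count recent events per tile, and flag tiles above a threshold for dense illumination and depth sampling.

```python
import numpy as np

def select_active_tiles(event_count, tile=32, min_events=50):
    """Return a boolean tile map marking regions with enough recent event
    activity to deserve dense depth sampling."""
    H, W = event_count.shape
    th, tw = H // tile, W // tile
    tiles = event_count[:th * tile, :tw * tile].reshape(th, tile, tw, tile)
    return tiles.sum(axis=(1, 3)) > min_events

# Synthetic per-pixel event counts with one moving object in the upper-left.
events = np.zeros((240, 320), dtype=int)
events[40:80, 30:90] = 3                      # dense activity patch
active = select_active_tiles(events)
print(active.shape, int(active.sum()), "active tiles")
```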
arXiv Detail & Related papers (2021-10-20T11:41:11Z)
- R4Dyn: Exploring Radar for Self-Supervised Monocular Depth Estimation of Dynamic Scenes [69.6715406227469]
Self-supervised monocular depth estimation in driving scenarios has achieved comparable performance to supervised approaches.
We present R4Dyn, a novel set of techniques to use cost-efficient radar data on top of a self-supervised depth estimation framework.
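A hedged sketch of one ingredient, using projected radar returns as sparse weak supervision (R4Dyn does more than this, e.g. filtering noisy returns and also using radar as an input; all values below are synthetic): penalize predicted depth only at pixels where a radar point projects.

```python
import torch
import torch.nn.functional as F

def radar_depth_loss(pred_depth, radar_uv, radar_range):
    """L1 loss between predicted depth and radar range at projected radar pixels.

    pred_depth  : (H, W) predicted depth map
    radar_uv    : (N, 2) integer pixel coordinates of projected radar points
    radar_range : (N,) radar-measured range in meters
    """
    u, v = radar_uv[:, 0], radar_uv[:, 1]
    return F.l1_loss(pred_depth[v, u], radar_range)

# Synthetic example: 20 radar points scattered over a 192x640 prediction.
pred = torch.rand(192, 640) * 80.0
uv = torch.stack([torch.randint(0, 640, (20,)), torch.randint(0, 192, (20,))], dim=1)
rng_m = torch.rand(20) * 80.0
print(radar_depth_loss(pred, uv, rng_m).item())
```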
arXiv Detail & Related papers (2021-08-10T17:57:03Z)
- Ego-motion and Surrounding Vehicle State Estimation Using a Monocular Camera [11.29865843123467]
We propose a novel machine learning method to estimate ego-motion and surrounding vehicle state using a single monocular camera.
Our approach is based on a combination of three deep neural networks to estimate the 3D vehicle bounding box, depth, and optical flow from a sequence of images.
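One way the three outputs can be combined (an illustration, not the paper's estimator): back-project a set of static pixels to 3D with the depth at time t, use optical flow and the depth at t+1 to locate the same points in the next frame, and recover ego-motion as the rigid transform aligning the two point sets (Kabsch/Procrustes via SVD).

```python
import numpy as np

def rigid_align(src, dst):
    """Least-squares rotation R and translation t with dst ~ src @ R.T + t."""
    cs, cd = src.mean(0), dst.mean(0)
    U, _, Vt = np.linalg.svd((src - cs).T @ (dst - cd))
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T                        # reflection-corrected rotation
    return R, cd - cs @ R.T

# Synthetic static points observed at t and t+1 under a known ego-motion.
rng = np.random.default_rng(2)
pts_t = rng.uniform(-20, 20, size=(200, 3))
yaw = 0.05
R_true = np.array([[np.cos(yaw), -np.sin(yaw), 0],
                   [np.sin(yaw),  np.cos(yaw), 0],
                   [0, 0, 1]])
t_true = np.array([1.5, 0.1, 0.0])
pts_t1 = pts_t @ R_true.T + t_true
R_est, t_est = rigid_align(pts_t, pts_t1)
print(np.allclose(R_est, R_true), np.round(t_est, 3))
```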
arXiv Detail & Related papers (2020-05-04T16:41:38Z)
- Depth Sensing Beyond LiDAR Range [84.19507822574568]
We propose a novel three-camera system that utilizes small field-of-view cameras.
Our system, along with our novel algorithm for computing metric depth, does not require full pre-calibration.
It can output dense depth maps with practically acceptable accuracy for scenes and objects at long distances.
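The case for small field-of-view (long focal length) cameras at long range follows from triangulation error propagation: with depth Z = f·B/d, a fixed disparity error δd yields a depth error of roughly Z²·δd/(f·B). A quick numeric check with illustrative values only:

```python
# Depth-error growth with distance for a wide-FOV vs. a narrow-FOV stereo pair.
f_wide, f_tele = 1000.0, 4000.0   # focal lengths in pixels (illustrative)
B = 1.2                           # stereo baseline in meters
dd = 0.25                         # assumed disparity error in pixels
for Z in (50.0, 150.0, 300.0):    # target distance in meters
    err_wide = Z**2 * dd / (f_wide * B)
    err_tele = Z**2 * dd / (f_tele * B)
    print(f"Z={Z:5.0f} m  wide-FOV err ~{err_wide:5.1f} m  narrow-FOV err ~{err_tele:5.1f} m")
```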
arXiv Detail & Related papers (2020-04-07T00:09:51Z)