MapVision: CVPR 2024 Autonomous Grand Challenge Mapless Driving Tech Report
- URL: http://arxiv.org/abs/2406.10125v1
- Date: Fri, 14 Jun 2024 15:31:45 GMT
- Title: MapVision: CVPR 2024 Autonomous Grand Challenge Mapless Driving Tech Report
- Authors: Zhongyu Yang, Mai Liu, Jinluo Xie, Yueming Zhang, Chen Shen, Wei Shao, Jichao Jiao, Tengfei Xing, Runbo Hu, Pengfei Xu
- Abstract summary: We found that most existing algorithms construct Bird's Eye View features from multi-perspective images.
These algorithms perform poorly at the far end of roads and struggle when the primary subject in the image is occluded.
In this competition, we not only used multi-perspective images as input but also incorporated SD maps to address this issue.
- Score: 6.598847563245353
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Autonomous driving without high-definition (HD) maps demands a higher level of active scene understanding. In this competition, the organizers provided the multi-perspective camera images and standard-definition (SD) maps to explore the boundaries of scene reasoning capabilities. We found that most existing algorithms construct Bird's Eye View (BEV) features from these multi-perspective images and use multi-task heads to delineate road centerlines, boundary lines, pedestrian crossings, and other areas. However, these algorithms perform poorly at the far end of roads and struggle when the primary subject in the image is occluded. Therefore, in this competition, we not only used multi-perspective images as input but also incorporated SD maps to address this issue. We employed map encoder pre-training to enhance the network's geometric encoding capabilities and utilized YOLOX to improve traffic element detection precision. Additionally, for area detection, we innovatively introduced LDTR and auxiliary tasks to achieve higher precision. As a result, our final OLUS score is 0.58.
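The abstract's core idea is fusing camera-derived BEV features with an encoding of the SD map before the multi-task heads. A minimal sketch of that fusion, assuming a simple rasterization of SD-map road polylines into an extra BEV channel (function names, grid sizes, and the rasterization scheme here are illustrative assumptions, not the paper's actual map encoder):

```python
import numpy as np

def rasterize_sd_map(polylines, grid_size=200, extent=50.0):
    """Rasterize SD-map road polylines into a binary BEV occupancy grid.

    polylines: list of (N, 2) arrays of ego-frame (x, y) points in meters,
    covering [-extent, extent] in both axes.
    """
    grid = np.zeros((grid_size, grid_size), dtype=np.float32)
    scale = grid_size / (2.0 * extent)
    for line in polylines:
        # Densify each segment so the rasterized road stays connected.
        for (x0, y0), (x1, y1) in zip(line[:-1], line[1:]):
            for t in np.linspace(0.0, 1.0, 32):
                x = x0 + t * (x1 - x0)
                y = y0 + t * (y1 - y0)
                col = int((x + extent) * scale)
                row = int((y + extent) * scale)
                if 0 <= row < grid_size and 0 <= col < grid_size:
                    grid[row, col] = 1.0
    return grid

def fuse_bev_features(image_bev, sd_map_grid):
    """Concatenate camera BEV features (C, H, W) with the SD-map channel,
    giving the downstream heads (C + 1, H, W) input."""
    return np.concatenate([image_bev, sd_map_grid[None]], axis=0)
```

In the paper's actual pipeline the SD map goes through a pre-trained map encoder rather than a raw raster channel, but the fusion point is the same: the map signal enters the BEV representation before the centerline, boundary, and crossing heads, which is what helps at far ranges and under occlusion.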
Related papers
- DeepAerialMapper: Deep Learning-based Semi-automatic HD Map Creation for Highly Automated Vehicles [0.0]
We introduce a semi-automatic method for creating HD maps from high-resolution aerial imagery.
Our method involves training neural networks to semantically segment aerial images into classes relevant to HD maps.
Exporting the map to the Lanelet2 format allows easy extension for different use cases.
arXiv Detail & Related papers (2024-10-01T15:05:05Z) - Driving with Prior Maps: Unified Vector Prior Encoding for Autonomous Vehicle Mapping [18.97422977086127]
High-Definition Maps (HD maps) are essential for the precise navigation and decision-making of autonomous vehicles.
The online construction of HD maps using on-board sensors has emerged as a promising solution.
This paper proposes the PriorDrive framework to address these limitations by harnessing the power of prior maps.
arXiv Detail & Related papers (2024-09-09T06:17:46Z) - HD Maps are Lane Detection Generalizers: A Novel Generative Framework for Single-Source Domain Generalization [12.45819412968954]
We propose a novel generative framework using HD Maps for Single-Source Domain Generalization.
We validate that our framework outperforms the Domain Adaptation model MLDA with an accuracy improvement of +3.01 percentage points.
arXiv Detail & Related papers (2023-11-28T08:15:27Z) - SNAP: Self-Supervised Neural Maps for Visual Positioning and Semantic Understanding [57.108301842535894]
We introduce SNAP, a deep network that learns rich neural 2D maps from ground-level and overhead images.
We train our model to align neural maps estimated from different inputs, supervised only with camera poses over tens of millions of StreetView images.
SNAP can resolve the location of challenging image queries beyond the reach of traditional methods.
arXiv Detail & Related papers (2023-06-08T17:54:47Z) - Monocular BEV Perception of Road Scenes via Front-to-Top View Projection [57.19891435386843]
We present a novel framework that reconstructs a local map formed by road layout and vehicle occupancy in the bird's-eye view.
Our model runs at 25 FPS on a single GPU, which is efficient and applicable for real-time panorama HD map reconstruction.
arXiv Detail & Related papers (2022-11-15T13:52:41Z) - LaRa: Latents and Rays for Multi-Camera Bird's-Eye-View Semantic Segmentation [43.12994451281451]
We present 'LaRa', an efficient encoder-decoder, transformer-based model for vehicle semantic segmentation from multiple cameras.
Our approach uses a system of cross-attention to aggregate information over multiple sensors into a compact, yet rich, collection of latent representations.
arXiv Detail & Related papers (2022-06-27T13:37:50Z) - Scalable and Real-time Multi-Camera Vehicle Detection, Re-Identification, and Tracking [58.95210121654722]
We propose a real-time city-scale multi-camera vehicle tracking system that handles real-world, low-resolution CCTV instead of idealized and curated video streams.
Our method is ranked among the top five performers on the public leaderboard.
arXiv Detail & Related papers (2022-04-15T12:47:01Z) - SurroundDepth: Entangling Surrounding Views for Self-Supervised Multi-Camera Depth Estimation [101.55622133406446]
We propose SurroundDepth, a method that incorporates information from multiple surrounding views to predict depth maps across cameras.
Specifically, we employ a joint network to process all the surrounding views and propose a cross-view transformer to effectively fuse the information from multiple views.
In experiments, our method achieves state-of-the-art performance on challenging multi-camera depth estimation datasets.
arXiv Detail & Related papers (2022-04-07T17:58:47Z) - csBoundary: City-scale Road-boundary Detection in Aerial Images for High-definition Maps [10.082536828708779]
We propose csBoundary to automatically detect road boundaries at the city scale for HD map annotation.
Our network takes as input an aerial image patch, and directly infers the continuous road-boundary graph from this image.
Our csBoundary is evaluated and compared on a public benchmark dataset.
arXiv Detail & Related papers (2021-11-11T02:04:36Z) - Structured Bird's-Eye-View Traffic Scene Understanding from Onboard Images [128.881857704338]
We study the problem of extracting a directed graph representing the local road network in BEV coordinates, from a single onboard camera image.
We show that the method can be extended to detect dynamic objects on the BEV plane.
We validate our approach against powerful baselines and show that our network achieves superior performance.
arXiv Detail & Related papers (2021-10-05T12:40:33Z) - DAGMapper: Learning to Map by Discovering Lane Topology [84.12949740822117]
We focus on drawing the lane boundaries of complex highways with many lanes that contain topology changes due to forks and merges.
We formulate the problem as inference in a directed acyclic graphical model (DAG), where the nodes of the graph encode geometric and topological properties of the local regions of the lane boundaries.
We show the effectiveness of our approach on two major North American highways in two different states, achieving high precision and recall as well as 89% correct topology.
arXiv Detail & Related papers (2020-12-22T21:58:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.