LATR: 3D Lane Detection from Monocular Images with Transformer
- URL: http://arxiv.org/abs/2308.04583v2
- Date: Sun, 20 Aug 2023 13:31:54 GMT
- Title: LATR: 3D Lane Detection from Monocular Images with Transformer
- Authors: Yueru Luo, Chaoda Zheng, Xu Yan, Tang Kun, Chao Zheng, Shuguang Cui,
Zhen Li
- Abstract summary: 3D lane detection from monocular images is a fundamental yet challenging task in autonomous driving.
Recent advances rely on structural 3D surrogates built from front-view image features and camera parameters.
We present a novel LATR model, an end-to-end 3D lane detector that uses 3D-aware front-view features without transformed view representation.
- Score: 42.34193673590758
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 3D lane detection from monocular images is a fundamental yet challenging task
in autonomous driving. Recent advances primarily rely on structural 3D
surrogates (e.g., bird's eye view) built from front-view image features and
camera parameters. However, the depth ambiguity in monocular images inevitably
causes misalignment between the constructed surrogate feature map and the
original image, posing a great challenge for accurate lane detection. To
address the above issue, we present a novel LATR model, an end-to-end 3D lane
detector that uses 3D-aware front-view features without transformed view
representation. Specifically, LATR detects 3D lanes via cross-attention based
on query and key-value pairs, constructed using our lane-aware query generator
and dynamic 3D ground positional embedding. On the one hand, each query is
generated based on 2D lane-aware features and adopts a hybrid embedding to
enhance lane information. On the other hand, 3D space information is injected
as positional embedding from an iteratively-updated 3D ground plane. LATR
outperforms previous state-of-the-art methods on the synthetic Apollo benchmark and the
real-world OpenLane and ONCE-3DLanes benchmarks by large margins (e.g., an
11.4-point F1-score gain on OpenLane). Code will be released at
https://github.com/JMoonr/LATR .
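To make the described mechanism concrete, here is a minimal sketch of the decoder step from the abstract: lane-aware queries cross-attend to front-view image features whose keys carry a positional embedding computed from points on a 3D ground plane. This is not the released LATR code; the module and tensor names (GroundPECrossAttention, ground_xyz) are illustrative assumptions.

```python
# Minimal sketch (not the official LATR implementation) of cross-attention with
# lane-aware queries and a 3D ground positional embedding.
import torch
import torch.nn as nn

class GroundPECrossAttention(nn.Module):
    def __init__(self, dim=256, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Hypothetical MLP that embeds (x, y, z) ground-plane coordinates.
        self.ground_pe = nn.Sequential(nn.Linear(3, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, lane_queries, img_feats, ground_xyz):
        # lane_queries: (B, N, C) built from 2D lane-aware features
        # img_feats:    (B, H*W, C) flattened front-view features
        # ground_xyz:   (B, H*W, 3) points from the iteratively updated ground plane
        pe = self.ground_pe(ground_xyz)           # inject 3D space information
        out, _ = self.attn(query=lane_queries,
                           key=img_feats + pe,    # keys become 3D-aware
                           value=img_feats)
        return out

B, HW, C, N = 2, 1024, 256, 40
layer = GroundPECrossAttention(C)
updated = layer(torch.randn(B, N, C), torch.randn(B, HW, C), torch.randn(B, HW, 3))
print(updated.shape)  # torch.Size([2, 40, 256])
```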
Related papers
- Enhancing 3D Lane Detection and Topology Reasoning with 2D Lane Priors [40.92232275558338]
3D lane detection and topology reasoning are essential tasks in autonomous driving scenarios.
We propose Topo2D, a novel Transformer-based framework that leverages 2D lane instances to initialize 3D queries and 3D positional embeddings.
Topo2D achieves 44.5% OLS on the multi-view topology reasoning benchmark OpenLane-V2 and 62.6% F-score on the single-view 3D lane detection benchmark OpenLane.
arXiv Detail & Related papers (2024-06-05T09:48:56Z)
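A hedged sketch of the initialization Topo2D describes: 2D lane-instance features seed the 3D query content, while 2D lane points lifted with a depth guess and the camera intrinsics seed the 3D positional embedding. The lifting scheme and all names here are assumptions for illustration, not the paper's code.

```python
# Hypothetical sketch of Topo2D-style 3D query initialization from 2D lanes.
import torch
import torch.nn as nn

def lift_to_3d(points_2d, depth, K_inv):
    # points_2d: (N, 2) pixels; depth: (N,) per-point depth guess; K_inv: (3, 3)
    homo = torch.cat([points_2d, torch.ones(points_2d.shape[0], 1)], dim=1)
    return (homo @ K_inv.T) * depth[:, None]  # (N, 3) camera-frame points

class Query3DInit(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.feat_proj = nn.Linear(dim, dim)  # 2D instance feature -> query content
        self.pos_embed = nn.Linear(3, dim)    # lifted 3D point -> positional embedding

    def forward(self, inst_feats, points_2d, depth, K_inv):
        return self.feat_proj(inst_feats) + self.pos_embed(lift_to_3d(points_2d, depth, K_inv))

K = torch.tensor([[800., 0., 320.], [0., 800., 240.], [0., 0., 1.]])
q = Query3DInit()(torch.randn(5, 256), torch.rand(5, 2) * 200,
                  torch.rand(5) * 30 + 5, torch.linalg.inv(K))
print(q.shape)  # torch.Size([5, 256])
```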
- 3DiffTection: 3D Object Detection with Geometry-Aware Diffusion Features [70.50665869806188]
3DiffTection is a state-of-the-art method for 3D object detection from single images.
We fine-tune a diffusion model to perform novel view synthesis conditioned on a single image.
We further train the model on target data with detection supervision.
arXiv Detail & Related papers (2023-11-07T23:46:41Z)
- MonoGAE: Roadside Monocular 3D Object Detection with Ground-Aware Embeddings [29.050983641961658]
We introduce a novel framework for Roadside Monocular 3D object detection with ground-aware embeddings, named MonoGAE.
Our approach demonstrates a substantial performance advantage over all previous monocular 3D object detectors on widely recognized 3D detection benchmarks for roadside cameras.
arXiv Detail & Related papers (2023-09-30T14:52:26Z)
- An Efficient Transformer for Simultaneous Learning of BEV and Lane Representations in 3D Lane Detection [55.281369497158515]
We propose an efficient transformer for 3D lane detection.
Different from the vanilla transformer, our model contains a cross-attention mechanism to simultaneously learn lane and BEV representations.
Our method obtains 2D and 3D lane predictions by applying the lane features to the image-view and BEV features, respectively.
arXiv Detail & Related papers (2023-06-08T04:18:31Z)
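The mechanism in this blurb, maintaining lane and BEV representations together and reading out 2D and 3D predictions from them, can be sketched roughly as below. This is an illustrative stand-in, not the paper's actual decomposed attention; all names and shapes are assumptions.

```python
# Illustrative dual-view decoder: lane queries attend to image-view features
# and BEV features, then separate heads emit 2D and 3D lane point predictions.
import torch
import torch.nn as nn

class DualViewLaneDecoder(nn.Module):
    def __init__(self, dim=256, heads=8, n_pts=10):
        super().__init__()
        self.attn_img = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.attn_bev = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.head_2d = nn.Linear(dim, n_pts * 2)  # (u, v) per sampled point
        self.head_3d = nn.Linear(dim, n_pts * 3)  # (x, y, z) per sampled point

    def forward(self, lane_q, img_feats, bev_feats):
        q, _ = self.attn_img(lane_q, img_feats, img_feats)  # lane <-> image view
        q, _ = self.attn_bev(q, bev_feats, bev_feats)       # lane <-> BEV
        return self.head_2d(q), self.head_3d(q)

dec = DualViewLaneDecoder()
p2d, p3d = dec(torch.randn(2, 20, 256), torch.randn(2, 1024, 256), torch.randn(2, 900, 256))
print(p2d.shape, p3d.shape)  # torch.Size([2, 20, 20]) torch.Size([2, 20, 30])
```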
- CAPE: Camera View Position Embedding for Multi-View 3D Object Detection [100.02565745233247]
Current query-based methods rely on global 3D position embeddings to learn the geometric correspondence between images and 3D space.
We propose a novel method based on CAmera view Position Embedding, called CAPE.
CAPE achieves state-of-the-art performance (61.0% NDS and 52.5% mAP) among all LiDAR-free methods on nuScenes dataset.
arXiv Detail & Related papers (2023-03-17T18:59:54Z)
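A rough sketch of the camera-view idea: build position embeddings from per-pixel ray directions in each camera's local frame, so the embedding does not depend on global extrinsics. This is not the CAPE implementation; the helper and MLP below are illustrative.

```python
# Camera-local position embedding from per-pixel viewing rays (illustrative).
import torch
import torch.nn as nn

def camera_ray_pe(H, W, K_inv, mlp):
    # Build a (H*W, 3) grid of unit ray directions in the camera frame.
    v, u = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([u, v, torch.ones_like(u)], dim=-1).float().reshape(-1, 3)
    rays = pix @ K_inv.T
    rays = rays / rays.norm(dim=-1, keepdim=True)
    return mlp(rays)  # (H*W, C) embedding, camera-local by construction

mlp = nn.Sequential(nn.Linear(3, 256), nn.ReLU(), nn.Linear(256, 256))
K_inv = torch.linalg.inv(torch.tensor([[800., 0., 320.], [0., 800., 240.], [0., 0., 1.]]))
pe = camera_ray_pe(480, 640, K_inv, mlp)
print(pe.shape)  # torch.Size([307200, 256])
```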
- Reconstruct from Top View: A 3D Lane Detection Approach based on Geometry Structure Prior [19.1954119672487]
We propose an approach to monocular 3D lane detection that leverages the geometric structure underlying the 2D-to-3D lane reconstruction process.
We first analyze the geometry between the 3D lane and its 2D representation on the ground and propose to impose explicit supervision based on the structure prior.
Second, to reduce the structure loss in 2D lane representation, we directly extract top view lane information from front view images.
arXiv Detail & Related papers (2022-06-21T04:03:03Z)
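The geometry this line of work builds on shows up in a toy flat-ground back-projection: a lane pixel's viewing ray is intersected with an assumed ground plane to recover a 3D point. The flat-ground assumption is exactly what breaks on slopes, motivating the structure prior; the values below are purely illustrative.

```python
# Toy 2D-to-3D lane reconstruction via ray/ground-plane intersection.
import numpy as np

def backproject_to_ground(uv, K, cam_height):
    # uv: (N, 2) lane pixels; camera looks along +z; ground plane is y = cam_height.
    K_inv = np.linalg.inv(K)
    rays = np.c_[uv, np.ones(len(uv))] @ K_inv.T  # (N, 3) ray directions
    t = cam_height / rays[:, 1]                   # scale each ray to hit the ground
    return rays * t[:, None]                      # (N, 3) 3D lane points

K = np.array([[800., 0., 320.], [0., 800., 240.], [0., 0., 1.]])
uv = np.array([[320., 300.], [330., 280.], [340., 260.]])  # lane pixels below the horizon
print(backproject_to_ground(uv, K, cam_height=1.5))
```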
- ONCE-3DLanes: Building Monocular 3D Lane Detection [41.46466150783367]
We present ONCE-3DLanes, a real-world autonomous driving dataset with lane layout annotation in 3D space.
By exploiting the explicit relationship between point clouds and image pixels, a dataset annotation pipeline is designed to automatically generate high-quality 3D lane locations.
arXiv Detail & Related papers (2022-04-30T16:35:25Z)
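A simplified sketch of the point-cloud-to-pixel association such an auto-annotation pipeline relies on: project LiDAR points into the image and keep those landing on 2D lane-marking pixels. The function below is an assumption-laden illustration, not the dataset's actual pipeline.

```python
# Associate LiDAR points with 2D lane pixels to obtain 3D lane locations.
import numpy as np

def lane_points_3d(lidar_xyz, lane_mask, K, T_cam_from_lidar):
    # lidar_xyz: (N, 3); lane_mask: (H, W) bool 2D lane segmentation
    homo = np.c_[lidar_xyz, np.ones(len(lidar_xyz))]
    cam = (homo @ T_cam_from_lidar.T)[:, :3]      # points in the camera frame
    infront = cam[:, 2] > 0.1
    uv = cam[infront] @ K.T
    uv = (uv[:, :2] / uv[:, 2:3]).astype(int)     # pixel coordinates
    H, W = lane_mask.shape
    ok = (uv[:, 0] >= 0) & (uv[:, 0] < W) & (uv[:, 1] >= 0) & (uv[:, 1] < H)
    hits = lane_mask[uv[ok, 1], uv[ok, 0]]
    return cam[infront][ok][hits]                 # 3D points on lane markings

K = np.array([[800., 0., 320.], [0., 800., 240.], [0., 0., 1.]])
mask = np.zeros((480, 640), dtype=bool); mask[300:, 300:340] = True
pts = lane_points_3d(np.random.rand(1000, 3) * [4, 2, 40] + [-2, 0, 1], mask, K, np.eye(4))
print(pts.shape)
```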
- PersFormer: 3D Lane Detection via Perspective Transformer and the OpenLane Benchmark [109.03773439461615]
PersFormer is an end-to-end monocular 3D lane detector with a novel Transformer-based spatial feature transformation module.
We release one of the first large-scale real-world 3D lane datasets, called OpenLane, with high-quality annotation and scenario diversity.
arXiv Detail & Related papers (2022-03-21T16:12:53Z)
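For intuition, here is the classical inverse perspective mapping (IPM) baseline that PersFormer's learned perspective transformer improves upon: BEV cells on an assumed flat ground are projected into the front view and features are sampled there. This is not PersFormer's module; the ranges and names are illustrative.

```python
# Classical IPM warp of front-view features to a BEV grid (illustrative baseline).
import torch
import torch.nn.functional as F

def ipm_warp(fv_feats, K, cam_height, x_range=(-10, 10), z_range=(3, 103), bev_hw=(100, 100)):
    B, C, H, W = fv_feats.shape
    zs = torch.linspace(*z_range, bev_hw[0])
    xs = torch.linspace(*x_range, bev_hw[1])
    z, x = torch.meshgrid(zs, xs, indexing="ij")
    pts = torch.stack([x, torch.full_like(x, cam_height), z], -1)  # flat-ground points
    uv = pts @ K.T
    uv = uv[..., :2] / uv[..., 2:3]
    # Normalize pixel coords to [-1, 1] for grid_sample.
    grid = torch.stack([uv[..., 0] / (W - 1) * 2 - 1, uv[..., 1] / (H - 1) * 2 - 1], -1)
    return F.grid_sample(fv_feats, grid.expand(B, -1, -1, -1), align_corners=True)

K = torch.tensor([[800., 0., 320.], [0., 800., 240.], [0., 0., 1.]])
bev = ipm_warp(torch.randn(1, 64, 480, 640), K, cam_height=1.5)
print(bev.shape)  # torch.Size([1, 64, 100, 100])
```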
- End-to-End Pseudo-LiDAR for Image-Based 3D Object Detection [62.34374949726333]
Pseudo-LiDAR (PL) has led to a drastic reduction in the accuracy gap between methods based on LiDAR sensors and those based on cheap stereo cameras.
PL combines state-of-the-art deep neural networks for 3D depth estimation with those for 3D object detection by converting 2D depth map outputs to 3D point cloud inputs.
We introduce a new framework based on differentiable Change of Representation (CoR) modules that allow the entire PL pipeline to be trained end-to-end.
arXiv Detail & Related papers (2020-04-07T02:18:38Z)
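The PL conversion the summary describes is the standard depth-map back-projection sketched below; the paper's CoR modules make the surrounding pipeline differentiable end-to-end. The intrinsics are illustrative KITTI-like values, not taken from the paper.

```python
# Convert a predicted depth map into a pseudo point cloud by back-projection.
import numpy as np

def depth_to_pseudo_lidar(depth, K):
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], -1).reshape(-1, 3).astype(float)
    return (pix @ np.linalg.inv(K).T) * depth.reshape(-1, 1)  # (H*W, 3) points

K = np.array([[721.5, 0., 609.6], [0., 721.5, 172.9], [0., 0., 1.]])  # KITTI-like
cloud = depth_to_pseudo_lidar(np.random.uniform(1, 80, (375, 1242)), K)
print(cloud.shape)  # (465750, 3)
```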
This list is automatically generated from the titles and abstracts of the papers on this site.