An Efficient Transformer for Simultaneous Learning of BEV and Lane
Representations in 3D Lane Detection
- URL: http://arxiv.org/abs/2306.04927v1
- Date: Thu, 8 Jun 2023 04:18:31 GMT
- Title: An Efficient Transformer for Simultaneous Learning of BEV and Lane
Representations in 3D Lane Detection
- Authors: Ziye Chen, Kate Smith-Miles, Bo Du, Guoqi Qian, Mingming Gong
- Abstract summary: We propose an efficient transformer for 3D lane detection.
Unlike the vanilla transformer, our model contains a decomposed cross-attention mechanism to simultaneously learn lane and BEV representations.
Our method obtains 2D and 3D lane predictions by applying the lane features to the image-view and BEV features, respectively.
- Score: 55.281369497158515
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Accurately detecting lane lines in 3D space is crucial for autonomous
driving. Existing methods usually first transform image-view features into
bird's-eye-view (BEV) features with the aid of inverse perspective mapping (IPM), and then
detect lane lines based on the BEV features. However, IPM ignores the changes
in road height, leading to inaccurate view transformations. Additionally, the
two separate stages of the process can cause cumulative errors and increased
complexity. To address these limitations, we propose an efficient transformer
for 3D lane detection. Unlike the vanilla transformer, our model
contains a decomposed cross-attention mechanism to simultaneously learn lane
and BEV representations. The mechanism decomposes the cross-attention between
image-view and BEV features into the one between image-view and lane features,
and the one between lane and BEV features, both of which are supervised with
ground-truth lane lines. Our method obtains 2D and 3D lane predictions by
applying the lane features to the image-view and BEV features, respectively.
This allows for a more accurate view transformation than IPM-based methods, as
the view transformation is learned from data with a supervised cross-attention.
Additionally, the cross-attention between lane and BEV features enables them to
adjust to each other, resulting in more accurate lane detection than the two
separate stages. Finally, the decomposed cross-attention is more efficient than
the original one. Experimental results on OpenLane and ONCE-3DLanes demonstrate
the state-of-the-art performance of our method.
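For context on the IPM baseline the abstract critiques, below is a minimal NumPy sketch of flat-ground inverse perspective mapping: image pixels are back-projected onto the z = 0 ground plane through the homography H = K [r1 r2 t]. The intrinsics, pitch angle, and camera height are hypothetical placeholder values, and the sketch assumes exactly the flat-road model whose failure under road-height changes motivates the paper.

```python
import numpy as np

def ipm_ground_points(pixels, K, R, t):
    """Back-project image pixels onto the z = 0 ground plane (flat-road IPM).

    pixels: (N, 2) array of (u, v) image coordinates.
    K: (3, 3) camera intrinsics; R, t: world-to-camera rotation/translation.
    Returns (N, 2) ground-plane coordinates (X, Y).
    """
    # For a world point on z = 0:
    #   s * [u, v, 1]^T = K (R [X, Y, 0]^T + t) = K [r1 | r2 | t] [X, Y, 1]^T,
    # so image <-> ground is the plane homography H = K [r1 r2 t].
    H = K @ np.column_stack((R[:, 0], R[:, 1], t))
    H_inv = np.linalg.inv(H)

    ones = np.ones((pixels.shape[0], 1))
    homog = np.hstack((pixels, ones))       # (N, 3) homogeneous pixels
    ground = (H_inv @ homog.T).T            # (N, 3) homogeneous ground points
    return ground[:, :2] / ground[:, 2:3]   # dehomogenize -> (X, Y)

# Hypothetical pinhole camera: 10-degree downward pitch, 1.5 m above the road.
theta = np.deg2rad(10.0)
K = np.array([[1000.0,    0.0, 640.0],
              [   0.0, 1000.0, 360.0],
              [   0.0,    0.0,   1.0]])
R = np.array([[1.0,            0.0,            0.0],
              [0.0, -np.sin(theta), -np.cos(theta)],
              [0.0,  np.cos(theta), -np.sin(theta)]])
t = -R @ np.array([0.0, 0.0, 1.5])          # camera centre at (0, 0, 1.5)

uv = np.array([[640.0, 500.0], [700.0, 620.0]])  # pixels below the horizon
print(ipm_ground_points(uv, K, R, t))
```

Because H is derived under the z = 0 assumption, any actual change in road height shifts the recovered (X, Y) points, which is the inaccuracy the proposed method avoids by learning the view transformation from data.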
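Based only on the abstract's description, here is a schematic PyTorch sketch of how the decomposed cross-attention could be wired; the module names, query count, and exact attention arrangement are assumptions, not the authors' released code. The efficiency argument is visible in the shapes: full image-to-BEV cross-attention costs on the order of N_bev * N_img, while routing through a small set of N_lane lane queries costs on the order of N_lane * (N_img + N_bev).

```python
import torch
import torch.nn as nn

class DecomposedCrossAttention(nn.Module):
    """Sketch of decomposed cross-attention (assumed structure): a small set
    of lane queries first gathers evidence from image-view features, and the
    BEV features then attend to those lane features, replacing the direct
    image-to-BEV cross-attention of a vanilla transformer."""

    def __init__(self, dim: int = 256, num_heads: int = 8,
                 num_lane_queries: int = 40):
        super().__init__()
        self.lane_queries = nn.Parameter(torch.randn(num_lane_queries, dim))
        self.lane_from_image = nn.MultiheadAttention(dim, num_heads,
                                                     batch_first=True)
        self.bev_from_lane = nn.MultiheadAttention(dim, num_heads,
                                                   batch_first=True)

    def forward(self, image_feats: torch.Tensor, bev_feats: torch.Tensor):
        """image_feats: (B, N_img, C) flattened image-view features.
        bev_feats: (B, N_bev, C) flattened BEV grid features.
        Returns updated (lane_feats, bev_feats)."""
        B = image_feats.size(0)
        lane = self.lane_queries.unsqueeze(0).expand(B, -1, -1)

        # Step 1: lane queries attend to image-view features; per the
        # abstract, this attention is supervised with ground-truth lane lines
        # (the lane features also yield the 2D predictions).
        lane, _ = self.lane_from_image(lane, image_feats, image_feats)

        # Step 2: BEV features attend to the lane features, so the view
        # transformation is learned from data rather than fixed by IPM.
        bev, _ = self.bev_from_lane(bev_feats, lane, lane)
        return lane, bev

# Toy shapes: a 32x64 image-feature grid and a 50x25 BEV grid, 256-d features.
mod = DecomposedCrossAttention()
img = torch.randn(2, 32 * 64, 256)
bev = torch.randn(2, 50 * 25, 256)
lane_feats, bev_feats = mod(img, bev)
print(lane_feats.shape, bev_feats.shape)  # (2, 40, 256) (2, 1250, 256)
```

With 40 lane queries, the two decomposed attentions touch 40 * (2048 + 1250) pairs versus 1250 * 2048 for direct image-to-BEV attention, which is where the claimed efficiency gain over the original cross-attention comes from.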
Related papers
- DV-3DLane: End-to-end Multi-modal 3D Lane Detection with Dual-view Representation [40.71071200694655]
We present DV-3DLane, a novel end-to-end Dual-View multi-modal 3D Lane detection framework.
It synergizes the strengths of both images and LiDAR points.
It achieves state-of-the-art performance, with a remarkable 11.2 gain in F1 score and a substantial 53.5% reduction in errors.
arXiv Detail & Related papers (2024-06-23T10:48:42Z)
- CurveFormer++: 3D Lane Detection by Curve Propagation with Temporal Curve Queries and Attention [6.337799395191661]
We present CurveFormer++, a single-stage Transformer-based method that does not require the image feature view transform module.
By employing a Transformer decoder, the model can iteratively refine the 3D lane detection results.
We evaluate our approach for the 3D lane detection task on two publicly available real-world datasets.
arXiv Detail & Related papers (2024-02-09T14:13:40Z)
- Decoupling the Curve Modeling and Pavement Regression for Lane Detection [67.22629246312283]
A curve-based lane representation is a popular choice in many lane detection methods.
We propose a new approach to the lane detection task by decomposing it into two parts: curve modeling and ground height regression.
arXiv Detail & Related papers (2023-09-19T11:24:14Z)
- Multi-camera Bird's Eye View Perception for Autonomous Driving [17.834495597639805]
It is essential to produce perception outputs in 3D to enable spatial reasoning about other agents and structures.
The most basic approach to achieving the desired BEV representation from a camera image is IPM, assuming a flat ground surface.
More recent approaches use deep neural networks to output directly in BEV space.
arXiv Detail & Related papers (2023-09-16T19:12:05Z)
- Geometric-aware Pretraining for Vision-centric 3D Object Detection [77.7979088689944]
We propose a novel geometric-aware pretraining framework called GAPretrain.
GAPretrain serves as a plug-and-play solution that can be flexibly applied to multiple state-of-the-art detectors.
We achieve 46.2 mAP and 55.5 NDS on the nuScenes val set using the BEVFormer method, with a gain of 2.7 and 2.1 points, respectively.
arXiv Detail & Related papers (2023-04-06T14:33:05Z)
- Online Lane Graph Extraction from Onboard Video [133.68032636906133]
We use the video stream from an onboard camera for online extraction of the surrounding's lane graph.
Using video, instead of a single image, as input poses both benefits and challenges in terms of combining the information from different timesteps.
The proposed method is simple yet effective: a single model can process any number of input images, including a single one, to produce accurate lane graphs.
arXiv Detail & Related papers (2023-04-03T12:36:39Z)
- Anchor3DLane: Learning to Regress 3D Anchors for Monocular 3D Lane Detection [35.797350813519756]
Monocular 3D lane detection is a challenging task due to its lack of depth information.
We propose a BEV-free method named Anchor3DLane to predict 3D lanes directly from FV representations.
arXiv Detail & Related papers (2023-01-06T04:35:31Z)
- CurveFormer: 3D Lane Detection by Curve Propagation with Curve Queries and Attention [3.330270927081078]
3D lane detection is an integral part of autonomous driving systems.
Previous CNN and Transformer-based methods usually first generate a bird's-eye-view (BEV) feature map from the front-view image.
We propose CurveFormer, a single-stage Transformer-based method that directly calculates 3D lane parameters.
arXiv Detail & Related papers (2022-09-16T14:54:57Z)
- M^2BEV: Multi-Camera Joint 3D Detection and Segmentation with Unified Birds-Eye View Representation [145.6041893646006]
M^2BEV is a unified framework that jointly performs 3D object detection and map segmentation.
M^2BEV infers both tasks with a unified model and improves efficiency.
arXiv Detail & Related papers (2022-04-11T13:43:25Z)
- PersFormer: 3D Lane Detection via Perspective Transformer and the OpenLane Benchmark [109.03773439461615]
PersFormer is an end-to-end monocular 3D lane detector with a novel Transformer-based spatial feature transformation module.
We release one of the first large-scale real-world 3D lane datasets, called OpenLane, with high-quality annotation and scenario diversity.
arXiv Detail & Related papers (2022-03-21T16:12:53Z)