Anchor3DLane: Learning to Regress 3D Anchors for Monocular 3D Lane
Detection
- URL: http://arxiv.org/abs/2301.02371v2
- Date: Tue, 28 Mar 2023 04:28:30 GMT
- Title: Anchor3DLane: Learning to Regress 3D Anchors for Monocular 3D Lane
Detection
- Authors: Shaofei Huang, Zhenwei Shen, Zehao Huang, Zi-han Ding, Jiao Dai,
Jizhong Han, Naiyan Wang, Si Liu
- Abstract summary: Monocular 3D lane detection is a challenging task due to its lack of depth information.
We propose a BEV-free method named Anchor3DLane to predict 3D lanes directly from FV representations.
- Score: 35.797350813519756
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Monocular 3D lane detection is a challenging task due to the lack of depth
information. A popular solution is to first transform front-view (FV) images or features
into the bird's-eye-view (BEV) space with inverse perspective mapping (IPM) and then
detect lanes from the BEV features. However, IPM's reliance on the flat-ground assumption
and its loss of context information make it inaccurate to restore 3D information from BEV
representations. A prior attempt predicts 3D lanes directly from FV representations
without BEV, but it still underperforms BEV-based methods because it lacks a structured
representation for 3D lanes. In this paper, we define 3D lane anchors in the 3D space and
propose a BEV-free method named Anchor3DLane to predict 3D lanes directly from FV
representations. The 3D lane anchors are projected onto the FV features to extract anchor
features, which carry both structural and context information for accurate predictions.
In addition, we develop a global optimization method that exploits the equal-width
property between lanes to reduce the lateral error of predictions. Extensive experiments
on three popular 3D lane detection benchmarks show that our Anchor3DLane outperforms
previous BEV-based methods and achieves state-of-the-art performance. The code is
available at: https://github.com/tusen-ai/Anchor3DLane.
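The following is a minimal sketch of the anchor-projection idea described in the abstract, not the authors' released implementation; the anchor layout, camera-parameter conventions, and all function names are illustrative assumptions. It projects sampled 3D anchor points into the front-view feature map with a pinhole camera model and bilinearly samples a feature vector per point:

```python
import torch
import torch.nn.functional as F

def project_anchors_to_fv(anchors_3d, intrinsics, extrinsics, img_hw):
    """Project 3D anchor points into normalized front-view image coordinates.

    anchors_3d: (N, P, 3) -- N anchors, P sampled points each, (x, y, z) in the
                frame that `extrinsics` maps to the camera frame.
    intrinsics: (3, 3) camera matrix K.
    extrinsics: (4, 4) rigid transform from the anchor frame to the camera frame.
    img_hw:     (H, W) of the input image in pixels.
    Returns an (N, P, 2) grid of (u, v) coordinates normalized to [-1, 1].
    """
    N, P, _ = anchors_3d.shape
    ones = torch.ones(N, P, 1, dtype=anchors_3d.dtype, device=anchors_3d.device)
    pts_h = torch.cat([anchors_3d, ones], dim=-1).reshape(-1, 4)   # homogeneous (N*P, 4)
    cam = (extrinsics @ pts_h.T).T[:, :3]                          # to camera frame
    uvw = (intrinsics @ cam.T).T                                   # pinhole projection
    uv = uvw[:, :2] / uvw[:, 2:3].clamp(min=1e-6)                  # pixel coordinates
    # In practice, points behind the camera should also be masked out here.
    u = uv[:, 0] / img_hw[1] * 2 - 1                               # normalize to [-1, 1]
    v = uv[:, 1] / img_hw[0] * 2 - 1
    return torch.stack([u, v], dim=-1).reshape(N, P, 2)

def sample_anchor_features(fv_feat, grid):
    """Bilinearly sample front-view features at the projected anchor points.

    fv_feat: (1, C, H, W) front-view feature map.
    grid:    (N, P, 2) normalized locations from project_anchors_to_fv.
    Returns (N, P, C) per-point anchor features.
    """
    sampled = F.grid_sample(fv_feat, grid.unsqueeze(0), align_corners=False)  # (1, C, N, P)
    return sampled.squeeze(0).permute(1, 2, 0)                                # (N, P, C)
```

In the paper, features gathered this way feed classification and regression heads that refine each anchor into a 3D lane, and the equal-width-based global optimization is applied to the resulting predictions; the actual anchor design and heads are in the released code.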
Related papers
- BEVSpread: Spread Voxel Pooling for Bird's-Eye-View Representation in Vision-based Roadside 3D Object Detection [47.74067616658986] (2024-06-13)
Vision-based roadside 3D object detection has attracted rising attention in the autonomous driving domain.
Inspired by this insight, we propose a novel voxel pooling strategy, dubbed BEVSpread, to reduce such error.
BEVSpread can improve the performance of existing frustum-based BEV methods by a large margin.
- An Efficient Transformer for Simultaneous Learning of BEV and Lane Representations in 3D Lane Detection [55.281369497158515] (2023-06-08)
We propose an efficient transformer for 3D lane detection.
Unlike the vanilla transformer, our model contains a cross-attention mechanism to learn lane and BEV representations simultaneously.
Our method obtains 2D and 3D lane predictions by applying the lane features to the image-view and BEV features, respectively.
- BEV-IO: Enhancing Bird's-Eye-View 3D Detection with Instance Occupancy [58.92659367605442] (2023-05-26)
We present BEV-IO, a new 3D detection paradigm that enhances BEV representation with instance occupancy information.
We show that BEV-IO can outperform state-of-the-art methods while adding only negligible parameter and computation overhead.
- BEV-MAE: Bird's Eye View Masked Autoencoders for Point Cloud Pre-training in Autonomous Driving Scenarios [51.285561119993105] (2022-12-12)
We present BEV-MAE, an efficient masked autoencoder pre-training framework for LiDAR-based 3D object detection in autonomous driving.
Specifically, we propose a bird's-eye-view (BEV) guided masking strategy to guide the 3D encoder in learning feature representations.
We introduce a learnable point token to maintain a consistent receptive field size for the 3D encoder.
- ONCE-3DLanes: Building Monocular 3D Lane Detection [41.46466150783367] (2022-04-30)
We present ONCE-3DLanes, a real-world autonomous driving dataset with lane layout annotations in 3D space.
By exploiting the explicit relationship between point clouds and image pixels, a dataset annotation pipeline is designed to automatically generate high-quality 3D lane locations.
- PersFormer: 3D Lane Detection via Perspective Transformer and the OpenLane Benchmark [109.03773439461615] (2022-03-21)
PersFormer is an end-to-end monocular 3D lane detector with a novel Transformer-based spatial feature transformation module.
We release one of the first large-scale real-world 3D lane datasets, called OpenLane, with high-quality annotation and scenario diversity.
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.