CurveFormer++: 3D Lane Detection by Curve Propagation with Temporal Curve Queries and Attention
- URL: http://arxiv.org/abs/2402.06423v1
- Date: Fri, 9 Feb 2024 14:13:40 GMT
- Title: CurveFormer++: 3D Lane Detection by Curve Propagation with Temporal Curve Queries and Attention
- Authors: Yifeng Bai, Zhirong Chen, Pengpeng Liang, Erkang Cheng
- Abstract summary: We present CurveFormer++, a single-stage Transformer-based method that does not require the image feature view transform module.
By employing a Transformer decoder, the model can iteratively refine the 3D lane detection results.
We evaluate our approach for the 3D lane detection task on two publicly available real-world datasets.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In autonomous driving, 3D lane detection using monocular cameras is an
important task for various downstream planning and control tasks. Recent CNN
and Transformer approaches usually apply a two-stage scheme in the model
design. The first stage transforms the image feature from a front image into a
bird's-eye-view (BEV) representation. Subsequently, a sub-network processes the
BEV feature map to generate the 3D detection results. However, these approaches
heavily rely on a challenging image feature transformation module from a
perspective view to a BEV representation. In our work, we present
CurveFormer++, a single-stage Transformer-based method that does not require
the image feature view transform module and directly infers 3D lane detection
results from the perspective image features. Specifically, our approach models
the 3D detection task as a curve propagation problem, where each lane is
represented by a curve query with a dynamic and ordered anchor point set. By
employing a Transformer decoder, the model can iteratively refine the 3D lane
detection results. A curve cross-attention module is introduced in the
Transformer decoder to calculate similarities between image features and curve
queries of lanes. To handle varying lane lengths, we employ context sampling
and anchor point restriction techniques to compute more relevant image features
for a curve query. Furthermore, we apply a temporal fusion module that
incorporates selected informative sparse curve queries and their corresponding
anchor point sets to leverage historical lane information. In the experiments,
we evaluate our approach for the 3D lane detection task on two publicly
available real-world datasets. The results demonstrate that our method
outperforms both CNN- and Transformer-based methods.
We also conduct ablation studies to analyze the impact of each component in our
approach.
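To make the abstract's core idea concrete, the following is a minimal, hedged sketch (not the authors' code; all names, shapes, and the feature-sampling stand-in are illustrative assumptions) of a curve query as an ordered anchor point set, with one curve cross-attention refinement step that weights per-anchor image features by their similarity to the query:

```python
# Illustrative sketch of the curve-query idea: each lane is a query with an
# ordered set of 3D anchor points at fixed longitudinal positions, and a
# decoder step samples perspective-view features near each anchor, attends
# over them, and refines the anchor positions. Shapes are assumptions.
import numpy as np

rng = np.random.default_rng(0)

NUM_ANCHORS = 8    # ordered anchor points per curve query (assumption)
EMBED_DIM = 16     # feature / query embedding size (assumption)

# A curve query: a learned embedding plus ordered (x, z) anchor points
# sampled at fixed y positions along the lane.
query_embed = rng.normal(size=EMBED_DIM)
anchor_y = np.linspace(5.0, 40.0, NUM_ANCHORS)   # fixed longitudinal slots
anchor_xz = np.zeros((NUM_ANCHORS, 2))           # refined iteratively

def sample_image_feature(point_y, point_xz):
    """Stand-in for sampling a perspective-view image feature at the
    projection of a 3D anchor point (the real model samples learned
    feature maps; here a seeded random vector keeps the sketch runnable)."""
    seed = int(abs(point_y * 100 + point_xz[0] * 10 + point_xz[1]) * 1000) % (2**32)
    return np.random.default_rng(seed).normal(size=EMBED_DIM)

def curve_cross_attention_step(query_embed, anchor_y, anchor_xz):
    """One decoder refinement: softmax-attend over per-anchor image
    features, then map the attended context to per-anchor (x, z) offsets."""
    feats = np.stack([sample_image_feature(y, xz)
                      for y, xz in zip(anchor_y, anchor_xz)])   # (A, D)
    # Similarity between the curve query and each anchor's image feature.
    logits = feats @ query_embed / np.sqrt(EMBED_DIM)
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()                                    # softmax, (A,)
    context = weights @ feats                                   # (D,)
    # Toy offset head (assumption): project context to (x, z) corrections.
    offset_head = rng.normal(scale=0.01, size=(EMBED_DIM, NUM_ANCHORS * 2))
    offsets = (context @ offset_head).reshape(NUM_ANCHORS, 2)
    return anchor_xz + offsets, weights

anchor_xz, weights = curve_cross_attention_step(query_embed, anchor_y, anchor_xz)
print(anchor_xz.shape, float(weights.sum()))
```

Iterating this step several times mirrors the paper's iterative refinement in the Transformer decoder; the ordered anchor set is what lets the model restrict attention to relevant positions along a lane of varying length.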
Related papers
- 3DiffTection: 3D Object Detection with Geometry-Aware Diffusion Features [70.50665869806188]
3DiffTection is a state-of-the-art method for 3D object detection from single images.
We fine-tune a diffusion model to perform novel view synthesis conditioned on a single image.
We further train the model on target data with detection supervision.
arXiv Detail & Related papers (2023-11-07T23:46:41Z)
- Parametric Depth Based Feature Representation Learning for Object Detection and Segmentation in Bird's Eye View [44.78243406441798]
This paper focuses on leveraging geometry information, such as depth, to model such feature transformation.
We first lift the 2D image features to the 3D space defined for the ego vehicle via a predicted parametric depth distribution for each pixel in each view.
We then aggregate the 3D feature volume based on the 3D space occupancy derived from depth to the BEV frame.
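The lifting step this summary describes can be sketched as follows; this is a hedged toy version (shapes, names, and the row-pooling stand-in for the geometry-aware BEV scatter are all assumptions, not the paper's implementation):

```python
# Toy sketch of parametric-depth lifting: each pixel's 2D feature is spread
# along its viewing ray using a predicted categorical depth distribution,
# producing a depth-weighted 3D feature volume, which is then pooled toward
# a BEV layout. All dimensions are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)

H, W, C = 4, 6, 8          # image feature map height, width, channels
DEPTH_BINS = 5             # discretised depth hypotheses per pixel

feat2d = rng.normal(size=(H, W, C))
depth_logits = rng.normal(size=(H, W, DEPTH_BINS))
depth_prob = np.exp(depth_logits)
depth_prob /= depth_prob.sum(axis=-1, keepdims=True)   # per-pixel distribution

# Lift: outer product of feature and depth probability gives a volume with
# one depth-weighted feature slice per depth bin.
volume = depth_prob[..., None] * feat2d[..., None, :]  # (H, W, D, C)

# Aggregate toward BEV: summing over image rows is a simple stand-in for the
# occupancy-based scatter into ego-frame BEV cells used in the paper.
bev = volume.sum(axis=0)                               # (W, D, C)
print(bev.shape)
```

The key property the sketch preserves is that uncertain depth spreads a pixel's feature across several bins, while confident depth concentrates it in one.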
arXiv Detail & Related papers (2023-07-09T06:07:22Z)
- An Efficient Transformer for Simultaneous Learning of BEV and Lane Representations in 3D Lane Detection [55.281369497158515]
We propose an efficient transformer for 3D lane detection.
Different from the vanilla transformer, our model contains a cross-attention mechanism to simultaneously learn lane and BEV representations.
Our method obtains 2D and 3D lane predictions by applying the lane features to the image-view and BEV features, respectively.
arXiv Detail & Related papers (2023-06-08T04:18:31Z)
- Geometric-aware Pretraining for Vision-centric 3D Object Detection [77.7979088689944]
We propose a novel geometric-aware pretraining framework called GAPretrain.
GAPretrain serves as a plug-and-play solution that can be flexibly applied to multiple state-of-the-art detectors.
We achieve 46.2 mAP and 55.5 NDS on the nuScenes val set using the BEVFormer method, with a gain of 2.7 and 2.1 points, respectively.
arXiv Detail & Related papers (2023-04-06T14:33:05Z)
- CurveFormer: 3D Lane Detection by Curve Propagation with Curve Queries and Attention [3.330270927081078]
3D lane detection is an integral part of autonomous driving systems.
Previous CNN and Transformer-based methods usually first generate a bird's-eye-view (BEV) feature map from the front view image.
We propose CurveFormer, a single-stage Transformer-based method that directly calculates 3D lane parameters.
arXiv Detail & Related papers (2022-09-16T14:54:57Z)
- Pixel2Mesh++: 3D Mesh Generation and Refinement from Multi-View Images [82.32776379815712]
We study the problem of shape generation in 3D mesh representation from a small number of color images with or without camera poses.
We further improve the shape quality by leveraging cross-view information with a graph convolution network.
Our model is robust to the quality of the initial mesh and the error of camera pose, and can be combined with a differentiable function for test-time optimization.
arXiv Detail & Related papers (2022-04-21T03:42:31Z)
- Probabilistic Vehicle Reconstruction Using a Multi-Task CNN [0.0]
We present a probabilistic approach for shape-aware 3D vehicle reconstruction from stereo images.
Specifically, we train a CNN that outputs probability distributions for the vehicle's orientation and for both vehicle keypoints and wireframe edges.
We show that our method achieves state-of-the-art results, evaluating our method on the challenging KITTI benchmark.
arXiv Detail & Related papers (2021-02-21T20:45:44Z)
- Spherical Transformer: Adapting Spherical Signal to CNNs [53.18482213611481]
Spherical Transformer can transform spherical signals into vectors that can be directly processed by standard CNNs.
We evaluate our approach on the tasks of spherical MNIST recognition, 3D object classification and omnidirectional image semantic segmentation.
arXiv Detail & Related papers (2021-01-11T12:33:16Z)
- DOPS: Learning to Detect 3D Objects and Predict their 3D Shapes [54.239416488865565]
We propose a fast single-stage 3D object detection method for LIDAR data.
The core novelty of our method is a fast, single-pass architecture that both detects objects in 3D and estimates their shapes.
We find that our proposed method achieves state-of-the-art results, outperforming prior work by 5% on object detection in ScanNet scenes and by 3.4% on the Waymo Open Dataset.
arXiv Detail & Related papers (2020-04-02T17:48:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.