CurveFormer++: 3D Lane Detection by Curve Propagation with Temporal Curve Queries and Attention
- URL: http://arxiv.org/abs/2402.06423v1
- Date: Fri, 9 Feb 2024 14:13:40 GMT
- Title: CurveFormer++: 3D Lane Detection by Curve Propagation with Temporal Curve Queries and Attention
- Authors: Yifeng Bai, Zhirong Chen, Pengpeng Liang, Erkang Cheng
- Abstract summary: We present CurveFormer++, a single-stage Transformer-based method that does not require the image feature view transform module.
By employing a Transformer decoder, the model can iteratively refine the 3D lane detection results.
We evaluate our approach for the 3D lane detection task on two publicly available real-world datasets.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In autonomous driving, 3D lane detection using monocular cameras is an
important task for various downstream planning and control tasks. Recent CNN
and Transformer approaches usually apply a two-stage scheme in the model
design. The first stage transforms the image feature from a front image into a
bird's-eye-view (BEV) representation. Subsequently, a sub-network processes the
BEV feature map to generate the 3D detection results. However, these approaches
heavily rely on a challenging image feature transformation module from a
perspective view to a BEV representation. In our work, we present
CurveFormer++, a single-stage Transformer-based method that does not require
the image feature view transform module and directly infers 3D lane detection
results from the perspective image features. Specifically, our approach models
the 3D detection task as a curve propagation problem, where each lane is
represented by a curve query with a dynamic and ordered anchor point set. By
employing a Transformer decoder, the model can iteratively refine the 3D lane
detection results. A curve cross-attention module is introduced in the
Transformer decoder to calculate similarities between image features and curve
queries of lanes. To handle varying lane lengths, we employ context sampling
and anchor point restriction techniques to compute more relevant image features
for a curve query. Furthermore, we apply a temporal fusion module that
incorporates selected informative sparse curve queries and their corresponding
anchor point sets to leverage historical lane information. In the experiments,
we evaluate our approach for the 3D lane detection task on two publicly
available real-world datasets. The results demonstrate that our method
outperforms both CNN- and Transformer-based methods.
We also conduct ablation studies to analyze the impact of each component in our
approach.
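To make the abstract's core idea concrete, the following is a minimal, hedged sketch (not the authors' code; all names, shapes, and the feature-sampling stand-in are illustrative assumptions) of a curve query as an ordered anchor point set, with one curve cross-attention refinement step that weights per-anchor image features by their similarity to the query:

```python
# Illustrative sketch of the curve-query idea: each lane is a query with an
# ordered set of 3D anchor points at fixed longitudinal positions, and a
# decoder step samples perspective-view features near each anchor, attends
# over them, and refines the anchor positions. Shapes are assumptions.
import numpy as np

rng = np.random.default_rng(0)

NUM_ANCHORS = 8    # ordered anchor points per curve query (assumption)
EMBED_DIM = 16     # feature / query embedding size (assumption)

# A curve query: a learned embedding plus ordered (x, z) anchor points
# sampled at fixed y positions along the lane.
query_embed = rng.normal(size=EMBED_DIM)
anchor_y = np.linspace(5.0, 40.0, NUM_ANCHORS)   # fixed longitudinal slots
anchor_xz = np.zeros((NUM_ANCHORS, 2))           # refined iteratively

def sample_image_feature(point_y, point_xz):
    """Stand-in for sampling a perspective-view image feature at the
    projection of a 3D anchor point (the real model samples learned
    feature maps; here a seeded random vector keeps the sketch runnable)."""
    seed = int(abs(point_y * 100 + point_xz[0] * 10 + point_xz[1]) * 1000) % (2**32)
    return np.random.default_rng(seed).normal(size=EMBED_DIM)

def curve_cross_attention_step(query_embed, anchor_y, anchor_xz):
    """One decoder refinement: softmax-attend over per-anchor image
    features, then map the attended context to per-anchor (x, z) offsets."""
    feats = np.stack([sample_image_feature(y, xz)
                      for y, xz in zip(anchor_y, anchor_xz)])   # (A, D)
    # Similarity between the curve query and each anchor's image feature.
    logits = feats @ query_embed / np.sqrt(EMBED_DIM)
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()                                    # softmax, (A,)
    context = weights @ feats                                   # (D,)
    # Toy offset head (assumption): project context to (x, z) corrections.
    offset_head = rng.normal(scale=0.01, size=(EMBED_DIM, NUM_ANCHORS * 2))
    offsets = (context @ offset_head).reshape(NUM_ANCHORS, 2)
    return anchor_xz + offsets, weights

anchor_xz, weights = curve_cross_attention_step(query_embed, anchor_y, anchor_xz)
print(anchor_xz.shape, float(weights.sum()))
```

Iterating this step several times mirrors the paper's iterative refinement in the Transformer decoder; the ordered anchor set is what lets the model restrict attention to relevant positions along a lane of varying length.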
Related papers
- 3DiffTection: 3D Object Detection with Geometry-Aware Diffusion Features [70.50665869806188]
3DiffTection is a state-of-the-art method for 3D object detection from single images.
We fine-tune a diffusion model to perform novel view synthesis conditioned on a single image.
We further train the model on target data with detection supervision.
arXiv Detail & Related papers (2023-11-07T23:46:41Z)
- Parametric Depth Based Feature Representation Learning for Object Detection and Segmentation in Bird's Eye View [44.78243406441798]
This paper focuses on leveraging geometry information, such as depth, to model such feature transformation.
We first lift the 2D image features to the 3D space defined for the ego vehicle via a predicted parametric depth distribution for each pixel in each view.
We then aggregate the 3D feature volume based on the 3D space occupancy derived from depth to the BEV frame.
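The lifting step this summary describes can be sketched as follows; this is a hedged toy version (shapes, names, and the row-pooling stand-in for the geometry-aware BEV scatter are all assumptions, not the paper's implementation):

```python
# Toy sketch of parametric-depth lifting: each pixel's 2D feature is spread
# along its viewing ray using a predicted categorical depth distribution,
# producing a depth-weighted 3D feature volume, which is then pooled toward
# a BEV layout. All dimensions are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)

H, W, C = 4, 6, 8          # image feature map height, width, channels
DEPTH_BINS = 5             # discretised depth hypotheses per pixel

feat2d = rng.normal(size=(H, W, C))
depth_logits = rng.normal(size=(H, W, DEPTH_BINS))
depth_prob = np.exp(depth_logits)
depth_prob /= depth_prob.sum(axis=-1, keepdims=True)   # per-pixel distribution

# Lift: outer product of feature and depth probability gives a volume with
# one depth-weighted feature slice per depth bin.
volume = depth_prob[..., None] * feat2d[..., None, :]  # (H, W, D, C)

# Aggregate toward BEV: summing over image rows is a simple stand-in for the
# occupancy-based scatter into ego-frame BEV cells used in the paper.
bev = volume.sum(axis=0)                               # (W, D, C)
print(bev.shape)
```

The key property the sketch preserves is that uncertain depth spreads a pixel's feature across several bins, while confident depth concentrates it in one.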
arXiv Detail & Related papers (2023-07-09T06:07:22Z)
- An Efficient Transformer for Simultaneous Learning of BEV and Lane Representations in 3D Lane Detection [55.281369497158515]
We propose an efficient transformer for 3D lane detection.
Different from the vanilla transformer, our model contains a cross-attention mechanism to simultaneously learn lane and BEV representations.
Our method obtains 2D and 3D lane predictions by applying the lane features to the image-view and BEV features, respectively.
arXiv Detail & Related papers (2023-06-08T04:18:31Z)
- Geometric-aware Pretraining for Vision-centric 3D Object Detection [77.7979088689944]
We propose a novel geometric-aware pretraining framework called GAPretrain.
GAPretrain serves as a plug-and-play solution that can be flexibly applied to multiple state-of-the-art detectors.
We achieve 46.2 mAP and 55.5 NDS on the nuScenes val set using the BEVFormer method, with a gain of 2.7 and 2.1 points, respectively.
arXiv Detail & Related papers (2023-04-06T14:33:05Z)
- CurveFormer: 3D Lane Detection by Curve Propagation with Curve Queries and Attention [3.330270927081078]
3D lane detection is an integral part of autonomous driving systems.
Previous CNN and Transformer-based methods usually first generate a bird's-eye-view (BEV) feature map from the front view image.
We propose CurveFormer, a single-stage Transformer-based method that directly calculates 3D lane parameters.
arXiv Detail & Related papers (2022-09-16T14:54:57Z)
- Pixel2Mesh++: 3D Mesh Generation and Refinement from Multi-View Images [82.32776379815712]
We study the problem of shape generation in 3D mesh representation from a small number of color images with or without camera poses.
We further improve the shape quality by leveraging cross-view information with a graph convolution network.
Our model is robust to the quality of the initial mesh and the error of camera pose, and can be combined with a differentiable function for test-time optimization.
arXiv Detail & Related papers (2022-04-21T03:42:31Z)
- Probabilistic Vehicle Reconstruction Using a Multi-Task CNN [0.0]
We present a probabilistic approach for shape-aware 3D vehicle reconstruction from stereo images.
Specifically, we train a CNN that outputs probability distributions for the vehicle's orientation and for both vehicle keypoints and wireframe edges.
We show that our method achieves state-of-the-art results, evaluating our method on the challenging KITTI benchmark.
arXiv Detail & Related papers (2021-02-21T20:45:44Z)
- Spherical Transformer: Adapting Spherical Signal to CNNs [53.18482213611481]
Spherical Transformer can transform spherical signals into vectors that can be directly processed by standard CNNs.
We evaluate our approach on the tasks of spherical MNIST recognition, 3D object classification and omnidirectional image semantic segmentation.
arXiv Detail & Related papers (2021-01-11T12:33:16Z)
- DOPS: Learning to Detect 3D Objects and Predict their 3D Shapes [54.239416488865565]
We propose a fast single-stage 3D object detection method for LIDAR data.
The core novelty of our method is a fast, single-pass architecture that both detects objects in 3D and estimates their shapes.
We find that our proposed method achieves state-of-the-art results, outperforming prior work by 5% on object detection in ScanNet scenes and by 3.4% on the Waymo Open Dataset.
arXiv Detail & Related papers (2020-04-02T17:48:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.