Laneformer: Object-aware Row-Column Transformers for Lane Detection
- URL: http://arxiv.org/abs/2203.09830v1
- Date: Fri, 18 Mar 2022 10:14:35 GMT
- Title: Laneformer: Object-aware Row-Column Transformers for Lane Detection
- Authors: Jianhua Han, Xiajun Deng, Xinyue Cai, Zhen Yang, Hang Xu, Chunjing Xu,
Xiaodan Liang
- Abstract summary: Laneformer is a transformer-based architecture tailored for lane detection in autonomous driving.
Inspired by recent advances in the transformer encoder-decoder architecture across various vision tasks, we move forward to design a new end-to-end Laneformer architecture.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We present Laneformer, a conceptually simple yet powerful
transformer-based architecture tailored for lane detection, a long-standing
research topic in visual perception for autonomous driving. The dominant
paradigms rely on purely CNN-based architectures, which often fail to
incorporate relations among long-range lane points and the global context
induced by surrounding objects (e.g., pedestrians, vehicles). Inspired by
recent advances of the transformer encoder-decoder architecture in various
vision tasks, we move forward to design a new end-to-end Laneformer
architecture that adapts conventional transformers to better capture the shape
and semantic characteristics of lanes, with minimal overhead in latency.
First, coupled with deformable pixel-wise self-attention in the encoder,
Laneformer introduces two new row and column self-attention operations to
efficiently mine point context along the lane shapes. Second, motivated by the
observation that surrounding objects affect the prediction of lane segments,
Laneformer further includes detected object instances as extra inputs to the
multi-head attention blocks in the encoder and decoder, facilitating lane
point detection by sensing semantic contexts. Specifically, the bounding box
locations of objects are added into the Key module to provide interaction with
each pixel and query, while the ROI-aligned features are inserted into the
Value module. Extensive experiments demonstrate that our Laneformer achieves
state-of-the-art performance on the CULane benchmark with a 77.1% F1 score. We
hope our simple and effective Laneformer will serve as a strong baseline for
future research on self-attention models for lane detection.
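The two mechanisms described in the abstract are concrete enough to sketch. Below is a minimal PyTorch sketch, inferred from the abstract alone and not taken from the authors' released code: all class, method, and parameter names (RowColumnSelfAttention, ObjectAwareAttention, box_to_key, roi_to_value, and so on) are our own illustrative assumptions. It shows (a) row and column self-attention over a backbone feature map and (b) object-aware cross-attention in which box coordinates extend the keys and ROI-aligned features extend the values.

```python
# Illustrative sketch only; not the official Laneformer implementation.
import torch
import torch.nn as nn


class RowColumnSelfAttention(nn.Module):
    """Attends along each row and each column of a feature map separately,
    so long, thin lane structures can be mined without full 2-D attention."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.row_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.col_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) feature map from the CNN backbone
        b, c, h, w = x.shape
        # Row attention: each of the H rows is a sequence of W tokens.
        rows = x.permute(0, 2, 3, 1).reshape(b * h, w, c)
        rows, _ = self.row_attn(rows, rows, rows)
        x = rows.reshape(b, h, w, c).permute(0, 3, 1, 2)
        # Column attention: each of the W columns is a sequence of H tokens.
        cols = x.permute(0, 3, 2, 1).reshape(b * w, h, c)
        cols, _ = self.col_attn(cols, cols, cols)
        return cols.reshape(b, w, h, c).permute(0, 3, 2, 1)


class ObjectAwareAttention(nn.Module):
    """Cross-attention where detected objects become extra tokens: box
    coordinates contribute to the keys, ROI-aligned features to the values."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.box_to_key = nn.Linear(4, dim)      # (x1, y1, x2, y2) -> key token
        self.roi_to_value = nn.Linear(dim, dim)  # ROI-aligned feature -> value token
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, queries, pixel_tokens, boxes, roi_feats):
        # queries:      (B, Nq, C) lane queries
        # pixel_tokens: (B, HW, C) flattened image features
        # boxes:        (B, No, 4) detected object boxes (float coordinates)
        # roi_feats:    (B, No, C) ROI-aligned object features
        keys = torch.cat([pixel_tokens, self.box_to_key(boxes)], dim=1)
        values = torch.cat([pixel_tokens, self.roi_to_value(roi_feats)], dim=1)
        out, _ = self.attn(queries, keys, values)
        return out
```

In the full model these blocks would sit alongside the deformable pixel-wise self-attention mentioned in the abstract; the sketch omits normalization, residual connections, and positional encodings for brevity.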
Related papers
- Monocular Lane Detection Based on Deep Learning: A Survey [51.19079381823076]
Lane detection plays an important role in autonomous driving perception systems.
As deep learning algorithms gain popularity, monocular lane detection methods based on deep learning have demonstrated superior performance.
This paper presents a comprehensive overview of existing methods, encompassing both the increasingly mature 2D lane detection approaches and the developing 3D lane detection works.
arXiv Detail & Related papers (2024-11-25T12:09:43Z) - LaneTCA: Enhancing Video Lane Detection with Temporal Context Aggregation [87.71768494466959]
LaneTCA bridges individual video frames and explores how to effectively aggregate the temporal context.
We develop an accumulative attention module and an adjacent attention module to abstract the long-term and short-term temporal context.
The two modules are meticulously designed based on the transformer architecture.
arXiv Detail & Related papers (2024-08-25T14:46:29Z) - ENet-21: An Optimized light CNN Structure for Lane Detection [1.4542411354617986]
This study develops an optimal structure for the lane detection problem.
It offers a promising solution for driver assistance features in modern vehicles.
Experiments on the TuSimple dataset support the effectiveness of the proposed method.
arXiv Detail & Related papers (2024-03-28T19:07:26Z) - LDTR: Transformer-based Lane Detection with Anchor-chain Representation [11.184960972042406]
Lane detection scenarios with limited or no visual clues of lanes remain challenging and crucial for automated driving.
Inspired by the DETR architecture, we propose LDTR, a transformer-based model to address these issues.
Experimental results demonstrate that LDTR achieves state-of-the-art performance on well-known datasets.
arXiv Detail & Related papers (2024-03-21T12:29:26Z) - Betrayed by Attention: A Simple yet Effective Approach for Self-supervised Video Object Segmentation [76.68301884987348]
We propose a simple yet effective approach for self-supervised video object segmentation (VOS).
Our key insight is that the inherent structural dependencies present in DINO-pretrained Transformers can be leveraged to establish robust spatio-temporal segmentation correspondences in videos.
Our method demonstrates state-of-the-art performance across multiple unsupervised VOS benchmarks and excels in complex real-world multi-object video segmentation tasks.
arXiv Detail & Related papers (2023-11-29T18:47:17Z) - HoughLaneNet: Lane Detection with Deep Hough Transform and Dynamic
Convolution [8.97991745734826]
Lanes can present difficulties for detection, as they can be narrow, fragmented, and often obscured by heavy traffic.
We propose a hierarchical Deep Hough Transform (DHT) approach that combines all lane features in an image into the Hough parameter space.
Our proposed network structure demonstrates improved performance in detecting heavily occluded or worn lane images.
arXiv Detail & Related papers (2023-07-07T10:08:29Z) - Vision Transformer with Quadrangle Attention [76.35955924137986]
We propose a novel quadrangle attention (QA) method that extends the window-based attention to a general quadrangle formulation.
Our method employs an end-to-end learnable quadrangle regression module that predicts a transformation matrix to transform default windows into target quadrangles.
We integrate QA into plain and hierarchical vision transformers to create a new architecture named QFormer, which requires only minor code modifications and incurs negligible extra computational cost.
arXiv Detail & Related papers (2023-03-27T11:13:50Z) - Lane Detection with Versatile AtrousFormer and Local Semantic Guidance [92.83267435275802]
Lane detection is one of the core functions in autonomous driving.
Most existing methods tend to resort to CNN-based techniques.
We propose Atrous Transformer (AtrousFormer) to solve the problem.
arXiv Detail & Related papers (2022-03-08T13:25:35Z) - End-to-end Lane Shape Prediction with Transformers [13.103463647059634]
Lane detection is widely used for lane departure warning and adaptive cruise control in autonomous vehicles.
We propose an end-to-end method that directly outputs parameters of a lane shape model.
The proposed method is validated on the TuSimple benchmark and shows state-of-the-art accuracy with the most lightweight model size and fastest speed.
arXiv Detail & Related papers (2020-11-09T07:42:55Z) - Lane Detection Model Based on Spatio-Temporal Network With Double
Convolutional Gated Recurrent Units [11.968518335236787]
Lane detection will remain an open problem for some time to come.
A spatio-temporal network with double Convolutional Gated Recurrent Units (ConvGRUs) is proposed to address lane detection in challenging scenes.
Our model can outperform the state-of-the-art lane detection models.
arXiv Detail & Related papers (2020-08-10T06:50:48Z)