Related papers: A Survey of Vision Transformers in Autonomous Driving: Current Trends and Future Directions

URL: http://arxiv.org/abs/2403.07542v1
Date: Tue, 12 Mar 2024 11:29:40 GMT
Title: A Survey of Vision Transformers in Autonomous Driving: Current Trends and Future Directions
Authors: Quoc-Vinh Lai-Dang
Abstract summary: This survey explores the adaptation of visual transformer models in Autonomous Driving. It focuses on foundational concepts such as self-attention, multi-head attention, and encoder-decoder architecture. The survey concludes with future research directions, highlighting the growing role of Vision Transformers in Autonomous Driving.
Score: 0.0
License: http://creativecommons.org/publicdomain/zero/1.0/
Abstract: This survey explores the adaptation of visual transformer models in Autonomous Driving, a transition inspired by their success in Natural Language Processing. Surpassing traditional Recurrent Neural Networks in tasks like sequential image processing and outperforming Convolutional Neural Networks in global context capture, as evidenced in complex scene recognition, Transformers are gaining traction in computer vision. These capabilities are crucial in Autonomous Driving for real-time, dynamic visual scene processing. Our survey provides a comprehensive overview of Vision Transformer applications in Autonomous Driving, focusing on foundational concepts such as self-attention, multi-head attention, and encoder-decoder architecture. We cover applications in object detection, segmentation, pedestrian detection, lane detection, and more, comparing their architectural merits and limitations. The survey concludes with future research directions, highlighting the growing role of Vision Transformers in Autonomous Driving.

Related papers

Seeing Beyond Views: Multi-View Driving Scene Video Generation with Holistic Attention [61.3281618482513]
We present CogDriving, a novel network designed for synthesizing high-quality multi-view driving videos. CogDriving leverages a Diffusion Transformer architecture with holistic-4D attention modules, enabling simultaneous associations across the dimensions. CogDriving demonstrates strong performance on the nuScenes validation set, achieving an FVD score of 37.8, highlighting its ability to generate realistic driving videos.
arXiv Detail & Related papers (2024-12-04T18:02:49Z)
A Review of Transformer-Based Models for Computer Vision Tasks: Capturing Global Context and Spatial Relationships [0.5639904484784127]
Transformer-based models have transformed the landscape of natural language processing (NLP). These models are renowned for their ability to capture long-range dependencies and contextual information. We discuss potential research directions and applications of transformer-based models in computer vision.
arXiv Detail & Related papers (2024-08-27T16:22:18Z)
A Comprehensive Survey on Applications of Transformers for Deep Learning Tasks [60.38369406877899]
The Transformer is a deep neural network that employs a self-attention mechanism to comprehend the contextual relationships within sequential data. Transformer models excel in handling long dependencies between input sequence elements and enable parallel processing. Our survey encompasses the identification of the top five application domains for transformer-based models.
arXiv Detail & Related papers (2023-06-11T23:13:51Z)
Learning Explicit Object-Centric Representations with Vision Transformers [81.38804205212425]
We build on the self-supervision task of masked autoencoding and explore its effectiveness for learning object-centric representations with transformers. We show that the model efficiently learns to decompose simple scenes as measured by segmentation metrics on several multi-object benchmarks.
arXiv Detail & Related papers (2022-10-25T16:39:49Z)
A Survey on Graph Neural Networks and Graph Transformers in Computer Vision: A Task-Oriented Perspective [71.03621840455754]
Graph Neural Networks (GNNs) have gained momentum in graph representation learning. Graph Transformers embed a graph structure into the Transformer architecture to overcome the limitations of local neighborhood aggregation. This paper presents a comprehensive review of GNNs and graph Transformers in computer vision from a task-oriented perspective.
arXiv Detail & Related papers (2022-09-27T08:10:14Z)
Vision Transformers for Action Recognition: A Survey [41.69370782177517]
Vision transformers are emerging as a powerful tool to solve computer vision problems. Recent techniques have proven the efficacy of transformers beyond the image domain to solve numerous video-related tasks. Human action recognition is receiving special attention from the research community due to its widespread applications.
arXiv Detail & Related papers (2022-09-13T02:57:05Z)
Vision Transformers: State of the Art and Research Challenges [26.462994554165697]
This paper presents a comprehensive overview of the literature on different architecture designs and training tricks for vision transformers. Our goal is to provide a systematic review with the open research opportunities.
arXiv Detail & Related papers (2022-07-07T02:01:56Z)
Transformers in Medical Imaging: A Survey [88.03790310594533]
Transformers have been successfully applied to several computer vision problems, achieving state-of-the-art results. Medical imaging has also witnessed growing interest for Transformers that can capture global context compared to CNNs with local receptive fields. We provide a review of the applications of Transformers in medical imaging covering various aspects, ranging from recently proposed architectural designs to unsolved issues.
arXiv Detail & Related papers (2022-01-24T18:50:18Z)
Transformers in Vision: A Survey [101.07348618962111]
Transformers enable modeling long dependencies between input sequence elements and support parallel processing of sequences. Transformers require minimal inductive biases for their design and are naturally suited as set-functions. This survey aims to provide a comprehensive overview of the Transformer models in the computer vision discipline.
arXiv Detail & Related papers (2021-01-04T18:57:24Z)
A Survey on Visual Transformer [126.56860258176324]
Transformer is a type of deep neural network mainly based on the self-attention mechanism. In this paper, we review these vision transformer models by categorizing them in different tasks and analyzing their advantages and disadvantages.
arXiv Detail & Related papers (2020-12-23T09:37:54Z)

This list is automatically generated from the titles and abstracts of the papers on this site.