Development and testing of an image transformer for explainable
autonomous driving systems
- URL: http://arxiv.org/abs/2110.05559v1
- Date: Mon, 11 Oct 2021 19:01:41 GMT
- Title: Development and testing of an image transformer for explainable
autonomous driving systems
- Authors: Jiqian Dong, Sikai Chen, Shuya Zong, Tiantian Chen, Mohammad
Miralinaghi, Samuel Labi
- Abstract summary: Deep learning (DL) approaches have been used successfully in computer vision (CV) applications.
DL-based CV models are generally considered to be black boxes due to their lack of interpretability.
We propose an explainable end-to-end autonomous driving system based on "Transformer", a state-of-the-art (SOTA) self-attention based model.
- Score: 0.7046417074932257
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the last decade, deep learning (DL) approaches have been used successfully
in computer vision (CV) applications. However, DL-based CV models are generally
considered to be black boxes due to their lack of interpretability. This
black-box behavior has exacerbated user distrust and has therefore prevented the
widespread deployment of DLCV models in autonomous driving tasks, even though
some of these models outperform humans. For this reason, it is essential to
develop explainable DL models for autonomous driving tasks. Explainable DL
models can not only boost user trust in autonomy but also serve as a diagnostic
tool for identifying any defects and weaknesses of the model during the system
development phase. In this paper, we propose an explainable
end-to-end autonomous driving system based on "Transformer", a state-of-the-art
(SOTA) self-attention based model, to map visual features from images collected
by onboard cameras to guide potential driving actions with corresponding
explanations. The model applies soft attention over the global features of
the image. The results demonstrate the efficacy of our proposed model, as it
exhibits superior performance (in terms of correct prediction of actions and
explanations) compared to the benchmark model by a significant margin with
lower computational cost.
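To make the described pipeline concrete, below is a minimal PyTorch sketch of the general idea: a self-attention encoder over global image features, followed by a soft-attention pooling head that jointly predicts a driving action and an explanation. The layer sizes, the number of action and explanation classes, and the use of precomputed CNN region features are illustrative assumptions, not the authors' exact architecture.

```python
# Minimal sketch, NOT the paper's exact model: a Transformer encoder over
# global image features with soft-attention pooling and two output heads.
import torch
import torch.nn as nn


class ExplainableDrivingTransformer(nn.Module):
    def __init__(self, feat_dim=512, n_heads=8, n_layers=2,
                 n_actions=4, n_explanations=21):  # sizes are assumptions
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=feat_dim, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        # Soft attention: one scalar score per image region, softmax-normalized
        # so the weights form a distribution over regions (an inspectable map).
        self.attn_score = nn.Linear(feat_dim, 1)
        self.action_head = nn.Linear(feat_dim, n_actions)
        self.explanation_head = nn.Linear(feat_dim, n_explanations)

    def forward(self, feats):
        # feats: (batch, n_regions, feat_dim) global features, e.g. a flattened
        # 7x7 CNN feature map extracted from an onboard-camera image.
        h = self.encoder(feats)
        weights = torch.softmax(self.attn_score(h), dim=1)  # (B, N, 1)
        pooled = (weights * h).sum(dim=1)                   # (B, feat_dim)
        return self.action_head(pooled), self.explanation_head(pooled), weights


# Dummy usage: 49 regions from a 7x7 feature map, batch of 2.
model = ExplainableDrivingTransformer()
action_logits, expl_logits, attn = model(torch.randn(2, 49, 512))
```

The returned attention weights are what make such a model inspectable: they indicate which image regions contributed most to a predicted action and its explanation.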
Related papers
- DRIVE: Dependable Robust Interpretable Visionary Ensemble Framework in Autonomous Driving [1.4104119587524289]
Recent advancements in autonomous driving have seen a paradigm shift towards end-to-end learning paradigms.
These models often sacrifice interpretability, posing significant challenges to trust, safety, and regulatory compliance.
We introduce DRIVE, a comprehensive framework designed to improve the dependability and stability of explanations in end-to-end unsupervised driving models.
arXiv Detail & Related papers (2024-09-16T14:40:47Z)
- Explanatory Model Monitoring to Understand the Effects of Feature Shifts on Performance
We propose a novel approach to explain the behavior of a black-box model under feature shifts.
We refer to our method that combines concepts from Optimal Transport and Shapley Values as Explanatory Performance Estimation.
arXiv Detail & Related papers (2024-08-24T18:28:19Z)
- Guiding Attention in End-to-End Driving Models [49.762868784033785]
Vision-based end-to-end driving models trained by imitation learning can lead to affordable solutions for autonomous driving.
We study how to guide the attention of these models to improve their driving quality by adding a loss term during training (see the hedged sketch after this list).
In contrast to previous work, our method does not require the salient semantic maps to be available at testing time.
arXiv Detail & Related papers (2024-04-30T23:18:51Z)
- AIDE: An Automatic Data Engine for Object Detection in Autonomous Driving [68.73885845181242]
We propose an Automatic Data Engine (AIDE) that automatically identifies issues, efficiently curates data, improves the model through auto-labeling, and verifies the model through generation of diverse scenarios.
We further establish a benchmark for open-world detection on AV datasets to comprehensively evaluate various learning paradigms, demonstrating our method's superior performance at a reduced cost.
arXiv Detail & Related papers (2024-03-26T04:27:56Z)
- DME-Driver: Integrating Human Decision Logic and 3D Scene Perception in Autonomous Driving [65.04871316921327]
This paper introduces DME-Driver, a new system that enhances the performance and reliability of autonomous driving.
DME-Driver utilizes a powerful vision language model as the decision-maker and a planning-oriented perception model as the control signal generator.
By leveraging this dataset, our model achieves high-precision planning accuracy through a logical thinking process.
arXiv Detail & Related papers (2024-01-08T03:06:02Z)
- Model-Based Reinforcement Learning with Isolated Imaginations [61.67183143982074]
We propose Iso-Dream++, a model-based reinforcement learning approach.
We perform policy optimization based on the decoupled latent imaginations.
This enables long-horizon visuomotor control tasks to benefit from isolating mixed dynamics sources in the wild.
arXiv Detail & Related papers (2023-03-27T02:55:56Z)
- Your Autoregressive Generative Model Can be Better If You Treat It as an Energy-Based One [83.5162421521224]
We propose a unique method termed E-ARM for training autoregressive generative models.
E-ARM takes advantage of a well-designed energy-based learning objective.
We show that E-ARM can be trained efficiently and is capable of alleviating the exposure bias problem.
arXiv Detail & Related papers (2022-06-26T10:58:41Z)
- Reason induced visual attention for explainable autonomous driving [2.090380922731455]
Deep learning (DL) based computer vision (CV) models are generally considered as black boxes due to poor interpretability.
This study is motivated by the need to enhance the interpretability of DL model in autonomous driving.
The proposed framework imitates the learning process of human drivers by jointly modeling the visual input (images) and natural language.
arXiv Detail & Related papers (2021-10-11T18:50:41Z)
- Action-Based Representation Learning for Autonomous Driving [8.296684637620551]
We propose to use action-based driving data for learning representations.
Our experiments show that an affordance-based driving model pre-trained with this approach can leverage a relatively small amount of weakly annotated imagery.
arXiv Detail & Related papers (2020-08-21T10:49:13Z)
- An LSTM-Based Autonomous Driving Model Using Waymo Open Dataset [7.151393153761375]
This paper introduces an approach to learn a long short-term memory (LSTM)-based model for imitating the behavior of a self-driving model.
The experimental results show that our model outperforms several models in driving action prediction.
arXiv Detail & Related papers (2020-02-14T05:28:15Z)
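As referenced in the Guiding Attention entry above, the following is a hedged sketch of the general attention-guidance idea: a loss term, added during training only, that pulls the model's attention map toward a salient semantic map. The KL-divergence form, the weight `lambda_attn`, and the placeholder losses are assumptions for illustration, not that paper's exact formulation; at test time the saliency maps are not needed.

```python
# Hedged sketch of a training-time attention-guidance loss (assumed form).
import torch


def attention_guidance_loss(attn_map, saliency_map, eps=1e-8):
    # KL(saliency || attention) over flattened spatial maps. Both inputs are
    # (batch, H, W) and non-negative; each is renormalized to a spatial
    # probability distribution before comparison.
    b = attn_map.size(0)
    p = saliency_map.reshape(b, -1)
    p = p / (p.sum(dim=1, keepdim=True) + eps)
    q = attn_map.reshape(b, -1)
    q = q / (q.sum(dim=1, keepdim=True) + eps)
    return (p * ((p + eps).log() - (q + eps).log())).sum(dim=1).mean()


# Training-time combination; `lambda_attn` and the imitation-loss placeholder
# are hypothetical. At test time only the driving branch runs, so no salient
# semantic maps are required.
lambda_attn = 0.1
imitation_loss = torch.tensor(0.0)  # placeholder for the driving loss
attn = torch.rand(2, 7, 7)          # model's attention map
sal = torch.rand(2, 7, 7)           # salient semantic map (training labels)
total_loss = imitation_loss + lambda_attn * attention_guidance_loss(attn, sal)
```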
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.