Reason induced visual attention for explainable autonomous driving
- URL: http://arxiv.org/abs/2110.07380v1
- Date: Mon, 11 Oct 2021 18:50:41 GMT
- Title: Reason induced visual attention for explainable autonomous driving
- Authors: Sikai Chen, Jiqian Dong, Runjia Du, Yujie Li, Samuel Labi
- Abstract summary: Deep learning (DL) based computer vision (CV) models are generally considered black boxes due to poor interpretability.
This study is motivated by the need to enhance the interpretability of DL models in autonomous driving.
The proposed framework imitates the learning process of human drivers by jointly modeling the visual input (images) and natural language.
- Score: 2.090380922731455
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning (DL) based computer vision (CV) models are generally
considered black boxes due to poor interpretability. This limitation impedes
efficient diagnosis or prediction of system failures, thereby precluding the
widespread deployment of DL-CV models in safety-critical tasks such as autonomous
driving. This study is motivated by the need to enhance the interpretability of
DL models in autonomous driving and therefore proposes an explainable DL-based framework
that generates textual descriptions of the driving environment and makes
appropriate decisions based on the generated descriptions. The proposed
framework imitates the learning process of human drivers by jointly modeling
the visual input (images) and natural language, while using the language to
induce the visual attention in the image. The results indicate strong
explainability of autonomous driving decisions obtained by focusing on relevant
features from visual inputs. Furthermore, the output attention maps enhance the
interpretability of the model not only by providing meaningful explanations of
the model's behavior but also by identifying its weaknesses and potential
directions for improvement.
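The core mechanism described above — using a text-derived "reason" to induce attention over image regions — can be sketched as simple dot-product cross-attention. This is a minimal illustration of the general idea, not the paper's exact architecture; the embedding dimensions and patch count are arbitrary choices for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def language_induced_attention(text_query, patch_features):
    """Attend over image patches using a text-derived query.

    text_query:     (d,)   embedding of the generated textual description
    patch_features: (n, d) one feature vector per image patch
    Returns the (n,) attention map and the (d,) attended visual vector.
    """
    d = text_query.shape[0]
    scores = patch_features @ text_query / np.sqrt(d)  # scaled dot product
    attn = softmax(scores)                             # attention map over patches
    context = attn @ patch_features                    # reason-weighted visual summary
    return attn, context

# toy example: a 7x7 grid of patches (49) with 16-dim features
rng = np.random.default_rng(0)
attn, context = language_induced_attention(rng.normal(size=16),
                                           rng.normal(size=(49, 16)))
```

Reshaping `attn` back to the patch grid gives the kind of attention map the abstract refers to, which can be overlaid on the input image for inspection.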
Related papers
- Exploring the Causality of End-to-End Autonomous Driving [57.631400236930375]
We propose a comprehensive approach to explore and analyze the causality of end-to-end autonomous driving.
Our work is the first to unveil the mystery of end-to-end autonomous driving and turn the black box into a white one.
arXiv Detail & Related papers (2024-07-09T04:56:11Z) - Guiding Attention in End-to-End Driving Models [49.762868784033785]
Vision-based end-to-end driving models trained by imitation learning can lead to affordable solutions for autonomous driving.
We study how to guide the attention of these models to improve their driving quality by adding a loss term during training.
In contrast to previous work, our method does not require these salient semantic maps to be available during testing time.
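The training-time loss term described in this entry can be sketched as a divergence penalty that pulls the model's attention toward a salient semantic map. This is an illustrative formulation (a KL divergence), not necessarily the authors' exact loss; `lambda` weighting and map shapes are assumptions for the example.

```python
import numpy as np

def attention_guidance_loss(model_attn, saliency, eps=1e-8):
    """KL(saliency || model_attn): penalizes attention mass placed away
    from the salient semantic map. Both inputs are flattened spatial
    maps and are renormalized to probability distributions first."""
    p = saliency / (saliency.sum() + eps)      # target: where to look
    q = model_attn / (model_attn.sum() + eps)  # model's attention map
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

# during training: total_loss = driving_loss + lam * guidance_loss;
# at test time the saliency maps are no longer needed, matching the
# claim that they are not required at testing time.
```

The loss is zero when the attention already matches the saliency map and grows as attention drifts onto irrelevant regions.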
arXiv Detail & Related papers (2024-04-30T23:18:51Z) - Diffexplainer: Towards Cross-modal Global Explanations with Diffusion Models [51.21351775178525]
DiffExplainer is a novel framework that, leveraging language-vision models, enables multimodal global explainability.
It employs diffusion models conditioned on optimized text prompts, synthesizing images that maximize class outputs.
The analysis of generated visual descriptions allows for automatic identification of biases and spurious features.
arXiv Detail & Related papers (2024-04-03T10:11:22Z) - RAG-Driver: Generalisable Driving Explanations with Retrieval-Augmented In-Context Learning in Multi-Modal Large Language Model [22.25903116720301]
Explainability plays a critical role in trustworthy autonomous decision-making.
Recent advancements in Multi-Modal Large Language models (MLLMs) have shown promising potential in enhancing the explainability as a driving agent.
We present RAG-Driver, a novel retrieval-augmented multi-modal large language model that leverages in-context learning for high-performance, explainable, and generalisable autonomous driving.
arXiv Detail & Related papers (2024-02-16T16:57:18Z) - Reason2Drive: Towards Interpretable and Chain-based Reasoning for Autonomous Driving [38.28159034562901]
Reason2Drive is a benchmark dataset with over 600K video-text pairs.
We characterize the autonomous driving process as a sequential combination of perception, prediction, and reasoning steps.
We introduce a novel aggregated evaluation metric to assess chain-based reasoning performance in autonomous systems.
arXiv Detail & Related papers (2023-12-06T18:32:33Z) - LLM4Drive: A Survey of Large Language Models for Autonomous Driving [67.843551583229]
Large language models (LLMs) have demonstrated abilities including understanding context, logical reasoning, and generating answers.
In this paper, we systematically review a research line on Large Language Models for Autonomous Driving (LLM4AD).
arXiv Detail & Related papers (2023-11-02T07:23:33Z) - Driving through the Concept Gridlock: Unraveling Explainability Bottlenecks in Automated Driving [22.21822829138535]
We propose a new approach using concept bottlenecks as visual features for control command predictions and explanations of user and vehicle behavior.
We learn a human-understandable concept layer that we use to explain sequential driving scenes while learning vehicle control commands.
This approach can then be used to determine whether a change in a preferred gap or steering command from a human (or autonomous vehicle) is caused by an external stimulus or by a change in preferences.
arXiv Detail & Related papers (2023-10-25T13:39:04Z) - Interpretable Imitation Learning with Dynamic Causal Relations [65.18456572421702]
We propose to expose captured knowledge in the form of a directed acyclic causal graph.
We also design this causal discovery process to be state-dependent, enabling it to model the dynamics in latent causal graphs.
The proposed framework is composed of three parts: a dynamic causal discovery module, a causality encoding module, and a prediction module, and is trained in an end-to-end manner.
arXiv Detail & Related papers (2023-09-30T20:59:42Z) - Development and testing of an image transformer for explainable autonomous driving systems [0.7046417074932257]
Deep learning (DL) approaches have been used successfully in computer vision (CV) applications.
DL-based CV models are generally considered to be black boxes due to their lack of interpretability.
We propose an explainable end-to-end autonomous driving system based on "Transformer", a state-of-the-art (SOTA) self-attention based model.
arXiv Detail & Related papers (2021-10-11T19:01:41Z) - Proactive Pseudo-Intervention: Causally Informed Contrastive Learning for Interpretable Vision Models [103.64435911083432]
We present a novel contrastive learning strategy called Proactive Pseudo-Intervention (PPI).
PPI leverages proactive interventions to guard against image features with no causal relevance.
We also devise a novel causally informed salience mapping module to identify key image pixels to intervene, and show it greatly facilitates model interpretability.
arXiv Detail & Related papers (2020-12-06T20:30:26Z) - Explaining Autonomous Driving by Learning End-to-End Visual Attention [25.09407072098823]
Current deep learning based autonomous driving approaches yield impressive results, even leading to in-production deployment in certain controlled scenarios.
One of the most popular and fascinating approaches relies on learning vehicle controls directly from data perceived by sensors.
The main drawback of this approach, as in other learning problems, is the lack of explainability: a deep network acts as a black box, outputting predictions based on previously seen driving patterns without giving any feedback on why such decisions were taken.
arXiv Detail & Related papers (2020-06-05T10:12:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.