Reason induced visual attention for explainable autonomous driving
- URL: http://arxiv.org/abs/2110.07380v1
- Date: Mon, 11 Oct 2021 18:50:41 GMT
- Title: Reason induced visual attention for explainable autonomous driving
- Authors: Sikai Chen, Jiqian Dong, Runjia Du, Yujie Li, Samuel Labi
- Abstract summary: Deep learning (DL) based computer vision (CV) models are generally considered black boxes due to poor interpretability.
This study is motivated by the need to enhance the interpretability of DL models in autonomous driving.
The proposed framework imitates the learning process of human drivers by jointly modeling the visual input (images) and natural language.
- Score: 2.090380922731455
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning (DL) based computer vision (CV) models are generally
considered black boxes due to poor interpretability. This limitation impedes
efficient diagnosis or prediction of system failures, thereby precluding the
widespread deployment of DL-CV models in safety-critical tasks such as autonomous
driving. This study is motivated by the need to enhance the interpretability of
DL models in autonomous driving and therefore proposes an explainable DL-based framework
that generates textual descriptions of the driving environment and makes
appropriate decisions based on the generated descriptions. The proposed
framework imitates the learning process of human drivers by jointly modeling
the visual input (images) and natural language, while using the language to
induce the visual attention in the image. The results indicate strong
explainability of autonomous driving decisions obtained by focusing on relevant
features from visual inputs. Furthermore, the output attention maps enhance the
interpretability of the model not only by providing meaningful explanations of
the model's behavior but also by identifying its weaknesses and potential
directions for improvement.
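The core mechanism described above — using a text-derived "reason" to induce attention over image regions — can be sketched as simple dot-product cross-attention. This is a minimal illustration of the general idea, not the paper's exact architecture; the embedding dimensions and patch count are arbitrary choices for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def language_induced_attention(text_query, patch_features):
    """Attend over image patches using a text-derived query.

    text_query:     (d,)   embedding of the generated textual description
    patch_features: (n, d) one feature vector per image patch
    Returns the (n,) attention map and the (d,) attended visual vector.
    """
    d = text_query.shape[0]
    scores = patch_features @ text_query / np.sqrt(d)  # scaled dot product
    attn = softmax(scores)                             # attention map over patches
    context = attn @ patch_features                    # reason-weighted visual summary
    return attn, context

# toy example: a 7x7 grid of patches (49) with 16-dim features
rng = np.random.default_rng(0)
attn, context = language_induced_attention(rng.normal(size=16),
                                           rng.normal(size=(49, 16)))
```

Reshaping `attn` back to the patch grid gives the kind of attention map the abstract refers to, which can be overlaid on the input image for inspection.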
Related papers
- Exploring the Causality of End-to-End Autonomous Driving [57.631400236930375]
We propose a comprehensive approach to explore and analyze the causality of end-to-end autonomous driving.
Our work is the first to unveil the mystery of end-to-end autonomous driving and turn the black box into a white one.
arXiv Detail & Related papers (2024-07-09T04:56:11Z) - Guiding Attention in End-to-End Driving Models [49.762868784033785]
Vision-based end-to-end driving models trained by imitation learning can lead to affordable solutions for autonomous driving.
We study how to guide the attention of these models to improve their driving quality by adding a loss term during training.
In contrast to previous work, our method does not require these salient semantic maps to be available during testing time.
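The training-time loss term described in this entry can be sketched as a divergence penalty that pulls the model's attention toward a salient semantic map. This is an illustrative formulation (a KL divergence), not necessarily the authors' exact loss; `lambda` weighting and map shapes are assumptions for the example.

```python
import numpy as np

def attention_guidance_loss(model_attn, saliency, eps=1e-8):
    """KL(saliency || model_attn): penalizes attention mass placed away
    from the salient semantic map. Both inputs are flattened spatial
    maps and are renormalized to probability distributions first."""
    p = saliency / (saliency.sum() + eps)      # target: where to look
    q = model_attn / (model_attn.sum() + eps)  # model's attention map
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

# during training: total_loss = driving_loss + lam * guidance_loss;
# at test time the saliency maps are no longer needed, matching the
# claim that they are not required at testing time.
```

The loss is zero when the attention already matches the saliency map and grows as attention drifts onto irrelevant regions.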
arXiv Detail & Related papers (2024-04-30T23:18:51Z) - Diffexplainer: Towards Cross-modal Global Explanations with Diffusion Models [51.21351775178525]
DiffExplainer is a novel framework that, leveraging language-vision models, enables multimodal global explainability.
It employs diffusion models conditioned on optimized text prompts, synthesizing images that maximize class outputs.
The analysis of generated visual descriptions allows for automatic identification of biases and spurious features.
arXiv Detail & Related papers (2024-04-03T10:11:22Z) - RAG-Driver: Generalisable Driving Explanations with Retrieval-Augmented In-Context Learning in Multi-Modal Large Language Model [22.25903116720301]
Explainability plays a critical role in trustworthy autonomous decision-making.
Recent advancements in Multi-Modal Large Language models (MLLMs) have shown promising potential in enhancing the explainability as a driving agent.
We present RAG-Driver, a novel retrieval-augmented multi-modal large language model that leverages in-context learning for high-performance, explainable, and generalisable autonomous driving.
arXiv Detail & Related papers (2024-02-16T16:57:18Z) - Reason2Drive: Towards Interpretable and Chain-based Reasoning for Autonomous Driving [38.28159034562901]
Reason2Drive is a benchmark dataset with over 600K video-text pairs.
We characterize the autonomous driving process as a sequential combination of perception, prediction, and reasoning steps.
We introduce a novel aggregated evaluation metric to assess chain-based reasoning performance in autonomous systems.
arXiv Detail & Related papers (2023-12-06T18:32:33Z) - LLM4Drive: A Survey of Large Language Models for Autonomous Driving [67.843551583229]
Large language models (LLMs) have demonstrated abilities including understanding context, logical reasoning, and generating answers.
In this paper, we systematically review a research line on Large Language Models for Autonomous Driving (LLM4AD).
arXiv Detail & Related papers (2023-11-02T07:23:33Z) - Driving through the Concept Gridlock: Unraveling Explainability Bottlenecks in Automated Driving [22.21822829138535]
We propose a new approach using concept bottlenecks as visual features for control command predictions and explanations of user and vehicle behavior.
We learn a human-understandable concept layer that we use to explain sequential driving scenes while learning vehicle control commands.
This approach can then be used to determine whether a change in a preferred gap or steering command from a human (or autonomous vehicle) is caused by an external stimulus or by a change in preferences.
arXiv Detail & Related papers (2023-10-25T13:39:04Z) - Interpretable Imitation Learning with Dynamic Causal Relations [65.18456572421702]
We propose to expose captured knowledge in the form of a directed acyclic causal graph.
We also design this causal discovery process to be state-dependent, enabling it to model the dynamics in latent causal graphs.
The proposed framework is composed of three parts: a dynamic causal discovery module, a causality encoding module, and a prediction module, and is trained in an end-to-end manner.
arXiv Detail & Related papers (2023-09-30T20:59:42Z) - Development and testing of an image transformer for explainable autonomous driving systems [0.7046417074932257]
Deep learning (DL) approaches have been used successfully in computer vision (CV) applications.
DL-based CV models are generally considered to be black boxes due to their lack of interpretability.
We propose an explainable end-to-end autonomous driving system based on "Transformer", a state-of-the-art (SOTA) self-attention based model.
arXiv Detail & Related papers (2021-10-11T19:01:41Z) - Proactive Pseudo-Intervention: Causally Informed Contrastive Learning for Interpretable Vision Models [103.64435911083432]
We present a novel contrastive learning strategy called Proactive Pseudo-Intervention (PPI).
PPI leverages proactive interventions to guard against image features with no causal relevance.
We also devise a novel causally informed salience mapping module to identify key image pixels to intervene, and show it greatly facilitates model interpretability.
arXiv Detail & Related papers (2020-12-06T20:30:26Z) - Explaining Autonomous Driving by Learning End-to-End Visual Attention [25.09407072098823]
Current deep learning based autonomous driving approaches yield impressive results, even leading to in-production deployment in certain controlled scenarios.
One of the most popular and fascinating approaches relies on learning vehicle controls directly from data perceived by sensors.
The main drawback of this approach, as in other learning problems, is the lack of explainability: a deep network acts as a black box, outputting predictions based on previously seen driving patterns without giving any feedback on why such decisions were taken.
arXiv Detail & Related papers (2020-06-05T10:12:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.