Reason induced visual attention for explainable autonomous driving
- URL: http://arxiv.org/abs/2110.07380v1
- Date: Mon, 11 Oct 2021 18:50:41 GMT
- Title: Reason induced visual attention for explainable autonomous driving
- Authors: Sikai Chen, Jiqian Dong, Runjia Du, Yujie Li, Samuel Labi
- Abstract summary: Deep learning (DL) based computer vision (CV) models are generally considered as black boxes due to poor interpretability.
This study is motivated by the need to enhance the interpretability of DL models in autonomous driving.
The proposed framework imitates the learning process of human drivers by jointly modeling the visual input (images) and natural language.
- Score: 2.090380922731455
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning (DL) based computer vision (CV) models are generally considered
as black boxes due to poor interpretability. This limitation impedes efficient
diagnoses or predictions of system failure, thereby precluding the widespread
deployment of DLCV models in safety-critical tasks such as autonomous driving.
This study is motivated by the need to enhance the interpretability of DL models
in autonomous driving and therefore proposes an explainable DL-based framework
that generates textual descriptions of the driving environment and makes
appropriate decisions based on the generated descriptions. The proposed
framework imitates the learning process of human drivers by jointly modeling
the visual input (images) and natural language, while using the language to
induce the visual attention in the image. The results indicate strong
explainability of autonomous driving decisions obtained by focusing on relevant
features from visual inputs. Furthermore, the output attention maps enhance the
interpretability of the model not only by providing meaningful explanation to
the model behavior but also by identifying the weakness of and potential
improvement directions for the model.
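The abstract stops short of implementation detail, but the mechanism it describes (natural language inducing visual attention over image regions) maps naturally onto cross-modal attention. Below is a minimal PyTorch sketch of that general pattern; the module names, dimensions, and toy usage are invented for illustration and are not taken from the paper.

```python
import torch
import torch.nn as nn

class LanguageInducedAttention(nn.Module):
    """Cross-modal attention: a language query re-weights spatial image features.

    A minimal sketch of the idea described in the abstract; the paper's
    actual architecture may differ.
    """

    def __init__(self, img_dim: int, txt_dim: int, hidden_dim: int = 256):
        super().__init__()
        self.query = nn.Linear(txt_dim, hidden_dim)   # from the language state
        self.key = nn.Linear(img_dim, hidden_dim)     # from image regions
        self.value = nn.Linear(img_dim, hidden_dim)

    def forward(self, img_feats, txt_feat):
        # img_feats: (B, N, img_dim) -- N spatial regions from a CNN backbone
        # txt_feat:  (B, txt_dim)    -- e.g. an LSTM/Transformer sentence state
        q = self.query(txt_feat).unsqueeze(1)                  # (B, 1, H)
        k = self.key(img_feats)                                # (B, N, H)
        v = self.value(img_feats)                              # (B, N, H)
        scores = (q @ k.transpose(1, 2)) / k.shape[-1] ** 0.5  # (B, 1, N)
        attn = scores.softmax(dim=-1)                          # attention over regions
        attended = attn @ v                                    # language-focused visual context
        return attended.squeeze(1), attn.squeeze(1)

# toy usage: 49 regions (a 7x7 feature map), batch of 2
layer = LanguageInducedAttention(img_dim=512, txt_dim=300)
ctx, attn_map = layer(torch.randn(2, 49, 512), torch.randn(2, 300))
print(ctx.shape, attn_map.shape)  # torch.Size([2, 256]) torch.Size([2, 49])
```

The returned attention map is the artifact the abstract refers to: its per-region weights can be overlaid on the input image to show which areas the language reasoning made the model focus on.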
Related papers
- Unsupervised Model Diagnosis [49.36194740479798]
This paper proposes Unsupervised Model Diagnosis (UMO) to produce semantic counterfactual explanations without any user guidance.
Our approach identifies and visualizes changes in semantics, and then matches these changes to attributes from wide-ranging text sources.
arXiv Detail & Related papers (2024-10-08T17:59:03Z)
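The summary describes a two-step pipeline: generate counterfactual edits, then match the resulting semantic change to text attributes. A rough sketch of the matching step only, using off-the-shelf CLIP embeddings as a stand-in; the authors' actual matching procedure may differ, and the counterfactual generation step is assumed given.

```python
import torch
from transformers import CLIPModel, CLIPProcessor
from PIL import Image

# Embed the original and counterfactual images, then rank candidate text
# attributes by how well they align with the embedding shift between them.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def rank_attributes(original: Image.Image, counterfactual: Image.Image,
                    attributes: list[str]):
    imgs = processor(images=[original, counterfactual], return_tensors="pt")
    with torch.no_grad():
        emb = model.get_image_features(**imgs)
        emb = emb / emb.norm(dim=-1, keepdim=True)
        delta = emb[1] - emb[0]                  # direction of the semantic edit
        txt = processor(text=attributes, return_tensors="pt", padding=True)
        t = model.get_text_features(**txt)
        t = t / t.norm(dim=-1, keepdim=True)
    scores = (t @ delta).tolist()                # alignment with the edit direction
    return sorted(zip(attributes, scores), key=lambda p: -p[1])
```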
- DRIVE: Dependable Robust Interpretable Visionary Ensemble Framework in Autonomous Driving [1.4104119587524289]
Recent advancements in autonomous driving have seen a paradigm shift towards end-to-end learning.
These models often sacrifice interpretability, posing significant challenges to trust, safety, and regulatory compliance.
We introduce DRIVE, a comprehensive framework designed to improve the dependability and stability of explanations in end-to-end unsupervised driving models.
arXiv Detail & Related papers (2024-09-16T14:40:47Z)
- Explanatory Model Monitoring to Understand the Effects of Feature Shifts on Performance [61.06245197347139]
We propose a novel approach to explain the behavior of a black-box model under feature shifts.
We refer to our method that combines concepts from Optimal Transport and Shapley Values as Explanatory Performance Estimation.
arXiv Detail & Related papers (2024-08-24T18:28:19Z)
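The summary names Shapley values as one ingredient. The sketch below shows how a performance change under feature shifts could be attributed to individual shifted features with exact Shapley values; the Optimal Transport component and the authors' actual estimator are omitted, and `perf` is a user-supplied callable rather than anything from the paper.

```python
from itertools import combinations
from math import factorial

def shapley_performance_gap(features, perf):
    """Exact Shapley attribution of a performance change to shifted features.

    features: names of the features that shifted
    perf(S):  model performance when only the features in subset S are shifted
              (evaluating it means re-scoring the model on partially shifted data)
    Returns each feature's share of perf(all shifted) - perf(none shifted).
    """
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            for S in combinations(others, k):
                w = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += w * (perf(set(S) | {f}) - perf(set(S)))
        phi[f] = total
    return phi

# toy check: an additive performance function is attributed exactly
drop = {"lighting": -0.05, "weather": -0.02, "traffic": -0.01}
perf = lambda S: 0.9 + sum(drop[f] for f in S)
print(shapley_performance_gap(list(drop), perf))
# {'lighting': -0.05, 'weather': -0.02, 'traffic': -0.01}
```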
- Guiding Attention in End-to-End Driving Models [49.762868784033785]
Vision-based end-to-end driving models trained by imitation learning can lead to affordable solutions for autonomous driving.
We study how to guide the attention of these models towards salient semantic maps, improving their driving quality by adding a loss term during training.
In contrast to previous work, our method does not require these salient semantic maps to be available at test time.
arXiv Detail & Related papers (2024-04-30T23:18:51Z)
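One plausible form of the training-time loss term mentioned above is a divergence pushing the model's spatial attention towards the target saliency map. A hedged PyTorch sketch; the paper's exact formulation may differ.

```python
import torch

def attention_guidance_loss(attn, saliency, eps: float = 1e-8):
    """KL divergence between a target saliency map and the model's attention.

    attn:     (B, H, W) attention logits from the driving model
    saliency: (B, H, W) salient semantic map (e.g. from segmentation), >= 0
    """
    p = saliency.flatten(1)
    p = p / (p.sum(dim=1, keepdim=True) + eps)   # target distribution over pixels
    q = attn.flatten(1).softmax(dim=1)           # model's attention distribution
    # KL(p || q): penalize attention mass placed outside salient regions
    return (p * ((p + eps).log() - (q + eps).log())).sum(dim=1).mean()

# during training, e.g.: total = driving_loss + lam * attention_guidance_loss(attn, sal)
attn, sal = torch.randn(4, 14, 14), torch.rand(4, 14, 14)
print(attention_guidance_loss(attn, sal))
```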
- Diffexplainer: Towards Cross-modal Global Explanations with Diffusion Models [51.21351775178525]
DiffExplainer is a novel framework that leverages language-vision models to enable multimodal global explainability.
It employs diffusion models conditioned on optimized text prompts, synthesizing images that maximize class outputs.
The analysis of generated visual descriptions allows for automatic identification of biases and spurious features.
arXiv Detail & Related papers (2024-04-03T10:11:22Z)
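In highly simplified form, "diffusion models conditioned on optimized text prompts, synthesizing images that maximize class outputs" amounts to gradient ascent on a prompt embedding through a differentiable generator. The sketch below replaces the diffusion model with a hypothetical stand-in `generator` and is not the authors' procedure.

```python
import torch

def optimize_prompt(generator, classifier, target_class: int,
                    emb_dim: int = 512, steps: int = 200, lr: float = 0.05):
    """Find a prompt embedding whose generated image maximizes a class logit.

    generator:  hypothetical differentiable image generator, (1, D) -> (1, 3, H, W)
    classifier: the black-box model under analysis, images -> class logits
    """
    prompt_emb = torch.randn(1, emb_dim, requires_grad=True)
    opt = torch.optim.Adam([prompt_emb], lr=lr)
    for _ in range(steps):
        img = generator(prompt_emb)              # differentiable synthesis
        logit = classifier(img)[0, target_class]
        loss = -logit                            # gradient ascent on the class output
        opt.zero_grad()
        loss.backward()
        opt.step()
    return prompt_emb.detach()
```

Inspecting what the optimized prompts describe is what surfaces biases and spurious features in the classifier.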
- RAG-Driver: Generalisable Driving Explanations with Retrieval-Augmented In-Context Learning in Multi-Modal Large Language Model [22.25903116720301]
Explainability plays a critical role in trustworthy autonomous decision-making.
Recent advancements in Multi-Modal Large Language Models (MLLMs) have shown promising potential for enhancing explainability as a driving agent.
We present RAG-Driver, a novel retrieval-augmented multi-modal large language model that leverages in-context learning for high-performance, explainable, and generalisable autonomous driving.
arXiv Detail & Related papers (2024-02-16T16:57:18Z)
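The retrieval-augmented in-context learning loop can be sketched as: embed the current driving scene, retrieve the most similar past scenarios from a memory bank, and prepend their explanations to the MLLM prompt. `embed` and the memory-bank format below are hypothetical stand-ins, not RAG-Driver's API.

```python
import numpy as np

def build_rag_prompt(query_desc: str, memory: list[dict], embed, k: int = 2) -> str:
    """Assemble an in-context prompt from the k most similar stored scenarios.

    memory entries: {"emb": np.ndarray, "desc": str, "explanation": str}
    embed: hypothetical text/scene encoder returning an np.ndarray
    """
    q = embed(query_desc)
    sims = [float(np.dot(q, m["emb"]) / (np.linalg.norm(q) * np.linalg.norm(m["emb"])))
            for m in memory]
    top = sorted(range(len(memory)), key=lambda i: -sims[i])[:k]
    examples = "\n\n".join(
        f"Scene: {memory[i]['desc']}\nAction and explanation: {memory[i]['explanation']}"
        for i in top
    )
    return f"{examples}\n\nScene: {query_desc}\nAction and explanation:"
```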
- LLM4Drive: A Survey of Large Language Models for Autonomous Driving [62.10344445241105]
Large language models (LLMs) have demonstrated abilities including understanding context, logical reasoning, and generating answers.
In this paper, we systematically review a line of research on Large Language Models for Autonomous Driving (LLM4AD).
arXiv Detail & Related papers (2023-11-02T07:23:33Z)
- Driving through the Concept Gridlock: Unraveling Explainability Bottlenecks in Automated Driving [22.21822829138535]
We propose a new approach using concept bottlenecks as visual features for control command predictions and explanations of user and vehicle behavior.
We learn a human-understandable concept layer that we use to explain sequential driving scenes while learning vehicle control commands.
This approach can then be used to determine whether a change in preferred gap or steering commands from a human (or autonomous vehicle) is driven by an external stimulus or by a change in preferences.
arXiv Detail & Related papers (2023-10-25T13:39:04Z)
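A concept bottleneck in this setting means controls are predicted from a small human-readable concept layer rather than from raw features, so every control decision can be traced to named concepts. A generic PyTorch sketch; the concept list and dimensions are invented, not the paper's.

```python
import torch
import torch.nn as nn

class ConceptBottleneckController(nn.Module):
    """Predict readable concepts first, then vehicle controls from concepts alone."""

    CONCEPTS = ["lead_vehicle_close", "red_light", "pedestrian", "lane_curves_left"]

    def __init__(self, feat_dim: int = 512):
        super().__init__()
        self.concept_head = nn.Linear(feat_dim, len(self.CONCEPTS))  # explainable layer
        self.control_head = nn.Linear(len(self.CONCEPTS), 2)         # (steering, gap/accel)

    def forward(self, feats):
        concepts = torch.sigmoid(self.concept_head(feats))  # each score in [0, 1]
        controls = self.control_head(concepts)               # depends only on concepts
        return controls, concepts

model = ConceptBottleneckController()
controls, concepts = model(torch.randn(1, 512))
print(dict(zip(model.CONCEPTS, concepts[0].tolist())), controls[0].tolist())
```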
- Development and testing of an image transformer for explainable autonomous driving systems [0.7046417074932257]
Deep learning (DL) approaches have been used successfully in computer vision (CV) applications.
DL-based CV models are generally considered to be black boxes due to their lack of interpretability.
We propose an explainable end-to-end autonomous driving system based on "Transformer", a state-of-the-art (SOTA) self-attention based model.
arXiv Detail & Related papers (2021-10-11T19:01:41Z)
- Proactive Pseudo-Intervention: Causally Informed Contrastive Learning for Interpretable Vision Models [103.64435911083432]
We present a novel contrastive learning strategy called Proactive Pseudo-Intervention (PPI).
PPI leverages proactive interventions to guard against image features with no causal relevance.
We also devise a novel causally informed salience mapping module to identify key image pixels to intervene, and show it greatly facilitates model interpretability.
arXiv Detail & Related papers (2020-12-06T20:30:26Z)
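One way to read "proactive interventions" is: delete the purportedly causal pixels and penalize the model if its prediction survives, so predictions cannot rest on features with no causal relevance. A hedged sketch of such a contrastive intervention term; this is not the authors' exact objective, and the salience map is assumed to come from their salience mapping module.

```python
import torch

def ppi_intervention_loss(model, x, y, salience, eps: float = 1e-6):
    """Penalize predictions that survive deletion of salient evidence.

    x: (B, C, H, W) images; y: (B,) integer labels
    salience: (B, 1, H, W) in [0, 1], from a salience mapping module
    """
    x_intervened = x * (1.0 - salience)              # remove the causal evidence
    p = model(x_intervened).softmax(dim=1)
    p_y = p.gather(1, y.unsqueeze(1)).squeeze(1)     # prob of the true label
    # -log(1 - p_y): near zero when the intervention destroys the prediction,
    # large when the model still predicts y from spurious leftover features
    return -torch.log1p(-p_y.clamp(max=1 - eps)).mean()
```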
- Explaining Autonomous Driving by Learning End-to-End Visual Attention [25.09407072098823]
Current deep learning based autonomous driving approaches yield impressive results, leading to in-production deployment in certain controlled scenarios.
One of the most popular and fascinating approaches relies on learning vehicle controls directly from data perceived by sensors.
The main drawback of this approach, as in other learning problems, is the lack of explainability. Indeed, a deep network acts as a black box, outputting predictions that depend on previously seen driving patterns without giving any feedback on why such decisions were taken.
arXiv Detail & Related papers (2020-06-05T10:12:31Z)