Reason2Drive: Towards Interpretable and Chain-based Reasoning for Autonomous Driving
- URL: http://arxiv.org/abs/2312.03661v3
- Date: Sat, 20 Jul 2024 15:59:53 GMT
- Title: Reason2Drive: Towards Interpretable and Chain-based Reasoning for Autonomous Driving
- Authors: Ming Nie, Renyuan Peng, Chunwei Wang, Xinyue Cai, Jianhua Han, Hang Xu, Li Zhang
- Abstract summary: Reason2Drive is a benchmark dataset with over 600K video-text pairs.
We characterize the autonomous driving process as a sequential combination of perception, prediction, and reasoning steps.
We introduce a novel aggregated evaluation metric to assess chain-based reasoning performance in autonomous systems.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large vision-language models (VLMs) have garnered increasing interest in autonomous driving areas, due to their advanced capabilities in complex reasoning tasks essential for highly autonomous vehicle behavior. Despite their potential, research in autonomous systems is hindered by the lack of datasets with annotated reasoning chains that explain the decision-making processes in driving. To bridge this gap, we present Reason2Drive, a benchmark dataset with over 600K video-text pairs, aimed at facilitating the study of interpretable reasoning in complex driving environments. We distinctly characterize the autonomous driving process as a sequential combination of perception, prediction, and reasoning steps, and the question-answer pairs are automatically collected from a diverse range of open-source outdoor driving datasets, including nuScenes, Waymo and ONCE. Moreover, we introduce a novel aggregated evaluation metric to assess chain-based reasoning performance in autonomous systems, addressing the semantic ambiguities of existing metrics such as BLEU and CIDEr. Based on the proposed benchmark, we conduct experiments to assess various existing VLMs, revealing insights into their reasoning capabilities. Additionally, we develop an efficient approach to empower VLMs to leverage object-level perceptual elements in both feature extraction and prediction, further enhancing their reasoning accuracy. The code and dataset will be released.
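The abstract describes the aggregated metric only at a high level. As a rough illustration of what chain-based, semantically grounded scoring can look like, the Python sketch below compares each step of a predicted perception/prediction/reasoning chain to the reference chain by embedding similarity and averages across stages. The embedding model, positional step alignment, length penalty, and uniform stage weighting are all illustrative assumptions, not the metric defined in the paper.

```python
# Hypothetical sketch of an aggregated chain-based metric: score each step
# of a predicted reasoning chain against the reference by embedding
# similarity, then average across the perception/prediction/reasoning
# stages. All design choices here are assumptions for illustration.
from sentence_transformers import SentenceTransformer, util

_model = SentenceTransformer("all-MiniLM-L6-v2")

STAGES = ("perception", "prediction", "reasoning")

def chain_score(pred_chain: list[str], ref_chain: list[str]) -> float:
    """Average per-step semantic similarity, with a penalty for
    dropped or hallucinated steps."""
    n = min(len(pred_chain), len(ref_chain))
    if n == 0:
        return 0.0
    emb_p = _model.encode(pred_chain[:n], convert_to_tensor=True)
    emb_r = _model.encode(ref_chain[:n], convert_to_tensor=True)
    sims = util.cos_sim(emb_p, emb_r).diagonal()  # step-wise cosine
    length_penalty = n / max(len(pred_chain), len(ref_chain))
    return float(sims.mean()) * length_penalty

def aggregated_score(pred: dict[str, list[str]],
                     ref: dict[str, list[str]]) -> float:
    """Uniform aggregate over the three stages of the driving process."""
    return sum(chain_score(pred.get(s, []), ref.get(s, []))
               for s in STAGES) / len(STAGES)
```

Unlike n-gram overlap metrics such as BLEU and CIDEr, a paraphrased but semantically faithful reasoning step scores highly under a scheme like this, which is the ambiguity the paper's metric is said to address.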
Related papers
- Exploring the Causality of End-to-End Autonomous Driving
We propose a comprehensive approach to explore and analyze the causality of end-to-end autonomous driving.
Our work is the first to unveil the mystery of end-to-end autonomous driving and turn the black box into a white one.
arXiv Detail & Related papers (2024-07-09T04:56:11Z)
- DriveCoT: Integrating Chain-of-Thought Reasoning with End-to-End Driving
This paper collects a comprehensive end-to-end driving dataset named DriveCoT.
It contains sensor data, control decisions, and chain-of-thought labels to indicate the reasoning process.
We propose a baseline model called DriveCoT-Agent, trained on our dataset, to generate chain-of-thought predictions and final decisions.
arXiv Detail & Related papers (2024-03-25T17:59:01Z)
- Hybrid Reasoning Based on Large Language Models for Autonomous Car Driving
Large Language Models (LLMs) have garnered significant attention for their ability to understand text and images, generate human-like text, and perform complex reasoning tasks.
We investigate how well LLMs can adapt and apply a combination of arithmetic and common-sense reasoning, particularly in autonomous driving scenarios.
arXiv Detail & Related papers (2024-02-21T08:09:05Z)
- Interactive Autonomous Navigation with Internal State Inference and Interactivity Estimation
We propose three auxiliary tasks with relational-temporal reasoning and integrate them into the standard Deep Learning framework.
These auxiliary tasks provide additional supervision signals to infer the behavior patterns of other interactive agents.
Our approach achieves robust and state-of-the-art performance in terms of standard evaluation metrics.
arXiv Detail & Related papers (2023-11-27T18:57:42Z)
- LLM4Drive: A Survey of Large Language Models for Autonomous Driving
Large language models (LLMs) have demonstrated abilities including understanding context, logical reasoning, and generating answers.
In this paper, we systematically review the line of research on Large Language Models for Autonomous Driving (LLM4AD).
arXiv Detail & Related papers (2023-11-02T07:23:33Z)
- DriveGPT4: Interpretable End-to-end Autonomous Driving via Large Language Model
This study introduces DriveGPT4, a novel interpretable end-to-end autonomous driving system based on multimodal large language models (MLLMs).
DriveGPT4 facilitates the interpretation of vehicle actions, offers pertinent reasoning, and effectively addresses a diverse range of questions posed by users.
Evaluations conducted on the BDD-X dataset showcase the superior qualitative and quantitative performance of DriveGPT4.
arXiv Detail & Related papers (2023-10-02T17:59:52Z)
- End-to-end Autonomous Driving: Challenges and Frontiers
We provide a comprehensive analysis of more than 270 papers, covering the motivation, roadmap, methodology, challenges, and future trends in end-to-end autonomous driving.
We delve into several critical challenges, including multi-modality, interpretability, causal confusion, robustness, and world models, amongst others.
We discuss current advancements in foundation models and visual pre-training, as well as how to incorporate these techniques within the end-to-end driving framework.
arXiv Detail & Related papers (2023-06-29T14:17:24Z)
- AutoFed: Heterogeneity-Aware Federated Multimodal Learning for Robust Autonomous Driving
AutoFed is a framework to fully exploit multimodal sensory data on autonomous vehicles.
We propose a novel model leveraging pseudo-labeling to avoid mistakenly treating unlabeled objects as the background.
We also propose an autoencoder-based data imputation method to fill in missing data modalities (a minimal sketch follows this list).
arXiv Detail & Related papers (2023-02-17T01:31:53Z)
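The AutoFed abstract names autoencoder-based imputation but not its architecture. Below is a minimal PyTorch sketch of the general idea under assumed names and feature dimensions: an autoencoder is trained on frames where both sensing modalities are present to reconstruct one modality's features from the other's, so the reconstruction can stand in when a modality is missing. `ModalityImputer`, the 256-d features, and the training loop are hypothetical, not the paper's design.

```python
# Minimal PyTorch sketch of autoencoder-based modality imputation in the
# spirit of the AutoFed abstract. All names, feature dimensions, and the
# training loop are illustrative assumptions.
import torch
import torch.nn as nn

class ModalityImputer(nn.Module):
    """Reconstructs the features of a missing modality (e.g., radar)
    from an available one (e.g., LiDAR) through a bottleneck."""
    def __init__(self, in_dim: int = 256, latent_dim: int = 64,
                 out_dim: int = 256):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, latent_dim), nn.ReLU())
        self.decoder = nn.Linear(latent_dim, out_dim)

    def forward(self, available: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(available))

# Train on frames where both modalities are present ...
imputer = ModalityImputer()
optimizer = torch.optim.Adam(imputer.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

lidar_feats = torch.randn(32, 256)  # toy stand-ins for paired features
radar_feats = torch.randn(32, 256)

for _ in range(100):
    optimizer.zero_grad()
    loss_fn(imputer(lidar_feats), radar_feats).backward()
    optimizer.step()

# ... then let the reconstruction stand in when radar is missing.
with torch.no_grad():
    radar_filled = imputer(lidar_feats)
```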