V2V-LLM: Vehicle-to-Vehicle Cooperative Autonomous Driving with Multi-Modal Large Language Models
        - URL: http://arxiv.org/abs/2502.09980v2
 - Date: Mon, 17 Feb 2025 19:34:15 GMT
 - Title: V2V-LLM: Vehicle-to-Vehicle Cooperative Autonomous Driving with Multi-Modal Large Language Models
 - Authors: Hsu-kuang Chiu, Ryo Hachiuma, Chien-Yi Wang, Stephen F. Smith, Yu-Chiang Frank Wang, Min-Hung Chen
 - Abstract summary: Cooperative perception via vehicle-to-vehicle (V2V) communication has been proposed, but prior work has tended to focus on detection and tracking. We propose a novel problem setting that integrates Large Language Models (LLMs) into cooperative autonomous driving. We also propose our baseline method, Vehicle-to-Vehicle Large Language Model (V2V-LLM), which uses an LLM to fuse perception information from multiple connected autonomous vehicles.
 - Score: 31.537045261401666
 - License: http://creativecommons.org/licenses/by-nc-nd/4.0/
 - Abstract:   Current autonomous driving vehicles rely mainly on their individual sensors to understand surrounding scenes and plan for future trajectories, which can be unreliable when the sensors are malfunctioning or occluded. To address this problem, cooperative perception methods via vehicle-to-vehicle (V2V) communication have been proposed, but they have tended to focus on detection and tracking. How those approaches contribute to overall cooperative planning performance is still under-explored. Inspired by recent progress using Large Language Models (LLMs) to build autonomous driving systems, we propose a novel problem setting that integrates an LLM into cooperative autonomous driving, with the proposed Vehicle-to-Vehicle Question-Answering (V2V-QA) dataset and benchmark. We also propose our baseline method Vehicle-to-Vehicle Large Language Model (V2V-LLM), which uses an LLM to fuse perception information from multiple connected autonomous vehicles (CAVs) and answer driving-related questions: grounding, notable object identification, and planning. Experimental results show that our proposed V2V-LLM can be a promising unified model architecture for performing various tasks in cooperative autonomous driving, and outperforms other baseline methods that use different fusion approaches. Our work also creates a new research direction that can improve the safety of future autonomous driving systems. Our project website: https://eddyhkchiu.github.io/v2vllm.github.io/ . 
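To make the problem setting concrete, the following is a minimal, hypothetical sketch of the interaction pattern the abstract describes: each connected autonomous vehicle (CAV) shares object-level perception results, the ego side serializes the fused information into a prompt, and an LLM answers a grounding, notable object identification, or planning question. All names here (PerceivedObject, CAVMessage, build_prompt, answer_with_llm) are illustrative placeholders, not the authors' actual data format or API.

```python
# Hypothetical sketch of the V2V-QA interaction pattern described in the abstract.
# None of these classes or functions come from the paper; they only illustrate the
# "fuse multi-CAV perception, then ask an LLM a driving question" idea.
from dataclasses import dataclass
from typing import List


@dataclass
class PerceivedObject:
    """Object-level perception output shared by one CAV (assumed message format)."""
    label: str        # e.g. "pedestrian", "truck"
    x: float          # position in a shared map frame, meters
    y: float
    confidence: float


@dataclass
class CAVMessage:
    cav_id: str
    objects: List[PerceivedObject]


def build_prompt(task: str, messages: List[CAVMessage], query: str) -> str:
    """Serialize the fused perception from all CAVs into a text prompt for the LLM."""
    lines = [f"Task: {task}"]
    for msg in messages:
        for obj in msg.objects:
            lines.append(
                f"[{msg.cav_id}] {obj.label} at ({obj.x:.1f}, {obj.y:.1f}) "
                f"conf={obj.confidence:.2f}"
            )
    lines.append(f"Question: {query}")
    return "\n".join(lines)


def answer_with_llm(prompt: str) -> str:
    """Placeholder for the LLM call; a real system would query a multi-modal LLM."""
    return f"(LLM answer for prompt of {len(prompt)} characters)"


if __name__ == "__main__":
    shared = [
        CAVMessage("cav_1", [PerceivedObject("pedestrian", 12.0, 3.5, 0.91)]),
        CAVMessage("cav_2", [PerceivedObject("truck", -8.2, 1.0, 0.87)]),
    ]
    # The three task types named in the abstract: grounding, notable object
    # identification, and planning.
    prompt = build_prompt("planning", shared, "Suggest a safe future trajectory.")
    print(answer_with_llm(prompt))
```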
 
       
      
        Related papers
        - A Survey on Vision-Language-Action Models for Autonomous Driving [26.407082158880204]
Vision-Language-Action (VLA) paradigms integrate visual perception, natural language understanding, and control within a single policy.
Researchers in autonomous driving are actively adapting these methods to the vehicle domain.
This survey offers the first comprehensive overview of VLA for Autonomous Driving.
arXiv Detail & Related papers (2025-06-30T16:50:02Z)
- The Role of World Models in Shaping Autonomous Driving: A Comprehensive Survey [50.62538723793247]
Driving World Model (DWM) focuses on predicting scene evolution during the driving process.
DWM methods enable autonomous driving systems to better perceive, understand, and interact with dynamic driving environments.
arXiv Detail & Related papers (2025-02-14T18:43:15Z)
- SimpleLLM4AD: An End-to-End Vision-Language Model with Graph Visual Question Answering for Autonomous Driving [15.551625571158056]
We propose an e2eAD method called SimpleLLM4AD.
In our method, the e2eAD task is divided into four stages: perception, prediction, planning, and behavior.
Our experiments demonstrate that SimpleLLM4AD achieves competitive performance in complex driving scenarios.
arXiv Detail & Related papers (2024-07-31T02:35:33Z)
- Multi-Frame, Lightweight & Efficient Vision-Language Models for Question Answering in Autonomous Driving [0.0]
We develop an efficient, lightweight, multi-frame vision language model which performs Visual Question Answering for autonomous driving.
In comparison to previous approaches, EM-VLM4AD requires at least 10 times less memory and floating point operations.
arXiv Detail & Related papers (2024-03-28T21:18:33Z)
- M2DA: Multi-Modal Fusion Transformer Incorporating Driver Attention for Autonomous Driving [11.36165122994834]
We propose a Multi-Modal fusion transformer incorporating Driver Attention (M2DA) for autonomous driving.
By incorporating driver attention, we endow autonomous vehicles with human-like scene understanding, enabling them to identify crucial areas precisely and ensure safety.
arXiv Detail & Related papers (2024-03-19T08:54:52Z)
- DriveLM: Driving with Graph Visual Question Answering [57.51930417790141]
We study how vision-language models (VLMs) trained on web-scale data can be integrated into end-to-end driving systems.
We propose a VLM-based baseline approach (DriveLM-Agent) for jointly performing Graph VQA and end-to-end driving.
arXiv Detail & Related papers (2023-12-21T18:59:12Z)
- DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral Planning States for Autonomous Driving [69.82743399946371]
DriveMLM is a framework that can perform closed-loop autonomous driving in realistic simulators.
We employ a multi-modal LLM (MLLM) to model the behavior planning module of a modular AD system.
This model can be plugged into existing AD systems such as Apollo for closed-loop driving.
arXiv Detail & Related papers (2023-12-14T18:59:05Z)
- Reason2Drive: Towards Interpretable and Chain-based Reasoning for Autonomous Driving [38.28159034562901]
Reason2Drive is a benchmark dataset with over 600K video-text pairs.
We characterize the autonomous driving process as a sequential combination of perception, prediction, and reasoning steps.
We introduce a novel aggregated evaluation metric to assess chain-based reasoning performance in autonomous systems.
arXiv Detail & Related papers (2023-12-06T18:32:33Z)
- LLM4Drive: A Survey of Large Language Models for Autonomous Driving [62.10344445241105]
Large language models (LLMs) have demonstrated abilities including understanding context, logical reasoning, and generating answers.
In this paper, we systematically review the research line of Large Language Models for Autonomous Driving (LLM4AD).
arXiv Detail & Related papers (2023-11-02T07:23:33Z)
- Drive Anywhere: Generalizable End-to-end Autonomous Driving with Multi-modal Foundation Models [114.69732301904419]
We present an approach to end-to-end, open-set (any environment/scene) autonomous driving that can provide driving decisions from representations queryable by image and text.
Our approach demonstrates unparalleled results in diverse tests while achieving significantly greater robustness in out-of-distribution situations.
arXiv Detail & Related papers (2023-10-26T17:56:35Z)
- Learning Driver Models for Automated Vehicles via Knowledge Sharing and Personalization [2.07180164747172]
This paper describes a framework for learning Automated Vehicles (AVs) driver models via knowledge sharing between vehicles and personalization.
It finds several applications across transportation engineering including intelligent transportation systems, traffic management, and vehicle-to-vehicle communication.
arXiv Detail & Related papers (2023-08-31T17:18:15Z)
- COOPERNAUT: End-to-End Driving with Cooperative Perception for Networked Vehicles [54.61668577827041]
We introduce COOPERNAUT, an end-to-end learning model that uses cross-vehicle perception for vision-based cooperative driving.
Our experiments on AutoCastSim suggest that our cooperative perception driving models lead to a 40% improvement in average success rate.
arXiv Detail & Related papers (2022-05-04T17:55:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
       
     