V2V-LLM: Vehicle-to-Vehicle Cooperative Autonomous Driving with Multi-Modal Large Language Models
        - URL: http://arxiv.org/abs/2502.09980v2
 - Date: Mon, 17 Feb 2025 19:34:15 GMT
 - Title: V2V-LLM: Vehicle-to-Vehicle Cooperative Autonomous Driving with Multi-Modal Large Language Models
 - Authors: Hsu-kuang Chiu, Ryo Hachiuma, Chien-Yi Wang, Stephen F. Smith, Yu-Chiang Frank Wang, Min-Hung Chen
 - Abstract summary: Cooperative perception via vehicle-to-vehicle (V2V) communication has been proposed, but prior work has tended to focus on detection and tracking. We propose a novel problem setting that integrates Large Language Models (LLMs) into cooperative autonomous driving. We also propose our baseline method, Vehicle-to-Vehicle Large Language Model (V2V-LLM), which uses an LLM to fuse perception information from multiple connected autonomous vehicles.
 - Score: 31.537045261401666
 - License: http://creativecommons.org/licenses/by-nc-nd/4.0/
 - Abstract:   Current autonomous driving vehicles rely mainly on their individual sensors to understand surrounding scenes and plan for future trajectories, which can be unreliable when the sensors are malfunctioning or occluded. To address this problem, cooperative perception methods via vehicle-to-vehicle (V2V) communication have been proposed, but they have tended to focus on detection and tracking. How those approaches contribute to overall cooperative planning performance is still under-explored. Inspired by recent progress using Large Language Models (LLMs) to build autonomous driving systems, we propose a novel problem setting that integrates an LLM into cooperative autonomous driving, with the proposed Vehicle-to-Vehicle Question-Answering (V2V-QA) dataset and benchmark. We also propose our baseline method Vehicle-to-Vehicle Large Language Model (V2V-LLM), which uses an LLM to fuse perception information from multiple connected autonomous vehicles (CAVs) and answer driving-related questions: grounding, notable object identification, and planning. Experimental results show that our proposed V2V-LLM can be a promising unified model architecture for performing various tasks in cooperative autonomous driving, and outperforms other baseline methods that use different fusion approaches. Our work also creates a new research direction that can improve the safety of future autonomous driving systems. Our project website: https://eddyhkchiu.github.io/v2vllm.github.io/ . 
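To make the problem setting concrete, the following is a minimal, hypothetical sketch of the interaction pattern the abstract describes: each connected autonomous vehicle (CAV) shares object-level perception results, the ego side serializes the fused information into a prompt, and an LLM answers a grounding, notable object identification, or planning question. All names here (PerceivedObject, CAVMessage, build_prompt, answer_with_llm) are illustrative placeholders, not the authors' actual data format or API.

```python
# Hypothetical sketch of the V2V-QA interaction pattern described in the abstract.
# None of these classes or functions come from the paper; they only illustrate the
# "fuse multi-CAV perception, then ask an LLM a driving question" idea.
from dataclasses import dataclass
from typing import List


@dataclass
class PerceivedObject:
    """Object-level perception output shared by one CAV (assumed message format)."""
    label: str        # e.g. "pedestrian", "truck"
    x: float          # position in a shared map frame, meters
    y: float
    confidence: float


@dataclass
class CAVMessage:
    cav_id: str
    objects: List[PerceivedObject]


def build_prompt(task: str, messages: List[CAVMessage], query: str) -> str:
    """Serialize the fused perception from all CAVs into a text prompt for the LLM."""
    lines = [f"Task: {task}"]
    for msg in messages:
        for obj in msg.objects:
            lines.append(
                f"[{msg.cav_id}] {obj.label} at ({obj.x:.1f}, {obj.y:.1f}) "
                f"conf={obj.confidence:.2f}"
            )
    lines.append(f"Question: {query}")
    return "\n".join(lines)


def answer_with_llm(prompt: str) -> str:
    """Placeholder for the LLM call; a real system would query a multi-modal LLM."""
    return f"(LLM answer for prompt of {len(prompt)} characters)"


if __name__ == "__main__":
    shared = [
        CAVMessage("cav_1", [PerceivedObject("pedestrian", 12.0, 3.5, 0.91)]),
        CAVMessage("cav_2", [PerceivedObject("truck", -8.2, 1.0, 0.87)]),
    ]
    # The three task types named in the abstract: grounding, notable object
    # identification, and planning.
    prompt = build_prompt("planning", shared, "Suggest a safe future trajectory.")
    print(answer_with_llm(prompt))
```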
 
       
      
        Related papers
        - A Survey on Vision-Language-Action Models for Autonomous Driving [26.407082158880204]
Vision-Language-Action (VLA) paradigms integrate visual perception, natural language understanding, and control within a single policy.
Researchers in autonomous driving are actively adapting these methods to the vehicle domain.
This survey offers the first comprehensive overview of VLA for Autonomous Driving.
arXiv Detail & Related papers (2025-06-30T16:50:02Z)
- The Role of World Models in Shaping Autonomous Driving: A Comprehensive Survey [50.62538723793247]
Driving World Model (DWM) focuses on predicting scene evolution during the driving process.
DWM methods enable autonomous driving systems to better perceive, understand, and interact with dynamic driving environments.
arXiv Detail & Related papers (2025-02-14T18:43:15Z)
- SimpleLLM4AD: An End-to-End Vision-Language Model with Graph Visual Question Answering for Autonomous Driving [15.551625571158056]
We propose an e2eAD method called SimpleLLM4AD.
In our method, the e2eAD task is divided into four stages: perception, prediction, planning, and behavior.
Our experiments demonstrate that SimpleLLM4AD achieves competitive performance in complex driving scenarios.
arXiv Detail & Related papers (2024-07-31T02:35:33Z)
- Multi-Frame, Lightweight & Efficient Vision-Language Models for Question Answering in Autonomous Driving [0.0]
We develop an efficient, lightweight, multi-frame vision language model which performs Visual Question Answering for autonomous driving.
In comparison to previous approaches, EM-VLM4AD requires at least 10 times less memory and floating point operations.
arXiv Detail & Related papers (2024-03-28T21:18:33Z)
- M2DA: Multi-Modal Fusion Transformer Incorporating Driver Attention for Autonomous Driving [11.36165122994834]
We propose a Multi-Modal fusion transformer incorporating Driver Attention (M2DA) for autonomous driving.
By incorporating driver attention, we endow autonomous vehicles with human-like scene understanding, enabling them to identify crucial areas precisely and ensure safety.
arXiv Detail & Related papers (2024-03-19T08:54:52Z)
- DriveLM: Driving with Graph Visual Question Answering [57.51930417790141]
We study how vision-language models (VLMs) trained on web-scale data can be integrated into end-to-end driving systems.
We propose a VLM-based baseline approach (DriveLM-Agent) for jointly performing Graph VQA and end-to-end driving.
arXiv Detail & Related papers (2023-12-21T18:59:12Z)
- DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral Planning States for Autonomous Driving [69.82743399946371]
DriveMLM is a framework that can perform closed-loop autonomous driving in realistic simulators.
We employ a multi-modal LLM (MLLM) to model the behavior planning module of a modular AD system.
This model can be plugged into existing AD systems such as Apollo for closed-loop driving.
arXiv Detail & Related papers (2023-12-14T18:59:05Z)
- Reason2Drive: Towards Interpretable and Chain-based Reasoning for Autonomous Driving [38.28159034562901]
Reason2Drive is a benchmark dataset with over 600K video-text pairs.
We characterize the autonomous driving process as a sequential combination of perception, prediction, and reasoning steps.
We introduce a novel aggregated evaluation metric to assess chain-based reasoning performance in autonomous systems.
arXiv Detail & Related papers (2023-12-06T18:32:33Z)
- LLM4Drive: A Survey of Large Language Models for Autonomous Driving [62.10344445241105]
Large language models (LLMs) have demonstrated abilities including understanding context, logical reasoning, and generating answers.
In this paper, we systematically review the research line of Large Language Models for Autonomous Driving (LLM4AD).
arXiv Detail & Related papers (2023-11-02T07:23:33Z)
- Drive Anywhere: Generalizable End-to-end Autonomous Driving with Multi-modal Foundation Models [114.69732301904419]
We present an approach to end-to-end, open-set (any environment/scene) autonomous driving that can provide driving decisions from representations queryable by image and text.
Our approach demonstrates unparalleled results in diverse tests while achieving significantly greater robustness in out-of-distribution situations.
arXiv Detail & Related papers (2023-10-26T17:56:35Z)
- Learning Driver Models for Automated Vehicles via Knowledge Sharing and Personalization [2.07180164747172]
This paper describes a framework for learning Automated Vehicles (AVs) driver models via knowledge sharing between vehicles and personalization.
It finds several applications across transportation engineering including intelligent transportation systems, traffic management, and vehicle-to-vehicle communication.
arXiv Detail & Related papers (2023-08-31T17:18:15Z)
- COOPERNAUT: End-to-End Driving with Cooperative Perception for Networked Vehicles [54.61668577827041]
We introduce COOPERNAUT, an end-to-end learning model that uses cross-vehicle perception for vision-based cooperative driving.
Our experiments on AutoCastSim suggest that our cooperative perception driving models lead to a 40% improvement in average success rate.
arXiv Detail & Related papers (2022-05-04T17:55:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
       
     