V2X-VLM: End-to-End V2X Cooperative Autonomous Driving Through Large Vision-Language Models
- URL: http://arxiv.org/abs/2408.09251v2
- Date: Mon, 16 Sep 2024 05:23:07 GMT
- Title: V2X-VLM: End-to-End V2X Cooperative Autonomous Driving Through Large Vision-Language Models
- Authors: Junwei You, Haotian Shi, Zhuoyu Jiang, Zilin Huang, Rui Gan, Keshu Wu, Xi Cheng, Xiaopeng Li, Bin Ran
- Abstract summary: This paper introduces V2X-VLM, an innovative E2E vehicle-infrastructure cooperative autonomous driving (VICAD) framework with Vehicle-to-Everything (V2X) systems and large vision-language models (VLMs).
V2X-VLM is designed to enhance situational awareness, decision-making, and ultimate trajectory planning by integrating multimodal data from vehicle-mounted cameras, infrastructure sensors, and textual information.
Evaluations on the DAIR-V2X dataset show that V2X-VLM outperforms state-of-the-art cooperative autonomous driving methods.
- Score: 13.716889927164383
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Advancements in autonomous driving have increasingly focused on end-to-end (E2E) systems that manage the full spectrum of driving tasks, from environmental perception to vehicle navigation and control. This paper introduces V2X-VLM, an innovative E2E vehicle-infrastructure cooperative autonomous driving (VICAD) framework with Vehicle-to-Everything (V2X) systems and large vision-language models (VLMs). V2X-VLM is designed to enhance situational awareness, decision-making, and ultimate trajectory planning by integrating multimodal data from vehicle-mounted cameras, infrastructure sensors, and textual information. A contrastive learning method is further employed to complement the VLM by refining feature discrimination, helping the model learn robust representations of the driving environment. Evaluations on the DAIR-V2X dataset show that V2X-VLM outperforms state-of-the-art cooperative autonomous driving methods, while additional tests on corner cases validate its robustness in real-world driving conditions.
Related papers
- LargeAD: Large-Scale Cross-Sensor Data Pretraining for Autonomous Driving [52.83707400688378]
LargeAD is a versatile and scalable framework designed for large-scale 3D pretraining across diverse real-world driving datasets.
Our framework leverages VFMs to extract semantically rich superpixels from 2D images, which are aligned with LiDAR point clouds to generate high-quality contrastive samples.
Our approach delivers significant performance improvements over state-of-the-art methods in both linear probing and fine-tuning tasks for both LiDAR-based segmentation and object detection.
arXiv Detail & Related papers (2025-01-07T18:59:59Z)
- Hybrid-Generative Diffusion Models for Attack-Oriented Twin Migration in Vehicular Metaverses [58.264499654343226]
Vehicle Twins (VTs) are digital twins that provide immersive virtual services for Vehicular Metaverse Users (VMUs).
High mobility of vehicles, uneven deployment of edge servers, and potential security threats pose challenges to achieving efficient and reliable VT migrations.
We propose a secure and reliable VT migration framework in vehicular metaverses.
arXiv Detail & Related papers (2024-07-05T11:11:33Z)
- Probing Multimodal LLMs as World Models for Driving [72.18727651074563]
We look at the application of Multimodal Large Language Models (MLLMs) in autonomous driving.
Despite advances in models like GPT-4o, their performance in complex driving environments remains largely unexplored.
arXiv Detail & Related papers (2024-05-09T17:52:42Z)
- Unified End-to-End V2X Cooperative Autonomous Driving [21.631099800753795]
UniE2EV2X is a V2X-integrated end-to-end autonomous driving system that consolidates key driving modules within a unified network.
The framework employs a deformable attention-based data fusion strategy, effectively facilitating cooperation between vehicles and infrastructure.
We implement the UniE2EV2X framework on the challenging DeepAccident, a simulation dataset designed for V2X cooperative driving.
arXiv Detail & Related papers (2024-05-07T03:01:40Z)
- DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models [31.552397390480525]
We introduce DriveVLM, an autonomous driving system leveraging Vision-Language Models (VLMs).
DriveVLM integrates a unique combination of reasoning modules for scene description, scene analysis, and hierarchical planning.
We propose DriveVLM-Dual, a hybrid system that synergizes the strengths of DriveVLM with the traditional autonomous driving pipeline.
arXiv Detail & Related papers (2024-02-19T17:04:04Z)
- V2X-Lead: LiDAR-based End-to-End Autonomous Driving with Vehicle-to-Everything Communication Integration [4.166623313248682]
This paper presents a LiDAR-based end-to-end autonomous driving method with Vehicle-to-Everything (V2X) communication integration.
The proposed method aims to handle imperfect partial observations by fusing the onboard LiDAR sensor and V2X communication data.
arXiv Detail & Related papers (2023-09-26T20:26:03Z)
- Generative AI-empowered Simulation for Autonomous Driving in Vehicular Mixed Reality Metaverses [130.15554653948897]
In the vehicular mixed reality (MR) Metaverse, the distance between physical and virtual entities can be overcome.
Large-scale traffic and driving simulation via realistic data collection and fusion from the physical world is difficult and costly.
We propose an autonomous driving architecture, where generative AI is leveraged to synthesize unlimited conditioned traffic and driving data in simulations.
arXiv Detail & Related papers (2023-02-16T16:54:10Z)
- COOPERNAUT: End-to-End Driving with Cooperative Perception for Networked Vehicles [54.61668577827041]
We introduce COOPERNAUT, an end-to-end learning model that uses cross-vehicle perception for vision-based cooperative driving.
Our experiments on AutoCastSim suggest that our cooperative perception driving models lead to a 40% improvement in average success rate.
arXiv Detail & Related papers (2022-05-04T17:55:12Z)
- V2X-ViT: Vehicle-to-Everything Cooperative Perception with Vision Transformer [58.71845618090022]
We build a holistic attention model, namely V2X-ViT, to fuse information across on-road agents.
V2X-ViT consists of alternating layers of heterogeneous multi-agent self-attention and multi-scale window self-attention.
To validate our approach, we create a large-scale V2X perception dataset.
arXiv Detail & Related papers (2022-03-20T20:18:25Z)
- V2X-Sim: A Virtual Collaborative Perception Dataset for Autonomous Driving [26.961213523096948]
Vehicle-to-everything (V2X) denotes the collaboration between a vehicle and any entity in its surroundings.
We present the V2X-Sim dataset, the first public large-scale collaborative perception dataset in autonomous driving.
arXiv Detail & Related papers (2022-02-17T05:14:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.