V2X-ViT: Vehicle-to-Everything Cooperative Perception with Vision Transformer
- URL: http://arxiv.org/abs/2203.10638v1
- Date: Sun, 20 Mar 2022 20:18:25 GMT
- Title: V2X-ViT: Vehicle-to-Everything Cooperative Perception with Vision Transformer
- Authors: Runsheng Xu, Hao Xiang, Zhengzhong Tu, Xin Xia, Ming-Hsuan Yang, Jiaqi Ma
- Abstract summary: We build a holistic attention model, namely V2X-ViT, to fuse information across on-road agents.
V2X-ViT consists of alternating layers of heterogeneous multi-agent self-attention and multi-scale window self-attention.
To validate our approach, we create a large-scale V2X perception dataset.
- Score: 58.71845618090022
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we investigate the application of Vehicle-to-Everything (V2X)
communication to improve the perception performance of autonomous vehicles. We
present a robust cooperative perception framework with V2X communication using
a novel vision Transformer. Specifically, we build a holistic attention model,
namely V2X-ViT, to effectively fuse information across on-road agents (i.e.,
vehicles and infrastructure). V2X-ViT consists of alternating layers of
heterogeneous multi-agent self-attention and multi-scale window self-attention,
which capture inter-agent interaction and per-agent spatial relationships.
These key modules are designed in a unified Transformer architecture to handle
common V2X challenges, including asynchronous information sharing, pose errors,
and heterogeneity of V2X components. To validate our approach, we create a
large-scale V2X perception dataset using CARLA and OpenCDA. Extensive
experimental results demonstrate that V2X-ViT sets new state-of-the-art
performance for 3D object detection and achieves robust performance even under
harsh, noisy environments. The dataset, source code, and trained models will be
open-sourced.
Related papers
- Mixed Signals: A Diverse Point Cloud Dataset for Heterogeneous LiDAR V2X Collaboration [56.75198775820637]
Vehicle-to-everything (V2X) collaborative perception has emerged as a promising solution to address the limitations of single-vehicle perception systems.
To address these gaps, we present Mixed Signals, a comprehensive V2X dataset featuring 45.1k point clouds and 240.6k bounding boxes.
Our dataset provides precisely aligned point clouds and bounding box annotations across 10 classes, ensuring reliable data for perception training.
arXiv Detail & Related papers (2025-02-19T23:53:00Z)
- V2X-DGPE: Addressing Domain Gaps and Pose Errors for Robust Collaborative 3D Object Detection [18.694510415777632]
V2X-DGPE is a high-accuracy and robust V2X feature-level collaborative perception framework.
The proposed method outperforms existing approaches, achieving state-of-the-art detection performance.
arXiv Detail & Related papers (2025-01-04T19:28:55Z)
- LaVin-DiT: Large Vision Diffusion Transformer [99.98106406059333]
LaVin-DiT is a scalable and unified foundation model designed to tackle over 20 computer vision tasks in a generative framework.
We introduce key innovations to optimize generative performance for vision tasks.
The model is scaled from 0.1B to 3.4B parameters, demonstrating substantial scalability and state-of-the-art performance across diverse vision tasks.
arXiv Detail & Related papers (2024-11-18T12:05:27Z)
- CooPre: Cooperative Pretraining for V2X Cooperative Perception [47.00472259100765]
We present a self-supervised learning method for V2X cooperative perception.
We utilize the vast amount of unlabeled 3D V2X data to enhance the perception performance.
arXiv Detail & Related papers (2024-08-20T23:39:26Z)
- V2X-Real: a Large-Scale Dataset for Vehicle-to-Everything Cooperative Perception [22.3955949838171]
We present V2X-Real, a large-scale dataset that includes a mixture of multiple vehicles and smart infrastructure.
Our dataset contains 33K LiDAR frames and 171K camera frames with over 1.2M annotated bounding boxes of 10 categories in very challenging urban scenarios.
arXiv Detail & Related papers (2024-03-24T06:30:02Z)
- Learning Cooperative Trajectory Representations for Motion Forecasting [4.380073528690906]
We propose a forecasting-oriented representation paradigm to utilize motion and interaction features from cooperative information.
We present V2X-Graph, a representative framework to achieve interpretable and end-to-end trajectory feature fusion for cooperative motion forecasting.
To further evaluate on the vehicle-to-everything (V2X) scenario, we construct the first real-world V2X motion forecasting dataset, V2X-Traj.
arXiv Detail & Related papers (2023-11-01T08:53:05Z)
- HM-ViT: Hetero-modal Vehicle-to-Vehicle Cooperative perception with vision transformer [4.957079586254435]
HM-ViT is the first unified multi-agent hetero-modal cooperative perception framework.
It can collaboratively predict 3D objects for highly dynamic vehicle-to-vehicle (V2V) collaborations with varying numbers and types of agents.
arXiv Detail & Related papers (2023-04-20T20:09:59Z)
- V2V4Real: A Real-world Large-scale Dataset for Vehicle-to-Vehicle Cooperative Perception [49.7212681947463]
Vehicle-to-Vehicle (V2V) cooperative perception systems have great potential to revolutionize the autonomous driving industry.
We present V2V4Real, the first large-scale real-world multi-modal dataset for V2V perception.
Our dataset covers a driving area of 410 km, comprising 20K LiDAR frames, 40K RGB frames, 240K annotated 3D bounding boxes for 5 classes, and HDMaps.
arXiv Detail & Related papers (2023-03-14T02:49:20Z)
- CoBEVT: Cooperative Bird's Eye View Semantic Segmentation with Sparse Transformers [36.838065731893735]
CoBEVT is the first generic multi-agent perception framework that can cooperatively generate BEV map predictions.
CoBEVT achieves state-of-the-art performance for cooperative BEV semantic segmentation.
arXiv Detail & Related papers (2022-07-05T17:59:28Z)
- ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for Image Recognition and Beyond [76.35955924137986]
We propose a Vision Transformer Advanced by Exploring intrinsic inductive bias (IB) from convolutions, i.e., ViTAE.
ViTAE has several spatial pyramid reduction modules to downsample and embed the input image into tokens with rich multi-scale context (see the sketch after this list).
We obtain state-of-the-art classification performance, i.e., 88.5% Top-1 accuracy on the ImageNet validation set and the best 91.2% Top-1 accuracy on the ImageNet Real validation set.
arXiv Detail & Related papers (2022-02-21T10:40:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.