HM-ViT: Hetero-modal Vehicle-to-Vehicle Cooperative perception with vision transformer
- URL: http://arxiv.org/abs/2304.10628v1
- Date: Thu, 20 Apr 2023 20:09:59 GMT
- Title: HM-ViT: Hetero-modal Vehicle-to-Vehicle Cooperative perception with vision transformer
- Authors: Hao Xiang, Runsheng Xu, Jiaqi Ma
- Abstract summary: HM-ViT is the first unified multi-agent hetero-modal cooperative perception framework.
It can collaboratively predict 3D objects for highly dynamic vehicle-to-vehicle (V2V) collaborations with varying numbers and types of agents.
- Score: 4.957079586254435
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Vehicle-to-Vehicle technologies have enabled autonomous vehicles to share information and see through occlusions, greatly enhancing perception performance. Nevertheless, existing works have all focused on homogeneous traffic, where vehicles are equipped with the same type of sensors, which significantly hampers the scale of collaboration and the benefit of cross-modality interactions. In this paper, we investigate the multi-agent hetero-modal cooperative perception problem, where agents may have distinct sensor modalities. We present HM-ViT, the first unified multi-agent hetero-modal cooperative perception framework that can collaboratively predict 3D objects for highly dynamic vehicle-to-vehicle (V2V) collaborations with varying numbers and types of agents. To effectively fuse features from multi-view images and LiDAR point clouds, we design a novel heterogeneous 3D graph transformer to jointly reason about inter-agent and intra-agent interactions. Extensive experiments on the V2V perception dataset OPV2V demonstrate that HM-ViT outperforms SOTA cooperative perception methods for V2V hetero-modal cooperative perception. We will release the code to facilitate future research.
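The abstract describes the heterogeneous 3D graph transformer only at a high level. Below is a minimal sketch (not the authors' released code) of one way modality-aware attention over per-agent BEV features could look, assuming two modalities (camera and LiDAR) with separate projection weights and dense attention across all agents; the class and parameter names are hypothetical.

```python
# Hypothetical sketch of modality-aware attention over per-agent BEV features:
# each agent's features are projected with weights chosen by its sensor modality,
# then attention mixes information across all agents (inter-agent) and within
# each agent's own map (intra-agent).
import torch
import torch.nn as nn

class HeteroAgentAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        # Separate projections per modality: index 0 = camera, 1 = LiDAR.
        self.q_proj = nn.ModuleList([nn.Linear(dim, dim) for _ in range(2)])
        self.k_proj = nn.ModuleList([nn.Linear(dim, dim) for _ in range(2)])
        self.v_proj = nn.ModuleList([nn.Linear(dim, dim) for _ in range(2)])
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, feats: torch.Tensor, modality: torch.Tensor) -> torch.Tensor:
        # feats: (N_agents, H*W, dim) BEV features already warped to the ego frame.
        # modality: (N_agents,) with 0 for camera agents and 1 for LiDAR agents.
        n, l, d = feats.shape
        mods = modality.tolist()
        q = torch.stack([self.q_proj[m](f) for f, m in zip(feats, mods)])
        k = torch.stack([self.k_proj[m](f) for f, m in zip(feats, mods)])
        v = torch.stack([self.v_proj[m](f) for f, m in zip(feats, mods)])
        # Flatten agents into one token sequence so every BEV location can attend
        # to every agent's features as well as its own.
        q = q.reshape(1, n * l, self.num_heads, self.head_dim).transpose(1, 2)
        k = k.reshape(1, n * l, self.num_heads, self.head_dim).transpose(1, 2)
        v = v.reshape(1, n * l, self.num_heads, self.head_dim).transpose(1, 2)
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.head_dim ** 0.5, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(n, l, d)
        return self.out_proj(out)
```

Note that HM-ViT fuses features over a heterogeneous 3D graph rather than the dense all-to-all attention used here for brevity; the sketch only illustrates the modality-conditioned projections and joint inter-/intra-agent mixing the abstract alludes to.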
Related papers
- Hybrid-Generative Diffusion Models for Attack-Oriented Twin Migration in Vehicular Metaverses [58.264499654343226]
Vehicle Twins (VTs) are digital twins that provide immersive virtual services for Vehicular Metaverse Users (VMUs).
High mobility of vehicles, uneven deployment of edge servers, and potential security threats pose challenges to achieving efficient and reliable VT migrations.
We propose a secure and reliable VT migration framework in vehicular metaverses.
arXiv Detail & Related papers (2024-07-05T11:11:33Z)
- SiCP: Simultaneous Individual and Cooperative Perception for 3D Object Detection in Connected and Automated Vehicles [18.23919432049492]
Cooperative perception for connected and automated vehicles is traditionally achieved through the fusion of feature maps from two or more vehicles, but such fused-feature models degrade when no other vehicle shares its features.
This drawback impedes the adoption of cooperative perception, as vehicle resources are often insufficient to concurrently employ two perception models.
We present Simultaneous Individual and Cooperative Perception (SiCP), a generic framework that supports a wide range of state-of-the-art standalone perception backbones.
arXiv Detail & Related papers (2023-12-08T04:12:26Z)
- Towards a Unified Transformer-based Framework for Scene Graph Generation and Human-object Interaction Detection [116.21529970404653]
We introduce SG2HOI+, a unified one-step model based on the Transformer architecture.
Our approach employs two interactive hierarchical Transformers to seamlessly unify the tasks of SGG and HOI detection.
Our approach achieves competitive performance when compared to state-of-the-art HOI methods.
arXiv Detail & Related papers (2023-11-03T07:25:57Z)
- V2V4Real: A Real-world Large-scale Dataset for Vehicle-to-Vehicle Cooperative Perception [49.7212681947463]
Vehicle-to-Vehicle (V2V) cooperative perception systems have great potential to revolutionize the autonomous driving industry.
We present V2V4Real, the first large-scale real-world multi-modal dataset for V2V perception.
Our dataset covers a driving area of 410 km, comprising 20K LiDAR frames, 40K RGB frames, 240K annotated 3D bounding boxes for 5 classes, and HDMaps.
arXiv Detail & Related papers (2023-03-14T02:49:20Z)
- Learning for Vehicle-to-Vehicle Cooperative Perception under Lossy Communication [30.100647849646467]
We study the side effects (e.g., a drop in detection performance) caused by lossy communication in V2V cooperative perception.
We propose a novel intermediate LC-aware feature fusion method to relieve the side effects of lossy communication.
The proposed method is quite effective for cooperative point-cloud-based 3D object detection under lossy V2V communication.
arXiv Detail & Related papers (2022-12-16T04:18:47Z)
- CoBEVT: Cooperative Bird's Eye View Semantic Segmentation with Sparse Transformers [36.838065731893735]
CoBEVT is the first generic multi-agent perception framework that can cooperatively generate BEV map predictions.
CoBEVT achieves state-of-the-art performance for cooperative BEV semantic segmentation.
arXiv Detail & Related papers (2022-07-05T17:59:28Z)
- V2X-ViT: Vehicle-to-Everything Cooperative Perception with Vision Transformer [58.71845618090022]
We build a holistic attention model, namely V2X-ViT, to fuse information across on-road agents.
V2X-ViT consists of alternating layers of heterogeneous multi-agent self-attention and multi-scale window self-attention.
To validate our approach, we create a large-scale V2X perception dataset.
arXiv Detail & Related papers (2022-03-20T20:18:25Z)
- V2X-Sim: A Virtual Collaborative Perception Dataset for Autonomous Driving [26.961213523096948]
Vehicle-to-everything (V2X) denotes the collaboration between a vehicle and any entity in its surroundings.
We present the V2X-Sim dataset, the first public large-scale collaborative perception dataset in autonomous driving.
arXiv Detail & Related papers (2022-02-17T05:14:02Z)
- Multi-Modal Fusion Transformer for End-to-End Autonomous Driving [59.60483620730437]
We propose TransFuser, a novel Multi-Modal Fusion Transformer, to integrate image and LiDAR representations using attention.
Our approach achieves state-of-the-art driving performance while reducing collisions by 76% compared to geometry-based fusion.
arXiv Detail & Related papers (2021-04-19T11:48:13Z)
- Fine-Grained Vehicle Perception via 3D Part-Guided Visual Data Augmentation [77.60050239225086]
We propose an effective training data generation process by fitting a 3D car model with dynamic parts to vehicles in real images.
Our approach is fully automatic without any human interaction.
We present a multi-task network for VUS parsing and a multi-stream network for VHI parsing.
arXiv Detail & Related papers (2020-12-15T03:03:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.