V2X-AHD: Vehicle-to-Everything Cooperation Perception via Asymmetric
Heterogenous Distillation Network
- URL: http://arxiv.org/abs/2310.06603v1
- Date: Tue, 10 Oct 2023 13:12:03 GMT
- Title: V2X-AHD: Vehicle-to-Everything Cooperation Perception via Asymmetric
Heterogenous Distillation Network
- Authors: Caizhen He, Hai Wang, Long Chen, Tong Luo, and Yingfeng Cai
- Abstract summary: We propose a multi-view vehicle-road cooperation perception system, vehicle-to-everything cooperative perception (V2X-AHD).
The V2X-AHD can effectively improve the accuracy of 3D object detection and reduce the number of network parameters, according to this study.
- Score: 13.248981195106069
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Object detection is the central issue of intelligent traffic systems, and
recent advancements in single-vehicle lidar-based 3D detection indicate that it
can provide accurate position information for intelligent agents to make
decisions and plan. Compared with single-vehicle perception, multi-view
vehicle-road cooperation perception has fundamental advantages, such as the
elimination of blind spots and a broader range of perception, and has become a
research hotspot. However, current cooperative perception work focuses on
increasing the complexity of fusion while ignoring the fundamental problems
caused by the absence of single-view outlines. We propose a multi-view
vehicle-road cooperation perception system, vehicle-to-everything cooperative
perception (V2X-AHD), in order to enhance the identification capability,
particularly for predicting the vehicle's shape. First, we propose an
asymmetric heterogeneous distillation network fed with different training data
to improve the accuracy of contour recognition, with multi-view teacher
features transferring to single-view student features. Because point cloud
data are sparse, we propose Spara Pillar, a sparse convolution-based plug-in
feature extraction backbone, to reduce the number of parameters and
enhance feature extraction capabilities. Moreover, we leverage the multi-head
self-attention (MSA) to fuse the single-view feature, and the lightweight
design makes the fusion feature a smooth expression. The results of applying
our algorithm to the massive open dataset V2XSet demonstrate that our method
achieves state-of-the-art results. V2X-AHD can effectively improve the
accuracy of 3D object detection and reduce the number of network parameters,
according to this study, which serves as a benchmark for cooperative
perception. The code for this article is available at
https://github.com/feeling0414-lab/V2X-AHD.
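The two ideas the abstract combines, attention-based fusion of single-view features and distillation from a multi-view teacher, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names `msa_fuse` and `distill_loss`, the identity query/key/value projections, the mean-squared-error loss form, and the toy feature vectors are all assumptions made for illustration. A real model would use learned projections over full bird's-eye-view feature maps.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def msa_fuse(agent_feats, num_heads=2):
    """Fuse per-agent feature vectors for one spatial cell with
    multi-head self-attention, returning the ego agent's fused vector.

    agent_feats: list of equal-length vectors, ego agent first.
    Assumption: identity Q/K/V projections (a trained layer learns these).
    """
    dim = len(agent_feats[0])
    assert dim % num_heads == 0, "feature size must split evenly across heads"
    head_dim = dim // num_heads
    fused = []
    for h in range(num_heads):
        lo, hi = h * head_dim, (h + 1) * head_dim
        heads = [f[lo:hi] for f in agent_feats]  # this head's slice per agent
        query = heads[0]                         # the ego agent asks the question
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(head_dim)
            for key in heads
        ]
        weights = softmax(scores)                # attention over agents
        for d in range(head_dim):
            fused.append(sum(w * v[d] for w, v in zip(weights, heads)))
    return fused


def distill_loss(student_feat, teacher_feat):
    """Feature-matching loss (assumed here to be mean-squared error)."""
    n = len(student_feat)
    return sum((s - t) ** 2 for s, t in zip(student_feat, teacher_feat)) / n


# Toy example: one ego vehicle and one roadside unit, 4-dim features.
ego = [1.0, 0.0, 0.5, 0.5]
roadside = [0.0, 1.0, 0.5, 0.5]
student = msa_fuse([ego, roadside], num_heads=2)

# Hypothetical teacher feature, standing in for a network trained on
# the merged multi-view point cloud.
teacher = [0.5, 0.5, 0.5, 0.5]
loss = distill_loss(student, teacher)
```

In this sketch the student only ever sees per-agent features, while the loss pulls its fused output toward the teacher's feature. That asymmetry is what lets the teacher be discarded at inference time.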
Related papers
- UniM$^2$AE: Multi-modal Masked Autoencoders with Unified 3D
Representation for 3D Perception in Autonomous Driving [51.37470133438836]
Masked Autoencoders (MAE) play a pivotal role in learning potent representations, delivering outstanding results across various 3D perception tasks.
This research delves into multi-modal Masked Autoencoders tailored for a unified representation space in autonomous driving.
To intricately marry the semantics inherent in images with the geometric intricacies of LiDAR point clouds, the UniM$^2$AE is proposed.
arXiv Detail & Related papers (2023-08-21T02:13:40Z)
- UniTR: A Unified and Efficient Multi-Modal Transformer for
Bird's-Eye-View Representation [113.35352122662752]
We present an efficient multi-modal backbone for outdoor 3D perception named UniTR.
UniTR processes a variety of modalities with unified modeling and shared parameters.
UniTR is also a fundamentally task-agnostic backbone that naturally supports different 3D perception tasks.
arXiv Detail & Related papers (2023-08-15T12:13:44Z)
- Geometric-aware Pretraining for Vision-centric 3D Object Detection [77.7979088689944]
We propose a novel geometric-aware pretraining framework called GAPretrain.
GAPretrain serves as a plug-and-play solution that can be flexibly applied to multiple state-of-the-art detectors.
We achieve 46.2 mAP and 55.5 NDS on the nuScenes val set using the BEVFormer method, with a gain of 2.7 and 2.1 points, respectively.
arXiv Detail & Related papers (2023-04-06T14:33:05Z)
- VINet: Lightweight, Scalable, and Heterogeneous Cooperative Perception
for 3D Object Detection [15.195933965761645]
Cooperative Perception (CP) has emerged to significantly advance the perception of automated driving.
We introduce VINet, a unified deep learning-based CP network for scalable, lightweight, and heterogeneous cooperative 3D object detection.
VINet can reduce 84% system-level computational cost and 94% system-level communication cost while improving the 3D detection accuracy.
arXiv Detail & Related papers (2022-12-14T07:03:23Z)
- Self-aligned Spatial Feature Extraction Network for UAV Vehicle
Re-identification [3.449626476434765]
Vehicles with the same color and type look extremely similar from the UAV's perspective.
Recent works tend to extract distinguishing information by regional features and component features.
In order to extract efficient fine-grained features and avoid tedious annotating work, this letter develops an unsupervised self-aligned network.
arXiv Detail & Related papers (2022-01-08T14:25:54Z)
- Perception-aware Multi-sensor Fusion for 3D LiDAR Semantic Segmentation [59.42262859654698]
3D semantic segmentation is important in scene understanding for many applications, such as auto-driving and robotics.
Existing fusion-based methods may not achieve promising performance due to the vast difference between the two modalities.
In this work, we investigate a collaborative fusion scheme called perception-aware multi-sensor fusion (PMF) to exploit perceptual information from two modalities.
arXiv Detail & Related papers (2021-06-21T10:47:26Z)
- Fine-Grained Vehicle Perception via 3D Part-Guided Visual Data
Augmentation [77.60050239225086]
We propose an effective training data generation process by fitting a 3D car model with dynamic parts to vehicles in real images.
Our approach is fully automatic without any human interaction.
We present a multi-task network for VUS parsing and a multi-stream network for VHI parsing.
arXiv Detail & Related papers (2020-12-15T03:03:38Z)
- Multi-View Adaptive Fusion Network for 3D Object Detection [14.506796247331584]
3D object detection based on LiDAR-camera fusion is becoming an emerging research theme for autonomous driving.
We propose a single-stage multi-view fusion framework that takes LiDAR bird's-eye view, LiDAR range view and camera view images as inputs for 3D object detection.
We design an end-to-end learnable network named MVAF-Net to integrate these two components.
arXiv Detail & Related papers (2020-11-02T00:06:01Z)
- Parsing-based View-aware Embedding Network for Vehicle Re-Identification [138.11983486734576]
We propose a parsing-based view-aware embedding network (PVEN) to achieve the view-aware feature alignment and enhancement for vehicle ReID.
The experiments conducted on three datasets show that our model outperforms state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2020-04-10T13:06:09Z)
- CAE-LO: LiDAR Odometry Leveraging Fully Unsupervised Convolutional
Auto-Encoder for Interest Point Detection and Feature Description [10.73965992177754]
We propose a fully unsupervised Convolutional Auto-Encoder based LiDAR Odometry (CAE-LO) that detects interest points from spherical ring data using 2D CAE and extracts features from multi-resolution voxel model using 3D CAE.
We make several key contributions: 1) experiments based on KITTI dataset show that our interest points can capture more local details to improve the matching success rate on unstructured scenarios and our features outperform state-of-the-art by more than 50% in matching inlier ratio.
arXiv Detail & Related papers (2020-01-06T01:26:28Z)