V2X-AHD: Vehicle-to-Everything Cooperation Perception via Asymmetric Heterogeneous Distillation Network
- URL: http://arxiv.org/abs/2310.06603v1
- Date: Tue, 10 Oct 2023 13:12:03 GMT
- Title: V2X-AHD: Vehicle-to-Everything Cooperation Perception via Asymmetric Heterogeneous Distillation Network
- Authors: Caizhen He, Hai Wang, Long Chen, Tong Luo, and Yingfeng Cai
- Abstract summary: We propose a multi-view vehicle-road cooperation perception system, vehicle-to-everything cooperative perception (V2X-AHD).
According to this study, V2X-AHD effectively improves the accuracy of 3D object detection while reducing the number of network parameters.
- Score: 13.248981195106069
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Object detection is the central issue of intelligent traffic systems, and
recent advancements in single-vehicle lidar-based 3D detection indicate that it
can provide accurate position information for intelligent agents to make
decisions and plan. Compared with single-vehicle perception, multi-view
vehicle-road cooperation perception has fundamental advantages, such as the
elimination of blind spots and a broader range of perception, and has become a
research hotspot. However, current cooperative perception research focuses on
increasing the complexity of fusion while ignoring the fundamental problems
caused by the absence of single-view outlines. We propose a multi-view
vehicle-road cooperation perception system, vehicle-to-everything cooperative
perception (V2X-AHD), in order to enhance the identification capability,
particularly for predicting vehicle shape. First, we propose an
asymmetric heterogeneous distillation network fed with different training data
to improve the accuracy of contour recognition, with multi-view teacher
features transferring to single-view student features. Because point cloud
data are sparse, we propose Spara Pillar, a sparse-convolution-based plug-in
feature extraction backbone, to reduce the number of parameters and enhance
feature extraction capability. Moreover, we leverage multi-head
self-attention (MSA) to fuse the single-view features, and the lightweight
design yields a smooth fused representation. The results of applying
our algorithm to the large-scale open dataset V2XSet demonstrate that our
method achieves state-of-the-art results. This study shows that V2X-AHD
effectively improves the accuracy of 3D object detection while reducing the
number of network parameters, and it can serve as a benchmark for cooperative
perception. The code for this article is available at
https://github.com/feeling0414-lab/V2X-AHD.
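
The asymmetric heterogeneous distillation described in the abstract transfers multi-view teacher features to single-view student features. The abstract does not spell out the exact loss, so the following is a minimal sketch under the assumption of a simple feature-imitation (L2) objective between aligned student and teacher BEV feature maps; the module and parameter names are illustrative, not the paper's API.

```python
# Minimal sketch of teacher-to-student feature distillation (hypothetical;
# the paper's exact loss and layer choices are not given in the abstract).
import torch
import torch.nn as nn
import torch.nn.functional as F


class FeatureDistillationLoss(nn.Module):
    """L2 imitation loss between single-view student BEV features and
    multi-view teacher BEV features, with a 1x1 conv to align channels."""

    def __init__(self, student_channels: int, teacher_channels: int):
        super().__init__()
        # Project student features into the teacher's channel space.
        self.align = nn.Conv2d(student_channels, teacher_channels, kernel_size=1)

    def forward(self, student_bev: torch.Tensor, teacher_bev: torch.Tensor) -> torch.Tensor:
        # student_bev: (B, C_s, H, W) from the single-view student branch
        # teacher_bev: (B, C_t, H, W) from the fused multi-view teacher (no grad)
        aligned = self.align(student_bev)
        return F.mse_loss(aligned, teacher_bev.detach())


# Usage: add the imitation term to the ordinary detection loss, e.g.
# total_loss = detection_loss + lambda_distill * distill_loss(student_bev, teacher_bev)
```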
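Spara Pillar is described only at a high level (a sparse-convolution-based, plug-in pillar backbone). As a rough illustration of the pillar-style feature extraction it builds on, the sketch below groups points into BEV pillars, encodes them with a small shared MLP, and scatter-pools the results into a pseudo-image; the actual backbone uses sparse convolutions (e.g., via a library such as spconv), which are replaced here by dense PyTorch ops to keep the example self-contained. All names and sizes are hypothetical.

```python
# Illustrative pillar-style feature extractor (hypothetical simplification;
# the paper's sparse-convolution backbone is not reproduced here).
import torch
import torch.nn as nn


class SimplePillarEncoder(nn.Module):
    def __init__(self, in_dim: int = 4, feat_dim: int = 64,
                 grid_hw: tuple = (200, 200), voxel_size: float = 0.5):
        super().__init__()
        self.grid_h, self.grid_w = grid_hw
        self.voxel_size = voxel_size
        # Shared per-point MLP, followed by max-pooling within each pillar.
        self.point_net = nn.Sequential(nn.Linear(in_dim, feat_dim), nn.ReLU())

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        # points: (N, 4) = (x, y, z, intensity); a real implementation would
        # first offset coordinates by the point-cloud range.
        feats = self.point_net(points)                                   # (N, C)
        ix = (points[:, 0] / self.voxel_size).long().clamp(0, self.grid_w - 1)
        iy = (points[:, 1] / self.voxel_size).long().clamp(0, self.grid_h - 1)
        flat = iy * self.grid_w + ix                                     # pillar id per point
        bev = feats.new_zeros(self.grid_h * self.grid_w, feats.shape[1])
        # Max-pool point features into their pillars (scatter-max).
        bev.index_reduce_(0, flat, feats, reduce="amax", include_self=False)
        return bev.view(self.grid_h, self.grid_w, -1).permute(2, 0, 1)   # (C, H, W)
```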
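The abstract also states that single-view features are fused with multi-head self-attention (MSA). One minimal way to realize this with standard building blocks is to treat each cooperating agent's feature at a given BEV cell as a token and attend across agents, as sketched below with torch.nn.MultiheadAttention; the paper's exact fusion module may differ, and the class and shapes here are assumptions for illustration.

```python
# Minimal sketch of multi-head self-attention fusion across agents
# (illustrative; not the paper's exact fusion architecture).
import torch
import torch.nn as nn


class AgentMSAFusion(nn.Module):
    def __init__(self, channels: int = 64, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, agent_feats: torch.Tensor) -> torch.Tensor:
        # agent_feats: (A, C, H, W) BEV features from A agents, already
        # warped into the ego frame.
        a, c, h, w = agent_feats.shape
        tokens = agent_feats.permute(2, 3, 0, 1).reshape(h * w, a, c)  # (H*W, A, C)
        fused, _ = self.attn(tokens, tokens, tokens)                   # attend across agents
        fused = fused.mean(dim=1)                                      # aggregate agents
        return fused.reshape(h, w, c).permute(2, 0, 1)                 # (C, H, W)


# Usage (shapes only):
# fusion = AgentMSAFusion(channels=64, num_heads=4)
# ego_bev = fusion(torch.randn(3, 64, 100, 100))  # 3 agents -> fused ego-view feature
```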
Related papers
- PVAFN: Point-Voxel Attention Fusion Network with Multi-Pooling Enhancing for 3D Object Detection [59.355022416218624]
The integration of point and voxel representations is becoming more common in LiDAR-based 3D object detection.
We propose a novel two-stage 3D object detector called the Point-Voxel Attention Fusion Network (PVAFN).
PVAFN uses a multi-pooling strategy to integrate both multi-scale and region-specific information effectively.
arXiv Detail & Related papers (2024-08-26T19:43:01Z)
- Towards Unified 3D Object Detection via Algorithm and Data Unification [70.27631528933482]
We build the first unified multi-modal 3D object detection benchmark MM-Omni3D and extend the aforementioned monocular detector to its multi-modal version.
We name the designed monocular and multi-modal detectors as UniMODE and MM-UniMODE, respectively.
arXiv Detail & Related papers (2024-02-28T18:59:31Z)
- SiCP: Simultaneous Individual and Cooperative Perception for 3D Object Detection in Connected and Automated Vehicles [18.23919432049492]
Cooperative perception for connected and automated vehicles is traditionally achieved through the fusion of feature maps from two or more vehicles, using a model separate from the standalone perception model.
This impedes the adoption of cooperative perception, as vehicle resources are often insufficient to concurrently employ two perception models.
We present Simultaneous Individual and Cooperative Perception (SiCP), a generic framework that supports a wide range of the state-of-the-art standalone perception backbones.
arXiv Detail & Related papers (2023-12-08T04:12:26Z)
- UniTR: A Unified and Efficient Multi-Modal Transformer for Bird's-Eye-View Representation [113.35352122662752]
We present an efficient multi-modal backbone for outdoor 3D perception named UniTR.
UniTR processes a variety of modalities with unified modeling and shared parameters.
UniTR is also a fundamentally task-agnostic backbone that naturally supports different 3D perception tasks.
arXiv Detail & Related papers (2023-08-15T12:13:44Z)
- Geometric-aware Pretraining for Vision-centric 3D Object Detection [77.7979088689944]
We propose a novel geometric-aware pretraining framework called GAPretrain.
GAPretrain serves as a plug-and-play solution that can be flexibly applied to multiple state-of-the-art detectors.
We achieve 46.2 mAP and 55.5 NDS on the nuScenes val set using the BEVFormer method, with a gain of 2.7 and 2.1 points, respectively.
arXiv Detail & Related papers (2023-04-06T14:33:05Z)
- VINet: Lightweight, Scalable, and Heterogeneous Cooperative Perception for 3D Object Detection [15.195933965761645]
Cooperative Perception (CP) has emerged as a way to significantly advance perception for automated driving.
We introduce VINet, a unified deep learning-based CP network for scalable, lightweight, and heterogeneous cooperative 3D object detection.
VINet can reduce system-level computational cost by 84% and system-level communication cost by 94% while improving 3D detection accuracy.
arXiv Detail & Related papers (2022-12-14T07:03:23Z)
- Self-aligned Spatial Feature Extraction Network for UAV Vehicle Re-identification [3.449626476434765]
Vehicles with the same color and type show an extremely similar appearance from the UAV's perspective.
Recent works tend to extract distinguishing information from regional features and component features.
In order to extract efficient fine-grained features and avoid tedious annotating work, this letter develops an unsupervised self-aligned network.
arXiv Detail & Related papers (2022-01-08T14:25:54Z)
- Fine-Grained Vehicle Perception via 3D Part-Guided Visual Data Augmentation [77.60050239225086]
We propose an effective training data generation process by fitting a 3D car model with dynamic parts to vehicles in real images.
Our approach is fully automatic without any human interaction.
We present a multi-task network for VUS parsing and a multi-stream network for VHI parsing.
arXiv Detail & Related papers (2020-12-15T03:03:38Z)
- Multi-View Adaptive Fusion Network for 3D Object Detection [14.506796247331584]
3D object detection based on LiDAR-camera fusion is an emerging research theme for autonomous driving.
We propose a single-stage multi-view fusion framework that takes LiDAR bird's-eye view, LiDAR range view and camera view images as inputs for 3D object detection.
We design an end-to-end learnable network named MVAF-Net to integrate these two components.
arXiv Detail & Related papers (2020-11-02T00:06:01Z)
- Parsing-based View-aware Embedding Network for Vehicle Re-Identification [138.11983486734576]
We propose a parsing-based view-aware embedding network (PVEN) to achieve the view-aware feature alignment and enhancement for vehicle ReID.
The experiments conducted on three datasets show that our model outperforms state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2020-04-10T13:06:09Z)