UVCPNet: A UAV-Vehicle Collaborative Perception Network for 3D Object Detection

URL: http://arxiv.org/abs/2406.04647v1
Date: Fri, 7 Jun 2024 05:25:45 GMT
Title: UVCPNet: A UAV-Vehicle Collaborative Perception Network for 3D Object Detection
Authors: Yuchao Wang, Peirui Cheng, Pengju Tian, Ziyang Yuan, Liangjin Zhao, Jing Tian, Wensheng Wang, Zhirui Wang, Xian Sun
Abstract summary: We propose a framework specifically designed for aerial-ground collaboration. First, we develop a virtual dataset named V2U-COO for our research. Second, we design a Cross-Domain Cross-Adaptation (CDCA) module to align the target information. Third, we introduce a Collaborative Depth Optimization (CDO) module to obtain more precise depth estimation results.
Score: 11.60579201022641
License: http://creativecommons.org/licenses/by/4.0/
Abstract: With the advancement of collaborative perception, the role of aerial-ground collaborative perception, a crucial component, is becoming increasingly important. The demand for collaborative perception across different perspectives to construct more comprehensive perceptual information is growing. However, challenges arise due to the disparities in the field of view between cross-domain agents and their varying sensitivity to information in images. Additionally, when we transform image features into Bird's Eye View (BEV) features for collaboration, we need accurate depth information. To address these issues, we propose a framework specifically designed for aerial-ground collaboration. First, to mitigate the lack of datasets for aerial-ground collaboration, we develop a virtual dataset named V2U-COO for our research. Second, we design a Cross-Domain Cross-Adaptation (CDCA) module to align the target information obtained from different domains, thereby achieving more accurate perception results. Finally, we introduce a Collaborative Depth Optimization (CDO) module to obtain more precise depth estimation results, leading to more accurate perception outcomes. We conduct extensive experiments on both our virtual dataset and a public dataset to validate the effectiveness of our framework. Our experiments on the V2U-COO dataset and the DAIR-V2X dataset demonstrate that our method improves detection accuracy by 6.1% and 2.7%, respectively.

Related papers

CoBEVMoE: Heterogeneity-aware Feature Fusion with Dynamic Mixture-of-Experts for Collaborative Perception [21.27356211403264]
We propose a novel collaborative perception framework that operates in the Bird's Eye View (BEV) space. We introduce a Dynamic Expert Metric Loss (DEML) to enhance inter-expert diversity and improve the discriminability of the fused representation.
arXiv Detail & Related papers (2025-09-21T14:56:05Z)
BEVUDA++: Geometric-aware Unsupervised Domain Adaptation for Multi-View 3D Object Detection [56.477525075806966]
Vision-centric Bird's Eye View (BEV) perception holds considerable promise for autonomous driving. Recent studies have prioritized efficiency or accuracy enhancements, yet the issue of domain shift has been overlooked. We introduce an innovative geometric-aware teacher-student framework, BEVUDA++, to diminish this issue.
arXiv Detail & Related papers (2025-09-17T16:31:40Z)
V2X-DG: Domain Generalization for Vehicle-to-Everything Cooperative Perception [34.97091536254836]
This paper is the first work to study the Domain Generalization problem of LiDAR-based V2X cooperative perception. Our research seeks to sustain high performance not only within the source domain but also across other unseen domains.
arXiv Detail & Related papers (2025-03-19T17:17:44Z)
Infrared and Visible Image Fusion: From Data Compatibility to Task Adaption [65.06388526722186]
Infrared-visible image fusion is a critical task in computer vision. There is a lack of recent comprehensive surveys that address this rapidly expanding domain. We introduce a multi-dimensional framework to elucidate common learning-based IVIF methods.
arXiv Detail & Related papers (2025-01-18T13:17:34Z)
How Important are Data Augmentations to Close the Domain Gap for Object Detection in Orbit? [15.550663626482903]
We investigate the efficacy of data augmentations to close the domain gap in spaceborne computer vision. We propose two novel data augmentations specifically developed to emulate the visual effects observed in orbital imagery.
arXiv Detail & Related papers (2024-10-21T08:24:46Z)
CooPre: Cooperative Pretraining for V2X Cooperative Perception [47.00472259100765]
CooPre is a self-supervised learning framework for V2X cooperative perception. We develop a V2X bird-eye-view (BEV) guided masking strategy which effectively allows the model to pay attention to 3D features across heterogeneous V2X agents. CooPre achieves a 4% mAP improvement on V2X-Real dataset and surpasses baseline performance using only 50% of the training data.
arXiv Detail & Related papers (2024-08-20T23:39:26Z)
IFTR: An Instance-Level Fusion Transformer for Visual Collaborative Perception [9.117534139771738]
Multi-agent collaborative perception has emerged as a widely recognized technology in the field of autonomous driving. Current collaborative perception predominantly relies on LiDAR point clouds, with significantly less attention given to methods using camera images. This work proposes an instance-level fusion transformer for visual collaborative perception.
arXiv Detail & Related papers (2024-07-13T11:38:15Z)
UCDNet: Multi-UAV Collaborative 3D Object Detection Network by Reliable Feature Mapping [14.401624713578737]
Multi-UAV collaborative 3D object detection can perceive and comprehend complex environments. We propose an unparalleled camera-based multi-UAV collaborative 3D object detection paradigm called UCDNet. We show our method increases mAP by 4.7% and 10%, respectively, compared to the baseline.
arXiv Detail & Related papers (2024-06-07T05:27:32Z)
CoPeD-Advancing Multi-Robot Collaborative Perception: A Comprehensive Dataset in Real-World Environments [8.177157078744571]
This paper presents a pioneering and comprehensive real-world multi-robot collaborative perception dataset. It features raw sensor inputs, pose estimation, and optional high-level perception annotation. We believe this work will unlock the potential research of high-level scene understanding through multi-modal collaborative perception in multi-robot settings.
arXiv Detail & Related papers (2024-05-23T15:59:48Z)
Towards a Unified Transformer-based Framework for Scene Graph Generation and Human-object Interaction Detection [116.21529970404653]
We introduce SG2HOI+, a unified one-step model based on the Transformer architecture. Our approach employs two interactive hierarchical Transformers to seamlessly unify the tasks of SGG and HOI detection. Our approach achieves competitive performance when compared to state-of-the-art HOI methods.
arXiv Detail & Related papers (2023-11-03T07:25:57Z)
V2X-AHD:Vehicle-to-Everything Cooperation Perception via Asymmetric Heterogenous Distillation Network [13.248981195106069]
We propose a multi-view vehicle-road cooperation perception system, vehicle-to-everything cooperative perception (V2X-AHD). The V2X-AHD can effectively improve the accuracy of 3D object detection and reduce the number of network parameters, according to this study.
arXiv Detail & Related papers (2023-10-10T13:12:03Z)
Collaboration Helps Camera Overtake LiDAR in 3D Detection [49.58433319402405]
Camera-only 3D detection provides a simple solution for localizing objects in 3D space compared to LiDAR-based detection systems. Our proposed collaborative camera-only 3D detection (CoCa3D) enables agents to share complementary information with each other through communication. Results show that CoCa3D improves previous SOTA performances by 44.21% on DAIR-V2X, 30.60% on OPV2V+, 12.59% on CoPerception-UAVs+ for AP@70.
arXiv Detail & Related papers (2023-03-23T03:50:41Z)
Unified Visual Relationship Detection with Vision and Language Models [89.77838890788638]
This work focuses on training a single visual relationship detector predicting over the union of label spaces from multiple datasets. We propose UniVRD, a novel bottom-up method for Unified Visual Relationship Detection by leveraging vision and language models. Empirical results on both human-object interaction detection and scene-graph generation demonstrate the competitive performance of our model.
arXiv Detail & Related papers (2023-03-16T00:06:28Z)
Towards Scale Consistent Monocular Visual Odometry by Learning from the Virtual World [83.36195426897768]
We propose VRVO, a novel framework for retrieving the absolute scale from virtual data. We first train a scale-aware disparity network using both monocular real images and stereo virtual data. The resulting scale-consistent disparities are then integrated with a direct VO system.
arXiv Detail & Related papers (2022-03-11T01:51:54Z)
Learning Co-segmentation by Segment Swapping for Retrieval and Discovery [67.6609943904996]
The goal of this work is to efficiently identify visually similar patterns from a pair of images. We generate synthetic training pairs by selecting object segments in an image and copy-pasting them into another image. We show our approach provides clear improvements for artwork details retrieval on the Brueghel dataset.
arXiv Detail & Related papers (2021-10-29T16:51:16Z)
Gradient-Induced Co-Saliency Detection [81.54194063218216]
Co-saliency detection (Co-SOD) aims to segment the common salient foreground in a group of relevant images. In this paper, inspired by human behavior, we propose a gradient-induced co-saliency detection method.
arXiv Detail & Related papers (2020-04-28T08:40:55Z)

This list is automatically generated from the titles and abstracts of the papers on this site.