CoopDETR: A Unified Cooperative Perception Framework for 3D Detection via Object Query
- URL: http://arxiv.org/abs/2502.19313v1
- Date: Wed, 26 Feb 2025 17:09:51 GMT
- Title: CoopDETR: A Unified Cooperative Perception Framework for 3D Detection via Object Query
- Authors: Zhe Wang, Shaocong Xu, Xucai Zhuang, Tongda Xu, Yan Wang, Jingjing Liu, Yilun Chen, Ya-Qin Zhang,
- Abstract summary: CoopDETR is a novel cooperative perception framework that introduces object-level feature cooperation via object query.<n>Our framework consists of two key modules: single-agent query generation, which efficiently encodes raw sensor data into object queries, and cross-agent query fusion.<n>Experiments on the OPV2V and V2XSet datasets demonstrate that CoopDETR achieves state-of-the-art performance and significantly reduces transmission costs to 1/782 of previous methods.
- Score: 21.010741892266136
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cooperative perception enhances the individual perception capabilities of autonomous vehicles (AVs) by providing a comprehensive view of the environment. However, balancing perception performance and transmission costs remains a significant challenge. Current approaches that transmit region-level features across agents are limited in interpretability and demand substantial bandwidth, making them unsuitable for practical applications. In this work, we propose CoopDETR, a novel cooperative perception framework that introduces object-level feature cooperation via object query. Our framework consists of two key modules: single-agent query generation, which efficiently encodes raw sensor data into object queries, reducing transmission cost while preserving essential information for detection; and cross-agent query fusion, which includes Spatial Query Matching (SQM) and Object Query Aggregation (OQA) to enable effective interaction between queries. Our experiments on the OPV2V and V2XSet datasets demonstrate that CoopDETR achieves state-of-the-art performance and significantly reduces transmission costs to 1/782 of previous methods.
Related papers
- Which2comm: An Efficient Collaborative Perception Framework for 3D Object Detection [5.195291754828701]
Collaborative perception allows real-time inter-agent information exchange.
limited communication bandwidth in practical scenarios restricts the inter-agent data transmission volume.
We propose Which2comm, a novel multi-agent 3D object detection framework leveraging object-level sparse features.
arXiv Detail & Related papers (2025-03-21T14:24:07Z) - CoCMT: Communication-Efficient Cross-Modal Transformer for Collaborative Perception [14.619784179608361]
Multi-agent collaborative perception enhances each agent's capabilities by sharing sensing information to cooperatively perform robot perception tasks.
Existing representative collaborative perception systems transmit intermediate feature maps, which contain significant amount of non-critical information.
We introduce CoCMT, an object-query-based collaboration framework that maximizes communication bandwidth by selectively extracting and transmitting essential features.
arXiv Detail & Related papers (2025-03-13T06:41:25Z) - Can foundation models actively gather information in interactive environments to test hypotheses? [56.651636971591536]
We introduce a framework in which a model must determine the factors influencing a hidden reward function.<n>We investigate whether approaches such as self- throughput and increased inference time improve information gathering efficiency.
arXiv Detail & Related papers (2024-12-09T12:27:21Z) - RoCo:Robust Collaborative Perception By Iterative Object Matching and Pose Adjustment [9.817492112784674]
Collaborative autonomous driving with multiple vehicles usually requires the data fusion from multiple modalities.
In collaborative perception, the quality of object detection based on a modality is highly sensitive to the relative pose errors among the agents.
We propose RoCo, a novel unsupervised framework to conduct iterative object matching and agent pose adjustment.
arXiv Detail & Related papers (2024-08-01T03:29:33Z) - Practical Collaborative Perception: A Framework for Asynchronous and
Multi-Agent 3D Object Detection [9.967263440745432]
Occlusion is a major challenge for LiDAR-based object detection methods.
State-of-the-art V2X methods resolve the performance-bandwidth tradeoff using a mid-collaboration approach.
We devise a simple yet effective collaboration method that achieves a better bandwidth-performance tradeoff than prior methods.
arXiv Detail & Related papers (2023-07-04T03:49:42Z) - Spatial-Temporal Graph Enhanced DETR Towards Multi-Frame 3D Object Detection [54.041049052843604]
We present STEMD, a novel end-to-end framework that enhances the DETR-like paradigm for multi-frame 3D object detection.
First, to model the inter-object spatial interaction and complex temporal dependencies, we introduce the spatial-temporal graph attention network.
Finally, it poses a challenge for the network to distinguish between the positive query and other highly similar queries that are not the best match.
arXiv Detail & Related papers (2023-07-01T13:53:14Z) - ReAct: Temporal Action Detection with Relational Queries [84.76646044604055]
This work aims at advancing temporal action detection (TAD) using an encoder-decoder framework with action queries.
We first propose a relational attention mechanism in the decoder, which guides the attention among queries based on their relations.
Lastly, we propose to predict the localization quality of each action query at inference in order to distinguish high-quality queries.
arXiv Detail & Related papers (2022-07-14T17:46:37Z) - One-Shot Object Affordance Detection in the Wild [76.46484684007706]
Affordance detection refers to identifying the potential action possibilities of objects in an image.
We devise a One-Shot Affordance Detection Network (OSAD-Net) that estimates the human action purpose and then transfers it to help detect the common affordance from all candidate images.
With complex scenes and rich annotations, our PADv2 dataset can be used as a test bed to benchmark affordance detection methods.
arXiv Detail & Related papers (2021-08-08T14:53:10Z) - Bandwidth-Adaptive Feature Sharing for Cooperative LIDAR Object
Detection [2.064612766965483]
Situational awareness as a necessity in the connected and autonomous vehicles (CAV) domain.
Cooperative mechanisms have provided a solution to improve situational awareness by utilizing high speed wireless vehicular networks.
We propose a mechanism to add flexibility in adapting to communication channel capacity and a novel decentralized shared data alignment method.
arXiv Detail & Related papers (2020-10-22T00:12:58Z) - A Co-Interactive Transformer for Joint Slot Filling and Intent Detection [61.109486326954205]
Intent detection and slot filling are two main tasks for building a spoken language understanding (SLU) system.
Previous studies either model the two tasks separately or only consider the single information flow from intent to slot.
We propose a Co-Interactive Transformer to consider the cross-impact between the two tasks simultaneously.
arXiv Detail & Related papers (2020-10-08T10:16:52Z) - Cascaded Human-Object Interaction Recognition [175.60439054047043]
We introduce a cascade architecture for a multi-stage, coarse-to-fine HOI understanding.
At each stage, an instance localization network progressively refines HOI proposals and feeds them into an interaction recognition network.
With our carefully-designed human-centric relation features, these two modules work collaboratively towards effective interaction understanding.
arXiv Detail & Related papers (2020-03-09T17:05:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.