Select2Col: Leveraging Spatial-Temporal Importance of Semantic
Information for Efficient Collaborative Perception
- URL: http://arxiv.org/abs/2307.16517v3
- Date: Wed, 7 Feb 2024 04:53:54 GMT
- Title: Select2Col: Leveraging Spatial-Temporal Importance of Semantic
Information for Efficient Collaborative Perception
- Authors: Yuntao Liu, Qian Huang, Rongpeng Li, Xianfu Chen, Zhifeng Zhao,
Shuyuan Zhao, Yongdong Zhu and Honggang Zhang
- Abstract summary: Collaborative perception by leveraging the shared semantic information plays a crucial role in overcoming the individual limitations of isolated agents.
Existing collaborative perception methods tend to focus solely on the spatial features of semantic information, while neglecting the importance of the temporal dimension.
We propose Select2Col, a novel collaborative perception framework that takes into account the spatial-temporal importance of semantic information.
- Score: 21.043094544649733
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Collaborative perception by leveraging the shared semantic information plays
a crucial role in overcoming the individual limitations of isolated agents.
However, existing collaborative perception methods tend to focus solely on the
spatial features of semantic information, while neglecting the importance of
the temporal dimension. Consequently, the potential benefits of collaboration
remain underutilized. In this article, we propose Select2Col, a novel
collaborative perception framework that takes into account the
\underline{s}patial-t\underline{e}mpora\underline{l} importanc\underline{e} of
semanti\underline{c} informa\underline{t}ion. Within the Select2Col, we develop
a collaborator selection method that utilizes a lightweight graph neural
network (GNN) to estimate the importance of semantic information (IoSI) of each
collaborator in enhancing perception performance, thereby identifying
contributive collaborators while excluding those that potentially bring
negative impact. Moreover, we present a semantic information fusion algorithm
called HPHA (historical prior hybrid attention), which integrates multi-scale
attention and short-term attention modules to capture the IoSI in feature
representation from the spatial and temporal dimensions respectively, and
assigns IoSI-consistent weights for efficient fusion of information from
selected collaborators. Extensive experiments on three open datasets
demonstrate that our proposed Select2Col significantly improves the perception
performance compared to state-of-the-art approaches. The code associated with
this research is publicly available at https://github.com/huangqzj/Select2Col/.
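The abstract describes two steps: estimating each collaborator's importance of semantic information (IoSI) and then fusing features with IoSI-consistent weights. The sketch below is a minimal illustration of that idea, not the paper's actual implementation: the function name `select_and_fuse`, the `threshold` parameter, and the simple softmax weighting are all assumptions, and the paper instead uses a lightweight GNN to produce the IoSI estimates and the HPHA module (multi-scale plus short-term attention) for fusion.

```python
# Hedged sketch of IoSI-based collaborator selection and weighted fusion.
# All names here are illustrative; Select2Col's actual pipeline uses a
# lightweight GNN for IoSI estimation and the HPHA module for fusion.
import numpy as np

def select_and_fuse(ego_feat, collab_feats, iosi_scores, threshold=0.0):
    """Keep collaborators whose IoSI exceeds `threshold`, then fuse their
    feature maps with softmax weights proportional to IoSI.

    ego_feat:     (C, H, W) ego agent feature map
    collab_feats: list of (C, H, W) collaborator feature maps
    iosi_scores:  per-collaborator importance estimates (e.g. from a GNN)
    """
    kept = [(f, s) for f, s in zip(collab_feats, iosi_scores) if s > threshold]
    if not kept:
        return ego_feat  # no contributive collaborators: fall back to ego only
    feats, scores = zip(*kept)
    weights = np.exp(scores) / np.sum(np.exp(scores))  # softmax over IoSI
    fused = sum(w * f for w, f in zip(weights, feats))
    # Combine ego and collaborator information (a plain average here; the
    # paper's HPHA module applies multi-scale and short-term attention instead).
    return 0.5 * ego_feat + 0.5 * fused

C, H, W = 4, 8, 8
ego = np.ones((C, H, W))
collabs = [np.full((C, H, W), v) for v in (2.0, 3.0, -1.0)]
# The third collaborator has negative IoSI and is excluded from fusion.
out = select_and_fuse(ego, collabs, iosi_scores=[0.8, 0.5, -0.3])
print(out.shape)  # (4, 8, 8)
```

Note how a collaborator with a negative importance estimate is dropped entirely, matching the abstract's goal of excluding collaborators that potentially bring negative impact.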
Related papers
- V2X-PC: Vehicle-to-everything Collaborative Perception via Point Cluster [58.79477191603844]
We introduce a new message unit, namely point cluster, to represent the scene sparsely with a combination of low-level structure information and high-level semantic information.
This framework includes a Point Cluster Packing (PCP) module to preserve object features and manage bandwidth.
Experiments on two widely recognized collaborative perception benchmarks showcase the superior performance of our method compared to the previous state-of-the-art approaches.
arXiv Detail & Related papers (2024-03-25T11:24:02Z) - What Makes Good Collaborative Views? Contrastive Mutual Information Maximization for Multi-Agent Perception [52.41695608928129]
Multi-agent perception (MAP) allows autonomous systems to understand complex environments by interpreting data from multiple sources.
This paper investigates intermediate collaboration for MAP with a specific focus on exploring "good" properties of collaborative view.
We propose a novel framework named CMiMC for intermediate collaboration.
arXiv Detail & Related papers (2024-03-15T07:18:55Z) - Spatio-Temporal Domain Awareness for Multi-Agent Collaborative
Perception [18.358998861454477]
Multi-agent collaborative perception as a potential application for vehicle-to-everything communication could significantly improve the perception performance of autonomous vehicles over single-agent perception.
We propose SCOPE, a novel collaborative perception framework that aggregates awareness characteristics across agents in an end-to-end manner.
arXiv Detail & Related papers (2023-07-26T03:00:31Z) - Attention Based Feature Fusion For Multi-Agent Collaborative Perception [4.120288148198388]
We propose an intermediate collaborative perception solution in the form of a graph attention network (GAT).
The proposed approach develops an attention-based aggregation strategy to fuse intermediate representations exchanged among multiple connected agents.
This approach adaptively highlights important regions in the intermediate feature maps at both the channel and spatial levels, resulting in improved object detection precision.
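The blurb above describes reweighting intermediate feature maps at both the channel and spatial levels. The following is a simplified, CBAM-style sketch of that idea under the assumption of sigmoid-gated pooled statistics; it is not the cited paper's actual GAT-based design, and the function name is illustrative.

```python
# Simplified channel-then-spatial attention over a (C, H, W) feature map.
# This is a generic sketch of channel/spatial reweighting, not the cited
# paper's GAT-based aggregation.
import numpy as np

def channel_spatial_attention(feat):
    """feat: (C, H, W) feature map. Returns the reweighted feature map."""
    # Channel attention: sigmoid gate over each channel's global average.
    ch = 1.0 / (1.0 + np.exp(-feat.mean(axis=(1, 2))))  # shape (C,)
    feat = feat * ch[:, None, None]
    # Spatial attention: sigmoid gate over the per-location channel mean.
    sp = 1.0 / (1.0 + np.exp(-feat.mean(axis=0)))       # shape (H, W)
    return feat * sp[None, :, :]

x = np.random.default_rng(0).normal(size=(2, 4, 4))
y = channel_spatial_attention(x)
print(y.shape)  # (2, 4, 4)
```

Because both gates lie in (0, 1), every element is attenuated in proportion to its channel's and location's estimated importance, which is the "highlight important regions" behavior the blurb refers to.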
arXiv Detail & Related papers (2023-05-03T12:06:11Z) - RLIP: Relational Language-Image Pre-training for Human-Object
Interaction Detection [32.20132357830726]
Relational Language-Image Pre-training (RLIP) is a strategy for contrastive pre-training that leverages both entity and relation descriptions.
We show the benefits of these contributions, collectively termed RLIP-ParSe, for improved zero-shot, few-shot and fine-tuning HOI detection as well as increased robustness from noisy annotations.
arXiv Detail & Related papers (2022-09-05T07:50:54Z) - Cross-modal Consensus Network for Weakly Supervised Temporal Action
Localization [74.34699679568818]
Weakly supervised temporal action localization (WS-TAL) is a challenging task that aims to localize action instances in the given video with video-level categorical supervision.
We propose a cross-modal consensus network (CO2-Net) to tackle this problem.
arXiv Detail & Related papers (2021-07-27T04:21:01Z) - Mining Implicit Entity Preference from User-Item Interaction Data for
Knowledge Graph Completion via Adversarial Learning [82.46332224556257]
We propose a novel adversarial learning approach by leveraging user interaction data for the Knowledge Graph Completion task.
Our generator is isolated from user interaction data, and serves to improve the performance of the discriminator.
To discover implicit entity preference of users, we design an elaborate collaborative learning algorithm based on graph neural networks.
arXiv Detail & Related papers (2020-03-28T05:47:33Z) - Multi-Granularity Reference-Aided Attentive Feature Aggregation for
Video-based Person Re-identification [98.7585431239291]
Video-based person re-identification aims at matching the same person across video clips.
In this paper, we propose an attentive feature aggregation module, namely the Multi-Granularity Reference-Attentive Feature Aggregation module (MG-RAFA).
Our framework achieves state-of-the-art performance on three benchmark datasets.
arXiv Detail & Related papers (2020-03-27T03:49:21Z) - Bi-Directional Attention for Joint Instance and Semantic Segmentation in
Point Clouds [9.434847591440485]
We build a Bi-Directional Attention module on backbone neural networks for 3D point cloud perception.
It uses similarity matrix measured from features for one task to help aggregate non-local information for the other task.
From comprehensive experiments and ablation studies on the S3DIS dataset and the PartNet dataset, the superiority of our method is verified.
arXiv Detail & Related papers (2020-03-11T17:16:07Z) - Cascaded Human-Object Interaction Recognition [175.60439054047043]
We introduce a cascade architecture for a multi-stage, coarse-to-fine HOI understanding.
At each stage, an instance localization network progressively refines HOI proposals and feeds them into an interaction recognition network.
With our carefully-designed human-centric relation features, these two modules work collaboratively towards effective interaction understanding.
arXiv Detail & Related papers (2020-03-09T17:05:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.