What Makes Good Collaborative Views? Contrastive Mutual Information Maximization for Multi-Agent Perception
- URL: http://arxiv.org/abs/2403.10068v1
- Date: Fri, 15 Mar 2024 07:18:55 GMT
- Title: What Makes Good Collaborative Views? Contrastive Mutual Information Maximization for Multi-Agent Perception
- Authors: Wanfang Su, Lixing Chen, Yang Bai, Xi Lin, Gaolei Li, Zhe Qu, Pan Zhou,
- Abstract summary: Multi-agent perception (MAP) allows autonomous systems to understand complex environments by interpreting data from multiple sources.
This paper investigates intermediate collaboration for MAP with a specific focus on exploring "good" properties of collaborative view.
We propose a novel framework named CMiMC for intermediate collaboration.
- Score: 52.41695608928129
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-agent perception (MAP) allows autonomous systems to understand complex environments by interpreting data from multiple sources. This paper investigates intermediate collaboration for MAP with a specific focus on exploring "good" properties of collaborative view (i.e., post-collaboration feature) and its underlying relationship to individual views (i.e., pre-collaboration features), which were treated as an opaque procedure by most existing works. We propose a novel framework named CMiMC (Contrastive Mutual Information Maximization for Collaborative Perception) for intermediate collaboration. The core philosophy of CMiMC is to preserve discriminative information of individual views in the collaborative view by maximizing mutual information between pre- and post-collaboration features while enhancing the efficacy of collaborative views by minimizing the loss function of downstream tasks. In particular, we define multi-view mutual information (MVMI) for intermediate collaboration that evaluates correlations between collaborative views and individual views on both global and local scales. We establish CMiMNet based on multi-view contrastive learning to realize estimation and maximization of MVMI, which assists the training of a collaboration encoder for voxel-level feature fusion. We evaluate CMiMC on V2X-Sim 1.0, and it improves the SOTA average precision by 3.08% and 4.44% at 0.5 and 0.7 IoU (Intersection-over-Union) thresholds, respectively. In addition, CMiMC can reduce communication volume to 1/32 while achieving performance comparable to SOTA. Code and Appendix are released at https://github.com/77SWF/CMiMC.
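The abstract describes the CMiMC objective only in words: maximize a multi-view mutual information (MVMI) bound between each agent's pre-collaboration feature and the fused post-collaboration feature, at both a global and a local scale, while minimizing the downstream detection loss. Below is a minimal PyTorch sketch of that idea, assuming BEV-style feature maps and an InfoNCE critic; the names (`info_nce`, `MVMILoss`), projection heads, shapes, and loss weight are illustrative assumptions, not the released CMiMNet implementation (see the GitHub link above for that).
```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def info_nce(anchor, positive, temperature=0.1):
    """InfoNCE: row i of `anchor` should match row i of `positive`; every other
    row in the batch serves as a negative. Both tensors have shape (M, D)."""
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    logits = anchor @ positive.t() / temperature               # (M, M) similarity matrix
    targets = torch.arange(anchor.size(0), device=anchor.device)
    return F.cross_entropy(logits, targets)


class MVMILoss(nn.Module):
    """Illustrative multi-view MI estimator: projects pre- and post-collaboration
    BEV feature maps into a shared space and scores agreement on a global
    (spatially pooled) and a local (per-cell) scale."""

    def __init__(self, channels, proj_dim=128):
        super().__init__()
        self.proj_pre = nn.Conv2d(channels, proj_dim, kernel_size=1)
        self.proj_post = nn.Conv2d(channels, proj_dim, kernel_size=1)

    def forward(self, pre_feats, post_feat):
        # pre_feats: (B, N_agents, C, H, W) individual views; post_feat: (B, C, H, W) fused view.
        B, N, _, H, W = pre_feats.shape
        post = self.proj_post(post_feat)                       # (B, D, H, W)
        g_post = post.mean(dim=(2, 3))                         # (B, D) global summary
        l_post = post.flatten(2).permute(0, 2, 1).reshape(B * H * W, -1)
        losses = []
        for n in range(N):                                     # one MI term per agent view
            pre = self.proj_pre(pre_feats[:, n])               # (B, D, H, W)
            g_pre = pre.mean(dim=(2, 3))
            l_pre = pre.flatten(2).permute(0, 2, 1).reshape(B * H * W, -1)
            # Global scale: other scenes in the batch act as negatives.
            # Local scale: other spatial cells (and scenes) act as negatives.
            losses.append(info_nce(g_post, g_pre) + info_nce(l_post, l_pre))
        return torch.stack(losses).mean()


if __name__ == "__main__":
    mvmi = MVMILoss(channels=64)
    pre = torch.randn(2, 3, 64, 16, 16)     # 2 scenes, 3 agents' pre-collaboration features
    post = torch.randn(2, 64, 16, 16)       # fused post-collaboration features
    task_loss = torch.tensor(1.23)          # stand-in for the downstream detection loss
    # Minimizing the InfoNCE terms maximizes a lower bound on MVMI.
    total = task_loss + 0.1 * mvmi(pre, post)
    print(float(total))
```
Because the InfoNCE cross-entropy lower-bounds mutual information, minimizing it alongside the task loss corresponds to the "maximize MVMI while minimizing the downstream loss" recipe in the abstract; the actual method additionally trains a collaboration encoder for voxel-level feature fusion, which this sketch leaves out.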
Related papers
- MAPL: Model Agnostic Peer-to-peer Learning [2.9221371172659616]
We introduce Model Agnostic Peer-to-peer Learning (MAPL) to simultaneously learn heterogeneous personalized models and a collaboration graph.
MAPL is comprised of two main modules: (i) local-level Personalized Model Learning (PML), leveraging a combination of intra- and inter-client contrastive losses; (ii) network-wide decentralized Collaborative Graph Learning (CGL) dynamically refining collaboration weights based on local task similarities.
arXiv Detail & Related papers (2024-03-28T19:17:54Z) - DCP-Net: A Distributed Collaborative Perception Network for Remote Sensing Semantic Segmentation [12.745202593789152]
This article innovatively presents a distributed collaborative perception network called DCP-Net.
DCP-Net helps members to enhance perception performance by integrating features from other platforms.
The results demonstrate that DCP-Net outperforms the existing methods comprehensively.
arXiv Detail & Related papers (2023-09-05T13:36:40Z) - M$^3$Net: Multi-view Encoding, Matching, and Fusion for Few-shot Fine-grained Action Recognition [80.21796574234287]
M$^3$Net is a matching-based framework for few-shot fine-grained (FS-FG) action recognition.
It incorporates multi-view encoding, multi-view matching, and multi-view fusion to facilitate embedding encoding, similarity matching, and decision making.
Explainable visualizations and experimental results demonstrate the superiority of M$3$Net in capturing fine-grained action details.
arXiv Detail & Related papers (2023-08-06T09:15:14Z) - Multi-interactive Feature Learning and a Full-time Multi-modality Benchmark for Image Fusion and Segmentation [66.15246197473897]
Multi-modality image fusion and segmentation play a vital role in autonomous driving and robotic operation.
We propose a Multi-interactive Feature learning architecture for image fusion and Segmentation.
arXiv Detail & Related papers (2023-08-04T01:03:58Z) - Select2Col: Leveraging Spatial-Temporal Importance of Semantic Information for Efficient Collaborative Perception [21.043094544649733]
Collaborative perception by leveraging the shared semantic information plays a crucial role in overcoming the individual limitations of isolated agents.
Existing collaborative perception methods tend to focus solely on the spatial features of semantic information, while neglecting the importance of the temporal dimension.
We propose Select2Col, a novel collaborative perception framework that takes into account the spatial-temporal importance of semantic information.
arXiv Detail & Related papers (2023-07-31T09:33:19Z) - Multi-Grained Multimodal Interaction Network for Entity Linking [65.30260033700338]
The multimodal entity linking (MEL) task aims at resolving ambiguous mentions to a multimodal knowledge graph.
We propose a novel Multi-GraIned Multimodal InteraCtion Network (MIMIC) framework for solving the MEL task.
arXiv Detail & Related papers (2023-07-19T02:11:19Z) - UMC: A Unified Bandwidth-efficient and Multi-resolution based Collaborative Perception Framework [20.713675020714835]
We propose a Unified Collaborative perception framework named UMC.
It is designed to optimize the communication, collaboration, and reconstruction processes with the Multi-resolution technique.
Our experiments prove that the proposed UMC greatly outperforms the state-of-the-art collaborative perception approaches.
arXiv Detail & Related papers (2023-03-22T09:09:02Z) - Learning with MISELBO: The Mixture Cookbook [62.75516608080322]
We present the first ever mixture of variational approximations for a normalizing flow-based hierarchical variational autoencoder (VAE) with VampPrior and a PixelCNN decoder network.
We explain this cooperative behavior by drawing a novel connection between VI and adaptive importance sampling.
We obtain state-of-the-art results among VAE architectures in terms of negative log-likelihood on the MNIST and FashionMNIST datasets.
arXiv Detail & Related papers (2022-09-30T15:01:35Z) - Global-and-Local Collaborative Learning for Co-Salient Object Detection [162.62642867056385]
The goal of co-salient object detection (CoSOD) is to discover salient objects that commonly appear in a query group containing two or more relevant images.
We propose a global-and-local collaborative learning architecture, which includes a global correspondence modeling (GCM) and a local correspondence modeling (LCM).
The proposed GLNet is evaluated on three prevailing CoSOD benchmark datasets, demonstrating that our model trained on a small dataset (about 3k images) still outperforms eleven state-of-the-art competitors trained on some large datasets (about 8k-200k images).
arXiv Detail & Related papers (2022-04-19T14:32:41Z) - COVINS: Visual-Inertial SLAM for Centralized Collaboration [11.65456841016608]
Collaborative SLAM enables a group of agents to simultaneously co-localize and jointly map an environment.
This article presents COVINS, a novel collaborative SLAM system, that enables multi-agent, scalable SLAM in large environments.
arXiv Detail & Related papers (2021-08-12T13:50:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.