BM2CP: Efficient Collaborative Perception with LiDAR-Camera Modalities
- URL: http://arxiv.org/abs/2310.14702v2
- Date: Thu, 7 Dec 2023 04:42:07 GMT
- Title: BM2CP: Efficient Collaborative Perception with LiDAR-Camera Modalities
- Authors: Binyu Zhao, Wei Zhang, Zhaonian Zou
- Abstract summary: We propose a collaborative perception paradigm, BM2CP, which employs LiDAR and camera to achieve efficient multi-modal perception.
It can cope with the special case where one of the sensors, of the same or a different type, is missing for any agent.
Our approach outperforms the state-of-the-art methods with 50X lower communication volumes in both simulated and real-world autonomous driving scenarios.
- Score: 5.034692611033509
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Collaborative perception enables agents to share complementary perceptual
information with nearby agents. This would improve the perception performance
and alleviate the issues of single-view perception, such as occlusion and
sparsity. Most existing approaches mainly focus on a single modality
(especially LiDAR) and do not fully exploit the superiority of multi-modal
perception. We propose a collaborative perception paradigm, BM2CP, which
employs LiDAR and camera to achieve efficient multi-modal perception. It
utilizes LiDAR-guided modal fusion, cooperative depth generation, and
modality-guided intermediate fusion to acquire deep interactions among the
modalities of different agents. Moreover, it can cope with the special case
where one of the sensors, of the same or a different type, is missing for any
agent. Extensive experiments validate that our approach outperforms
state-of-the-art methods with 50X lower
communication volumes in both simulated and real-world autonomous driving
scenarios. Our code is available at https://github.com/byzhaoAI/BM2CP.
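As a rough illustration of the paradigm, the sketch below fuses per-agent LiDAR and camera BEV features with a learned per-cell gate and falls back to the surviving modality when a sensor is missing. It is a minimal PyTorch sketch; the module, gate design, and tensor shapes are hypothetical and not taken from the BM2CP codebase.

```python
# Hypothetical sketch of modality-guided BEV fusion with missing-sensor
# handling; names and shapes are illustrative, not from BM2CP.
import torch
import torch.nn as nn

class ModalityGuidedFusion(nn.Module):
    """Fuse per-agent LiDAR and camera BEV features, tolerating missing inputs."""
    def __init__(self, channels: int):
        super().__init__()
        # Predicts a per-cell weight for each modality from the stacked features.
        self.gate = nn.Conv2d(2 * channels, 2, kernel_size=1)

    def forward(self, lidar_bev, camera_bev, lidar_ok: bool, camera_ok: bool):
        # Fall back to the surviving modality when a sensor is missing.
        if not lidar_ok:
            return camera_bev
        if not camera_ok:
            return lidar_bev
        stacked = torch.cat([lidar_bev, camera_bev], dim=1)
        weights = torch.softmax(self.gate(stacked), dim=1)  # (B, 2, H, W)
        return weights[:, :1] * lidar_bev + weights[:, 1:] * camera_bev

fusion = ModalityGuidedFusion(channels=64)
lidar = torch.randn(1, 64, 100, 100)   # (B, C, H, W) BEV grid
camera = torch.randn(1, 64, 100, 100)
fused = fusion(lidar, camera, lidar_ok=True, camera_ok=True)
print(fused.shape)  # torch.Size([1, 64, 100, 100])
```

The gate lets the (typically more reliable) LiDAR evidence modulate how much each camera cell contributes, which is the intuition behind LiDAR-guided fusion.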
Related papers
- Semantic Communication for Cooperative Perception using HARQ [51.148203799109304]
We introduce a cooperative perception semantic communication framework that leverages an importance map to distill critical semantic information.
To counter the challenges posed by time-varying multipath fading, our approach incorporates orthogonal frequency-division multiplexing (OFDM) along with channel estimation and equalization strategies.
We introduce a novel semantic error detection method that is integrated with our semantic communication framework in the spirit of hybrid automatic repeat request (HARQ).
arXiv Detail & Related papers (2024-08-29T08:53:26Z)
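A toy stop-and-wait loop in the spirit of the HARQ scheme summarized above: retransmit until a semantic check accepts the payload. The channel model and the semantic check are placeholders, not the paper's actual method.

```python
# Toy stop-and-wait HARQ loop with a placeholder semantic error check.
import random
from typing import Optional

def noisy_channel(payload: bytes, loss_prob: float = 0.3) -> Optional[bytes]:
    """Drop the payload with some probability to mimic fading."""
    return None if random.random() < loss_prob else payload

def semantic_ok(received: Optional[bytes], reference: bytes) -> bool:
    """Placeholder semantic error detection: accept only intact payloads."""
    return received == reference

def harq_send(payload: bytes, max_retries: int = 4) -> bool:
    for attempt in range(1, max_retries + 1):
        received = noisy_channel(payload)
        if semantic_ok(received, payload):
            print(f"delivered on attempt {attempt}")
            return True  # receiver would ACK; stop retransmitting
        print(f"attempt {attempt} failed; NACK, retransmitting")
    return False  # give up after the retry budget is spent

harq_send(b"bev-feature-chunk")
```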
- DeepInteraction++: Multi-Modality Interaction for Autonomous Driving [80.8837864849534]
We introduce a novel modality interaction strategy that allows individual per-modality representations to be learned and maintained throughout.
DeepInteraction++ is a multi-modal interaction framework characterized by a multi-modal representational interaction encoder and a multi-modal predictive interaction decoder.
Experiments demonstrate the superior performance of the proposed framework on both 3D object detection and end-to-end autonomous driving tasks.
arXiv Detail & Related papers (2024-08-09T14:04:21Z)
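A minimal sketch of bidirectional cross-modal attention in the spirit of a representational interaction encoder: each stream queries the other while keeping its own identity through a residual connection. Layer sizes and the block structure are illustrative, not DeepInteraction++'s actual architecture.

```python
# Illustrative cross-modal attention between LiDAR and camera token streams.
import torch
import torch.nn as nn

class CrossModalBlock(nn.Module):
    def __init__(self, dim: int = 128, heads: int = 4):
        super().__init__()
        self.lidar_attends = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.camera_attends = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, lidar_tokens, camera_tokens):
        # Each stream queries the other, then keeps its own identity via a
        # residual connection (representations are maintained, not merged).
        lidar_upd, _ = self.lidar_attends(lidar_tokens, camera_tokens, camera_tokens)
        camera_upd, _ = self.camera_attends(camera_tokens, lidar_tokens, lidar_tokens)
        return lidar_tokens + lidar_upd, camera_tokens + camera_upd

block = CrossModalBlock()
lidar = torch.randn(2, 200, 128)    # (batch, tokens, dim)
camera = torch.randn(2, 300, 128)
lidar2, camera2 = block(lidar, camera)
print(lidar2.shape, camera2.shape)
```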
- Interactive Autonomous Navigation with Internal State Inference and Interactivity Estimation [58.21683603243387]
We propose three auxiliary tasks with relational-temporal reasoning and integrate them into the standard Deep Learning framework.
These auxiliary tasks provide additional supervision signals to infer the behavior patterns of other interactive agents.
Our approach achieves robust and state-of-the-art performance in terms of standard evaluation metrics.
arXiv Detail & Related papers (2023-11-27T18:57:42Z)
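A toy version of the auxiliary-supervision idea above: a shared backbone feeds a main action head plus extra heads whose losses act as additional supervision signals. The specific heads and loss weights are invented for illustration.

```python
# Illustrative multi-task setup: main policy head plus auxiliary heads.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PolicyWithAuxiliaryHeads(nn.Module):
    def __init__(self, obs_dim: int = 32, hidden: int = 64, n_actions: int = 5):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.action_head = nn.Linear(hidden, n_actions)  # main task
        self.intent_head = nn.Linear(hidden, 3)          # auxiliary: agent intention
        self.traj_head = nn.Linear(hidden, 2)            # auxiliary: next position

    def forward(self, obs):
        h = self.backbone(obs)
        return self.action_head(h), self.intent_head(h), self.traj_head(h)

model = PolicyWithAuxiliaryHeads()
obs = torch.randn(8, 32)
action_logits, intent_logits, next_pos = model(obs)
# Auxiliary losses supply extra gradient signal alongside the main loss.
loss = (F.cross_entropy(action_logits, torch.randint(0, 5, (8,)))
        + 0.5 * F.cross_entropy(intent_logits, torch.randint(0, 3, (8,)))
        + 0.5 * F.mse_loss(next_pos, torch.randn(8, 2)))
loss.backward()
```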
- Selectively Sharing Experiences Improves Multi-Agent Reinforcement Learning [9.25057318925143]
We present a novel multi-agent RL approach in which agents share with other agents a limited number of transitions they observe during training.
We show that our approach outperforms baseline no-sharing decentralized training and state-of-the-art multi-agent RL algorithms.
arXiv Detail & Related papers (2023-11-01T21:35:32Z)
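A plain-Python sketch of the selective-sharing idea: each agent broadcasts only a small budget of observed transitions to its peers. The random selection rule here is a placeholder; the paper selects transitions more carefully.

```python
# Agents share a limited number of transitions per round (selection rule
# is a stand-in; the actual method chooses which transitions to share).
import random
from collections import deque

class Agent:
    def __init__(self, name: str, share_budget: int = 2):
        self.name = name
        self.buffer = deque(maxlen=1000)   # local replay buffer
        self.share_budget = share_budget

    def observe(self, transition):
        self.buffer.append(transition)

    def select_to_share(self):
        k = min(self.share_budget, len(self.buffer))
        return random.sample(list(self.buffer), k)

    def receive(self, transitions):
        self.buffer.extend(transitions)

agents = [Agent("a"), Agent("b")]
for t in range(10):
    for ag in agents:
        ag.observe((f"s{t}", "act", 0.0, f"s{t+1}"))
# Each agent broadcasts its small selection to every other agent.
for sender in agents:
    shared = sender.select_to_share()
    for receiver in agents:
        if receiver is not sender:
            receiver.receive(shared)
print(len(agents[0].buffer), len(agents[1].buffer))
```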
- MACP: Efficient Model Adaptation for Cooperative Perception [23.308578463976804]
We propose a new framework termed MACP, which equips a single-agent pre-trained model with cooperation capabilities.
We demonstrate in experiments that the proposed framework can effectively utilize cooperative observations and outperform other state-of-the-art approaches.
arXiv Detail & Related papers (2023-10-25T14:24:42Z)
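One common way to equip a frozen pre-trained model with new capabilities is to train small bottleneck adapters; the sketch below shows that generic pattern, which may differ from MACP's actual adaptation mechanism.

```python
# Generic bottleneck-adapter pattern over a frozen backbone block.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Residual bottleneck adapter inserted after a frozen block."""
    def __init__(self, dim: int, bottleneck: int = 16):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x):
        return x + self.up(torch.relu(self.down(x)))  # residual update

backbone = nn.Linear(64, 64)           # stands in for a pre-trained block
for p in backbone.parameters():
    p.requires_grad = False            # keep single-agent weights frozen
adapter = Adapter(64)                  # only these weights are trained

x = torch.randn(4, 64)                 # e.g. fused multi-agent features
out = adapter(backbone(x))
print(sum(p.numel() for p in adapter.parameters()), "trainable params")
```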
- Egocentric RGB+Depth Action Recognition in Industry-Like Settings [50.38638300332429]
Our work focuses on recognizing actions from egocentric RGB and Depth modalities in an industry-like environment.
Our framework is based on the 3D Video SWIN Transformer to encode both RGB and Depth modalities effectively.
Our method also secured first place at the multimodal action recognition challenge at ICIAP 2023.
arXiv Detail & Related papers (2023-09-25T08:56:22Z)
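A minimal two-stream encoder with late fusion for RGB and Depth clips; a tiny 3D CNN stands in for the 3D Video SWIN Transformer, and all sizes are illustrative.

```python
# Two-stream RGB+Depth clip encoder with late fusion (toy stand-in).
import torch
import torch.nn as nn

def make_stream(in_channels: int) -> nn.Module:
    return nn.Sequential(
        nn.Conv3d(in_channels, 16, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.AdaptiveAvgPool3d(1),   # (B, 16, 1, 1, 1)
        nn.Flatten(),              # (B, 16)
    )

class RGBDActionModel(nn.Module):
    def __init__(self, n_classes: int = 10):
        super().__init__()
        self.rgb_stream = make_stream(3)
        self.depth_stream = make_stream(1)
        self.classifier = nn.Linear(32, n_classes)  # concatenated features

    def forward(self, rgb, depth):
        feats = torch.cat([self.rgb_stream(rgb), self.depth_stream(depth)], dim=1)
        return self.classifier(feats)

model = RGBDActionModel()
rgb = torch.randn(2, 3, 8, 64, 64)     # (B, C, T, H, W) clip
depth = torch.randn(2, 1, 8, 64, 64)
print(model(rgb, depth).shape)          # torch.Size([2, 10])
```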
- Practical Collaborative Perception: A Framework for Asynchronous and Multi-Agent 3D Object Detection [9.967263440745432]
Occlusion is a major challenge for LiDAR-based object detection methods.
State-of-the-art V2X methods resolve the performance-bandwidth tradeoff using a mid-collaboration approach.
We devise a simple yet effective collaboration method that achieves a better bandwidth-performance tradeoff than prior methods.
arXiv Detail & Related papers (2023-07-04T03:49:42Z)
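Mid-collaboration methods share intermediate features rather than raw sensor data or final detections; the sketch below compresses BEV features with a 1x1 convolution before transmission to illustrate the bandwidth-performance tradeoff. The compression scheme is generic, not the paper's.

```python
# Generic sender-side squeeze / receiver-side restore of shared features.
import torch
import torch.nn as nn

channels, reduced = 64, 8
encoder = nn.Conv2d(channels, reduced, kernel_size=1)   # sender-side squeeze
decoder = nn.Conv2d(reduced, channels, kernel_size=1)   # receiver-side restore

bev = torch.randn(1, channels, 100, 100)   # intermediate features to share
compressed = encoder(bev)                   # 8/64 of the original channels
restored = decoder(compressed)              # fed into the receiver's pipeline
ratio = compressed.numel() / bev.numel()
print(f"transmitted fraction: {ratio:.3f}")  # 0.125
```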
- Collaborative Visual Navigation [69.20264563368762]
We propose a large-scale 3D dataset, CollaVN, for multi-agent visual navigation (MAVN).
Diverse MAVN variants are explored to make our problem more general.
A memory-augmented communication framework is proposed. Each agent is equipped with a private, external memory to persistently store communication information.
arXiv Detail & Related papers (2021-07-02T15:48:16Z)
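A toy private external memory in the spirit of the memory-augmented communication framework: each agent persistently stores incoming messages and recalls them with an attention-style read. The ring-buffer write and similarity-based read are illustrative assumptions.

```python
# Toy private memory that persists communication info across steps.
import torch

class PrivateMemory:
    def __init__(self, slots: int = 8, dim: int = 16):
        self.memory = torch.zeros(slots, dim)
        self.cursor = 0

    def write(self, message: torch.Tensor):
        # Ring-buffer write keeps the most recent messages.
        self.memory[self.cursor % len(self.memory)] = message
        self.cursor += 1

    def read(self, query: torch.Tensor) -> torch.Tensor:
        # Attention-style read: weight slots by similarity to the query.
        scores = torch.softmax(self.memory @ query, dim=0)
        return scores @ self.memory

mem = PrivateMemory()
for step in range(5):
    mem.write(torch.randn(16))        # store a received message
context = mem.read(torch.randn(16))   # recall for the current decision
print(context.shape)                  # torch.Size([16])
```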
- DIRV: Dense Interaction Region Voting for End-to-End Human-Object Interaction Detection [53.40028068801092]
We propose a novel one-stage HOI detection approach based on a new concept called interaction region for the HOI problem.
Unlike previous methods, our approach concentrates on the densely sampled interaction regions across different scales for each human-object pair.
In order to compensate for the detection flaws of a single interaction region, we introduce a novel voting strategy.
arXiv Detail & Related papers (2020-10-02T13:57:58Z)
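A simplified take on voting over densely sampled interaction regions: per-region class scores are fused by confidence-weighted averaging so that a single flawed region cannot dominate. The weighting rule is an assumption, not DIRV's exact strategy.

```python
# Confidence-weighted fusion of per-region interaction scores.
import torch

def vote(region_scores: torch.Tensor, region_conf: torch.Tensor) -> torch.Tensor:
    """region_scores: (R, C) per-region class scores; region_conf: (R,) weights."""
    weights = torch.softmax(region_conf, dim=0)            # normalize confidences
    return (weights[:, None] * region_scores).sum(dim=0)   # (C,) fused scores

scores = torch.rand(12, 4)   # 12 sampled regions, 4 interaction classes
conf = torch.rand(12)
print(vote(scores, conf))
```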
- A Visual Communication Map for Multi-Agent Deep Reinforcement Learning [7.003240657279981]
Multi-agent learning poses significant challenges in the effort to allocate a concealed communication medium.
Recent studies typically combine a specialized neural network with reinforcement learning to enable communication between agents.
This paper proposes a more scalable approach that not only deals with a great number of agents but also enables collaboration between dissimilar functional agents.
arXiv Detail & Related papers (2020-02-27T02:38:21Z)