Asynchrony-Robust Collaborative Perception via Bird's Eye View Flow
- URL: http://arxiv.org/abs/2309.16940v2
- Date: Mon, 9 Oct 2023 03:46:41 GMT
- Title: Asynchrony-Robust Collaborative Perception via Bird's Eye View Flow
- Authors: Sizhe Wei, Yuxi Wei, Yue Hu, Yifan Lu, Yiqi Zhong, Siheng Chen, Ya Zhang
- Abstract summary: Collaborative perception can boost each agent's perception ability by facilitating communication among multiple agents.
However, temporal asynchrony among agents is inevitable in the real world due to communication delays, interruptions, and clock misalignments.
We propose CoBEVFlow, an asynchrony-robust collaborative perception system based on bird's eye view (BEV) flow.
- Score: 45.670727141966545
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Collaborative perception can substantially boost each agent's perception
ability by facilitating communication among multiple agents. However, temporal
asynchrony among agents is inevitable in the real world due to communication
delays, interruptions, and clock misalignments. This issue causes information
mismatch during multi-agent fusion, seriously shaking the foundation of
collaboration. To address this issue, we propose CoBEVFlow, an
asynchrony-robust collaborative perception system based on bird's eye view
(BEV) flow. The key intuition of CoBEVFlow is to compensate for motion so as to
align the asynchronous collaboration messages sent by multiple agents. To model
the motion in a scene, we propose BEV flow, a collection of motion vectors, one
per spatial location. Based on BEV flow, asynchronous
perceptual features can be reassigned to appropriate positions, mitigating the
impact of asynchrony. CoBEVFlow has two advantages: (i) CoBEVFlow can handle
asynchronous collaboration messages sent at irregular, continuous time stamps
without discretization; and (ii) with BEV flow, CoBEVFlow only transports the
original perceptual features, instead of generating new perceptual features,
avoiding additional noise. To validate CoBEVFlow's efficacy, we create
IRregular V2V (IRV2V), the first synthetic collaborative perception dataset with
various temporal asynchronies that simulate different real-world scenarios.
Extensive experiments conducted on both IRV2V and the real-world dataset
DAIR-V2X show that CoBEVFlow consistently outperforms other baselines and is
robust in extremely asynchronous settings. The code is available at
https://github.com/MediaBrain-SJTU/CoBEVFlow.
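To make the mechanism concrete, here is a minimal sketch of the motion-compensation step described in the abstract, written against assumed tensor shapes; it illustrates the idea rather than reproducing the authors' implementation. Each BEV cell's feature is transported along its motion vector, scaled by the message's delay, so the asynchronous features land where the objects are expected to be at fusion time.

```python
import torch

def warp_bev_features(feats: torch.Tensor, bev_flow: torch.Tensor,
                      delay: float) -> torch.Tensor:
    """Transport BEV features along per-cell motion vectors.

    feats:    (C, H, W) asynchronous BEV features from one agent.
    bev_flow: (2, H, W) motion vectors (dx, dy) in cells per second.
    delay:    gap in seconds between the message stamp and fusion time.
    """
    C, H, W = feats.shape
    out = torch.zeros_like(feats)
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    # Displace every cell by its motion vector scaled by the delay.
    new_x = (xs + bev_flow[0] * delay).round().long()
    new_y = (ys + bev_flow[1] * delay).round().long()
    # Keep only displacements that stay inside the BEV grid.
    valid = (new_x >= 0) & (new_x < W) & (new_y >= 0) & (new_y < H)
    # Move the original features (no new features are synthesized);
    # colliding cells simply overwrite each other in this toy version.
    out[:, new_y[valid], new_x[valid]] = feats[:, ys[valid], xs[valid]]
    return out
```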
Related papers
- CoDynTrust: Robust Asynchronous Collaborative Perception via Dynamic Feature Trust Modulus [9.552300496606644]
Collaborative perception, which fuses information from multiple agents, can extend the perception range and thereby improve performance.
However, temporal asynchrony in real-world environments, caused by communication delays, clock misalignment, or differences in sampling configuration, can lead to information mismatches.
We propose CoDynTrust, an uncertainty-encoded asynchronous fusion perception framework that is robust to the information mismatches caused by temporal asynchrony.
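The summary suggests uncertainty-weighted fusion; a hypothetical sketch of that general pattern (the weighting rule and names below are illustrative assumptions, not CoDynTrust's actual formulation) might look like:

```python
import torch

def trust_weighted_fuse(feats: torch.Tensor,
                        uncertainty: torch.Tensor) -> torch.Tensor:
    """Fuse per-agent BEV features with per-cell trust weights.

    feats:       (N, C, H, W) features from N agents.
    uncertainty: (N, 1, H, W) non-negative uncertainty estimates.
    """
    # Trust falls as uncertainty rises; weights normalize across agents.
    weights = torch.softmax(-uncertainty, dim=0)
    return (weights * feats).sum(dim=0)  # (C, H, W) fused map
```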
arXiv Detail & Related papers (2025-02-12T07:23:26Z)
- Bench2Drive-R: Turning Real World Data into Reactive Closed-Loop Autonomous Driving Benchmark by Generative Model [63.336123527432136]
We introduce Bench2Drive-R, a generative framework that enables reactive closed-loop evaluation.
Unlike existing video generative models for autonomous driving, the proposed designs are tailored for interactive simulation.
We compare the generation quality of Bench2Drive-R with existing generative models and achieve state-of-the-art performance.
arXiv Detail & Related papers (2024-12-11T06:35:18Z)
- BitPipe: Bidirectional Interleaved Pipeline Parallelism for Accelerating Large Models Training [5.7294516069851475]
BitPipe is a bidirectional interleaved pipeline-parallelism scheme for accelerating the training of large models.
We show that BitPipe improves the training throughput of GPT-style and BERT-style models by 1.05x-1.28x compared to the state-of-the-art synchronous approaches.
arXiv Detail & Related papers (2024-10-25T08:08:51Z)
- AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising [49.785626309848276]
AsyncDiff is a universal and plug-and-play acceleration scheme that enables model parallelism across multiple devices.
For the Stable Diffusion v2.1, AsyncDiff achieves a 2.7x speedup with negligible degradation and a 4.0x speedup with only a slight reduction of 0.38 in CLIP Score.
Our experiments also demonstrate that AsyncDiff can be readily applied to video diffusion models with encouraging performances.
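One way to picture asynchronous denoising parallelism is as a stage pipeline in which each stage consumes its predecessor's output from the previous denoising step, removing the intra-step dependency chain. The toy loop below simulates that schedule serially; it is an assumption-laden sketch of the dependency pattern, not the AsyncDiff implementation:

```python
from typing import Callable, List

def async_stage_denoise(stages: List[Callable], latent, num_steps: int):
    """Toy serial simulation of stage-parallel asynchronous denoising.

    stages: sequential sub-networks the denoiser is split into.
    At every step, stage i consumes stage i-1's output from the
    *previous* step, so in a real deployment each stage could run
    concurrently on its own device.  `latent` stands in for the
    per-step noisy input.
    """
    # Warm-up: one ordinary serial pass fills the activation cache.
    cache, h = [], latent
    for stage in stages:
        h = stage(h)
        cache.append(h)
    # Subsequent steps only depend on cached (stale) activations.
    for _ in range(num_steps - 1):
        new_cache = [stages[0](latent)]
        for i in range(1, len(stages)):
            new_cache.append(stages[i](cache[i - 1]))  # stale input
        cache = new_cache
    return cache[-1]
```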
arXiv Detail & Related papers (2024-06-11T03:09:37Z)
- Communication-Efficient Collaborative Perception via Information Filling with Codebook [48.087934650038044]
Collaborative perception empowers each agent to improve its perceptual ability through the exchange of perceptual messages with other agents.
To address this bottleneck issue, our core idea is to optimize the collaborative messages from two key aspects: representation and selection.
By integrating these two designs, we propose CodeFilling, a novel communication-efficient collaborative perception system.
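The representation side can be pictured with a standard codebook (vector quantization) exchange; the sketch below illustrates that generic idea under assumed shapes and is not CodeFilling's actual codec:

```python
import torch

def encode_with_codebook(feats: torch.Tensor,
                         codebook: torch.Tensor) -> torch.Tensor:
    """Sender: replace each feature vector with its nearest codeword index.

    feats:    (N, C) feature vectors selected for transmission.
    codebook: (K, C) codewords shared by all agents in advance.
    Returns (N,) integer indices, far cheaper to send than raw floats.
    """
    dists = torch.cdist(feats, codebook)  # (N, K) pairwise distances
    return dists.argmin(dim=1)

def decode_with_codebook(indices: torch.Tensor,
                         codebook: torch.Tensor) -> torch.Tensor:
    """Receiver: reconstruct approximate features by table lookup."""
    return codebook[indices]
```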
arXiv Detail & Related papers (2024-05-08T11:12:37Z)
- Unlocking Past Information: Temporal Embeddings in Cooperative Bird's Eye View Prediction [34.68695222573004]
This paper introduces TempCoBEV, a temporal module designed to incorporate historical cues into current observations.
We show the efficacy of TempCoBEV and its capability to integrate historical cues into the current BEV map, improving predictions under optimal communication conditions by up to 2% and under communication failures by up to 19%.
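A gated recurrent blend of historical and current BEV maps is one plausible shape for such a temporal module; the module below is a hypothetical sketch, not TempCoBEV's actual architecture:

```python
import torch
import torch.nn as nn

class TemporalBEVFusion(nn.Module):
    """Hypothetical gated blend of historical and current BEV maps."""

    def __init__(self, channels: int):
        super().__init__()
        # A 1x1 convolution decides, per cell, how much history to keep.
        self.gate = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, bev_now: torch.Tensor,
                bev_hist: torch.Tensor) -> torch.Tensor:
        g = torch.sigmoid(self.gate(torch.cat([bev_now, bev_hist], dim=1)))
        return g * bev_hist + (1 - g) * bev_now  # fused (B, C, H, W) map
```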
arXiv Detail & Related papers (2024-01-25T17:21:35Z)
- StreamingFlow: Streaming Occupancy Forecasting with Asynchronous Multi-modal Data Streams via Neural Ordinary Differential Equation [15.441175735210791]
StreamingFlow is a novel BEV occupancy predictor that ingests asynchronous multi-sensor data streams for fusion.
It learns derivatives of BEV features over temporal horizons, updates the implicit sensor's BEV features as part of the fusion process, and propagates BEV states to the desired future time point.
It significantly outperforms previous vision-based and LiDAR-based methods, and shows superior performance compared to state-of-the-art fusion-based methods.
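The neural-ODE view can be illustrated with a simple Euler integrator: a learned network predicts the time derivative of the BEV state, which can then be advanced to any timestamp, matching the arrival times of asynchronous streams. The sketch below (with an assumed deriv_net module) shows the integration step only:

```python
import torch
import torch.nn as nn

def propagate_bev_state(bev: torch.Tensor, deriv_net: nn.Module,
                        t0: float, t1: float, dt: float = 0.05) -> torch.Tensor:
    """Advance a BEV state from t0 to t1 by Euler-integrating d(BEV)/dt.

    deriv_net (assumed here) maps a BEV state to its learned time
    derivative, so the state can be propagated to arbitrary, even
    irregular, timestamps.
    """
    t = t0
    while t < t1:
        step = min(dt, t1 - t)
        bev = bev + step * deriv_net(bev)  # x(t+dt) = x(t) + dt * f(x(t))
        t += step
    return bev
```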
arXiv Detail & Related papers (2023-02-19T14:38:01Z)
- AFAFed -- Protocol analysis [3.016628653955123]
AFAFed is a novel Fair Federated Adaptive learning framework for stream-oriented IoT application environments.
We analyze the convergence properties and address the implementation aspects of AFAFed.
arXiv Detail & Related papers (2022-06-29T22:12:08Z)
- Real-time Object Detection for Streaming Perception [84.2559631820007]
Streaming perception is proposed to jointly evaluate latency and accuracy with a single metric for online video perception.
We build a simple and effective framework for streaming perception.
Our method achieves competitive performance on Argoverse-HD dataset and improves the AP by 4.9% compared to the strong baseline.
arXiv Detail & Related papers (2022-03-23T11:33:27Z)
- Blockchain-enabled Server-less Federated Learning [5.065631761462706]
We focus on an asynchronous server-less Federated Learning solution empowered by blockchain (BC) technology.
In contrast to mostly adopted FL approaches, we advocate an asynchronous method whereby model aggregation is done as clients submit their local updates.
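Per-arrival aggregation of this kind is commonly written as a staleness-discounted moving average; the rule below is a generic async-FL sketch, not the paper's exact protocol:

```python
import torch

def async_aggregate(global_params: torch.Tensor, client_params: torch.Tensor,
                    staleness: int, base_mix: float = 0.5) -> torch.Tensor:
    """Blend one client's update into the global model on arrival.

    staleness: global rounds elapsed since the client pulled the model;
    older updates get smaller mixing weights (a common async-FL rule).
    """
    alpha = base_mix / (1 + staleness)
    return (1 - alpha) * global_params + alpha * client_params
```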
arXiv Detail & Related papers (2021-12-15T07:41:23Z)
- Full-Duplex Strategy for Video Object Segmentation [141.43983376262815]
The Full-Duplex Strategy Network (FSNet) is a novel framework for video object segmentation (VOS).
FSNet performs cross-modal feature passing (i.e., transmission and receiving) simultaneously, before the fusion and decoding stage.
We show that FSNet outperforms other state-of-the-art methods on both the VOS and video salient object detection tasks.
arXiv Detail & Related papers (2021-08-06T14:50:50Z)
- Higher Performance Visual Tracking with Dual-Modal Localization [106.91097443275035]
Visual Object Tracking (VOT) simultaneously requires both robustness and accuracy.
We propose a dual-modal framework for target localization, consisting of robust localization that suppresses distractors via ONR and accurate localization that attends precisely to the target center via OFC.
arXiv Detail & Related papers (2021-03-18T08:47:56Z)