DRCP: Diffusion on Reinforced Cooperative Perception for Perceiving Beyond Limits
- URL: http://arxiv.org/abs/2509.24903v1
- Date: Mon, 29 Sep 2025 15:13:03 GMT
- Title: DRCP: Diffusion on Reinforced Cooperative Perception for Perceiving Beyond Limits
- Authors: Lantao Li, Kang Yang, Rui Song, Chen Sun
- Abstract summary: Diffusion on Reinforced Cooperative Perception (DRCP) is a real-time deployable framework designed to address issues in dynamic driving environments. The proposed system achieves real-time performance on mobile platforms while significantly improving robustness under challenging conditions.
- Score: 11.34052678290095
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cooperative perception enabled by Vehicle-to-Everything communication has shown great promise in enhancing situational awareness for autonomous vehicles and other mobile robotic platforms. Despite recent advances in perception backbones and multi-agent fusion, real-world deployments remain challenged by hard detection cases, exemplified by partial detections and noise accumulation which limit downstream detection accuracy. This work presents Diffusion on Reinforced Cooperative Perception (DRCP), a real-time deployable framework designed to address aforementioned issues in dynamic driving environments. DRCP integrates two key components: (1) Precise-Pyramid-Cross-Modality-Cross-Agent, a cross-modal cooperative perception module that leverages camera-intrinsic-aware angular partitioning for attention-based fusion and adaptive convolution to better exploit external features; and (2) Mask-Diffusion-Mask-Aggregation, a novel lightweight diffusion-based refinement module that encourages robustness against feature perturbations and aligns bird's-eye-view features closer to the task-optimal manifold. The proposed system achieves real-time performance on mobile platforms while significantly improving robustness under challenging conditions. Code will be released in late 2025.
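The abstract's Mask-Diffusion-Mask-Aggregation module is described only at a high level; the following is a minimal toy sketch of the general idea (mask BEV features, perturb them, denoise, then aggregate with the input). The function name, the masking scheme, and the simple shrinkage "denoiser" are all illustrative assumptions, not the paper's actual method.

```python
import numpy as np

rng = np.random.default_rng(0)

def mask_diffuse_aggregate(bev, keep_prob=0.8, noise_scale=0.1, steps=4):
    """Toy sketch: randomly mask a BEV feature map, perturb it with
    Gaussian noise, iteratively denoise toward the masked input, then
    aggregate the refined features with the original map.
    All names and the averaging 'denoiser' are illustrative only."""
    mask = (rng.random(bev.shape) < keep_prob).astype(bev.dtype)
    x = bev * mask + noise_scale * rng.standard_normal(bev.shape)
    for _ in range(steps):
        # stand-in for a learned denoiser: shrink toward the masked input
        x = 0.5 * x + 0.5 * (bev * mask)
    # aggregate refined features back with the original map
    return 0.5 * x + 0.5 * bev

bev = rng.standard_normal((2, 8, 8))   # (channels, H, W) BEV features
refined = mask_diffuse_aggregate(bev)
```

In the paper this role is played by a learned lightweight diffusion model; the fixed averaging step above merely stands in for that component.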
Related papers
- DriveMamba: Task-Centric Scalable State Space Model for Efficient End-to-End Autonomous Driving [47.573692944838115]
DriveMamba is a Task-Centric Scalable paradigm for efficient E2E-AD. It integrates sequential task relation modeling, implicit correspondence learning and long-term temporal fusion into a single-stage Unified Mamba decoder. Extensive experiments conducted on nuScenes and Bench2Drive datasets demonstrate the superiority, generalizability and great efficiency of DriveMamba.
arXiv Detail & Related papers (2026-02-09T11:48:29Z) - Attention in Motion: Secure Platooning via Transformer-based Misbehavior Detection [0.6999740786886536]
Vehicular platooning promises transformative improvements in transportation efficiency and safety through the coordination of multi-vehicle formations. Traditional misbehaviour detection approaches, which rely on plausibility checks and statistical methods, suffer from high False Positive (FP) rates. We present Attention In Motion (AIMformer), a transformer-based framework specifically tailored for real-time misbehaviour detection in vehicular platoons.
arXiv Detail & Related papers (2025-12-17T14:45:33Z) - FocalComm: Hard Instance-Aware Multi-Agent Perception [0.0]
FocalComm is a novel collaborative perception framework that focuses on exchanging hard-instance-oriented features. We show that FocalComm outperforms state-of-the-art collaborative perception methods on two challenging real-world datasets.
arXiv Detail & Related papers (2025-12-16T00:41:50Z) - ReCoM: Realistic Co-Speech Motion Generation with Recurrent Embedded Transformer [58.49950218437718]
We present ReCoM, an efficient framework for generating high-fidelity and generalizable human body motions synchronized with speech. The core innovation lies in the Recurrent Embedded Transformer (RET), which integrates Dynamic Embedding Regularization (DER) into a Vision Transformer (ViT) core architecture. To enhance model robustness, we incorporate the proposed DER strategy, which equips the model with dual capabilities of noise resistance and cross-domain generalization.
arXiv Detail & Related papers (2025-03-27T16:39:40Z) - DriveTransformer: Unified Transformer for Scalable End-to-End Autonomous Driving [62.62464518137153]
DriveTransformer is a simplified E2E-AD framework for the ease of scaling up. It is composed of three unified operations: task self-attention, sensor cross-attention, temporal cross-attention. It achieves state-of-the-art performance in both the simulated closed-loop benchmark Bench2Drive and the real-world open-loop benchmark nuScenes with high FPS.
arXiv Detail & Related papers (2025-03-07T11:41:18Z) - V2X-DGPE: Addressing Domain Gaps and Pose Errors for Robust Collaborative 3D Object Detection [18.694510415777632]
V2X-DGPE is a high-accuracy and robust V2X feature-level collaborative perception framework. The proposed method outperforms existing approaches, achieving state-of-the-art detection performance.
arXiv Detail & Related papers (2025-01-04T19:28:55Z) - CAFuser: Condition-Aware Multimodal Fusion for Robust Semantic Perception of Driving Scenes [56.52618054240197]
We propose a novel, condition-aware multimodal fusion approach for robust semantic perception of driving scenes. Our method, CAFuser, uses an RGB camera input to classify environmental conditions and generate a Condition Token. Our model significantly improves robustness and accuracy, especially in adverse-condition scenarios.
arXiv Detail & Related papers (2024-10-14T17:56:20Z) - CoMamba: Real-time Cooperative Perception Unlocked with State Space Models [39.87600356189242]
CoMamba is a novel cooperative 3D detection framework designed to leverage state-space models for real-time onboard vehicle perception.
CoMamba achieves superior performance compared to existing methods while maintaining real-time processing capabilities.
arXiv Detail & Related papers (2024-09-16T20:02:19Z) - Semantic Communication for Cooperative Perception using HARQ [51.148203799109304]
We leverage an importance map to distill critical semantic information, introducing a cooperative perception semantic communication framework.
To counter the challenges posed by time-varying multipath fading, our approach incorporates orthogonal frequency-division multiplexing (OFDM) along with channel estimation and equalization strategies.
We introduce a novel semantic error detection method that is integrated with our semantic communication framework in the spirit of hybrid automatic repeat request (HARQ).
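The HARQ idea above (transmit, check, retransmit on failure) can be sketched as a simple loop. Everything here is a hypothetical illustration: the function name, the Bernoulli channel model, and the stand-in "semantic check" are assumptions, not the paper's actual protocol.

```python
import random

random.seed(1)

def send_with_harq(payload, channel_error_rate=0.3, max_rounds=4):
    """Hypothetical HARQ loop: transmit, run a (toy) semantic error
    check, and retransmit until the check passes or rounds run out."""
    for attempt in range(1, max_rounds + 1):
        corrupted = random.random() < channel_error_rate  # toy channel model
        if not corrupted:
            # semantic check passed: receiver sends ACK
            return payload, attempt
        # check failed: receiver sends NACK; a real HARQ receiver would
        # soft-combine this transmission with the next one
    return None, max_rounds

msg, rounds = send_with_harq("importance-weighted features")
```

In a real HARQ scheme retransmissions are combined at the receiver rather than discarded, which is what distinguishes it from plain ARQ.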
arXiv Detail & Related papers (2024-08-29T08:53:26Z) - Real-Time Motion Prediction via Heterogeneous Polyline Transformer with Relative Pose Encoding [121.08841110022607]
Existing agent-centric methods have demonstrated outstanding performance on public benchmarks.
We introduce the K-nearest neighbor attention with relative pose encoding (KNARPE), a novel attention mechanism allowing the pairwise-relative representation to be used by Transformers.
By sharing contexts among agents and reusing the unchanged contexts, our approach is as efficient as scene-centric methods, while performing on par with state-of-the-art agent-centric methods.
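The KNARPE mechanism described above restricts attention to each agent's K nearest neighbors and injects pairwise-relative information. The sketch below is a minimal single-head approximation under stated assumptions: the function name is hypothetical, and the relative pose enters only as a distance bias on the logits rather than the paper's full relative pose encoding.

```python
import numpy as np

def knn_relpose_attention(queries, keys, values, poses, k=2):
    """Toy sketch of K-nearest-neighbor attention with a relative-pose
    bias: each query attends only to its k nearest agents (by pose
    distance), and logits are offset by the negative pairwise distance."""
    # pairwise distances between agent poses, shape (n, n)
    dists = np.linalg.norm(poses[:, None, :] - poses[None, :, :], axis=-1)
    out = np.zeros_like(values)
    for i in range(len(queries)):
        nbrs = np.argsort(dists[i])[:k]      # k nearest agents (incl. self)
        logits = keys[nbrs] @ queries[i] - dists[i, nbrs]  # pose-biased logits
        w = np.exp(logits - logits.max())    # numerically stable softmax
        w /= w.sum()
        out[i] = w @ values[nbrs]
    return out

rng = np.random.default_rng(0)
n, d = 4, 3
q = rng.standard_normal((n, d))
kmat = rng.standard_normal((n, d))
v = rng.standard_normal((n, d))
poses = rng.standard_normal((n, 2))          # 2D agent positions
attended = knn_relpose_attention(q, kmat, v, poses)
```

Restricting each query to K neighbors is what makes the cost scale with K rather than with the total number of agents, which is the efficiency argument in the summary above.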
arXiv Detail & Related papers (2023-10-19T17:59:01Z) - Practical Collaborative Perception: A Framework for Asynchronous and Multi-Agent 3D Object Detection [9.967263440745432]
Occlusion is a major challenge for LiDAR-based object detection methods.
State-of-the-art V2X methods resolve the performance-bandwidth tradeoff using a mid-collaboration approach.
We devise a simple yet effective collaboration method that achieves a better bandwidth-performance tradeoff than prior methods.
arXiv Detail & Related papers (2023-07-04T03:49:42Z) - Learning to Communicate and Correct Pose Errors [75.03747122616605]
We study the setting proposed in V2VNet, where nearby self-driving vehicles jointly perform object detection and motion forecasting in a cooperative manner.
We propose a novel neural reasoning framework that learns to communicate, to estimate potential errors, and to reach a consensus about those errors.
arXiv Detail & Related papers (2020-11-10T18:19:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.