DRCP: Diffusion on Reinforced Cooperative Perception for Perceiving Beyond Limits
- URL: http://arxiv.org/abs/2509.24903v1
- Date: Mon, 29 Sep 2025 15:13:03 GMT
- Title: DRCP: Diffusion on Reinforced Cooperative Perception for Perceiving Beyond Limits
- Authors: Lantao Li, Kang Yang, Rui Song, Chen Sun
- Abstract summary: Diffusion on Reinforced Cooperative Perception (DRCP) is a real-time deployable framework designed to address issues in dynamic driving environments. The proposed system achieves real-time performance on mobile platforms while significantly improving robustness under challenging conditions.
- Score: 11.34052678290095
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cooperative perception enabled by Vehicle-to-Everything communication has shown great promise in enhancing situational awareness for autonomous vehicles and other mobile robotic platforms. Despite recent advances in perception backbones and multi-agent fusion, real-world deployments remain challenged by hard detection cases, exemplified by partial detections and noise accumulation which limit downstream detection accuracy. This work presents Diffusion on Reinforced Cooperative Perception (DRCP), a real-time deployable framework designed to address aforementioned issues in dynamic driving environments. DRCP integrates two key components: (1) Precise-Pyramid-Cross-Modality-Cross-Agent, a cross-modal cooperative perception module that leverages camera-intrinsic-aware angular partitioning for attention-based fusion and adaptive convolution to better exploit external features; and (2) Mask-Diffusion-Mask-Aggregation, a novel lightweight diffusion-based refinement module that encourages robustness against feature perturbations and aligns bird's-eye-view features closer to the task-optimal manifold. The proposed system achieves real-time performance on mobile platforms while significantly improving robustness under challenging conditions. Code will be released in late 2025.
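The abstract's Mask-Diffusion-Mask-Aggregation module is described only at a high level; the following is a minimal toy sketch of the general idea (mask BEV features, perturb them, denoise, then aggregate with the input). The function name, the masking scheme, and the simple shrinkage "denoiser" are all illustrative assumptions, not the paper's actual method.

```python
import numpy as np

rng = np.random.default_rng(0)

def mask_diffuse_aggregate(bev, keep_prob=0.8, noise_scale=0.1, steps=4):
    """Toy sketch: randomly mask a BEV feature map, perturb it with
    Gaussian noise, iteratively denoise toward the masked input, then
    aggregate the refined features with the original map.
    All names and the averaging 'denoiser' are illustrative only."""
    mask = (rng.random(bev.shape) < keep_prob).astype(bev.dtype)
    x = bev * mask + noise_scale * rng.standard_normal(bev.shape)
    for _ in range(steps):
        # stand-in for a learned denoiser: shrink toward the masked input
        x = 0.5 * x + 0.5 * (bev * mask)
    # aggregate refined features back with the original map
    return 0.5 * x + 0.5 * bev

bev = rng.standard_normal((2, 8, 8))   # (channels, H, W) BEV features
refined = mask_diffuse_aggregate(bev)
```

In the paper this role is played by a learned lightweight diffusion model; the fixed averaging step above merely stands in for that component.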
Related papers
- DriveMamba: Task-Centric Scalable State Space Model for Efficient End-to-End Autonomous Driving [47.573692944838115]
DriveMamba is a Task-Centric Scalable paradigm for efficient E2E-AD. It integrates sequential task relation modeling, implicit correspondence learning and long-term temporal fusion into a single-stage Unified Mamba decoder. Extensive experiments conducted on nuScenes and Bench2Drive datasets demonstrate the superiority, generalizability and great efficiency of DriveMamba.
arXiv Detail & Related papers (2026-02-09T11:48:29Z) - Attention in Motion: Secure Platooning via Transformer-based Misbehavior Detection [0.6999740786886536]
Vehicular platooning promises transformative improvements in transportation efficiency and safety through the coordination of multi-vehicle formations. Traditional misbehaviour detection approaches, which rely on plausibility checks and statistical methods, suffer from high False Positive (FP) rates. We present Attention In Motion (AIMformer), a transformer-based framework specifically tailored for real-time misbehaviour detection in vehicular platoons.
arXiv Detail & Related papers (2025-12-17T14:45:33Z) - FocalComm: Hard Instance-Aware Multi-Agent Perception [0.0]
FocalComm is a novel collaborative perception framework that focuses on exchanging hard-instance-oriented features. We show that FocalComm outperforms state-of-the-art collaborative perception methods on two challenging real-world datasets.
arXiv Detail & Related papers (2025-12-16T00:41:50Z) - ReCoM: Realistic Co-Speech Motion Generation with Recurrent Embedded Transformer [58.49950218437718]
We present ReCoM, an efficient framework for generating high-fidelity and generalizable human body motions synchronized with speech. The core innovation lies in the Recurrent Embedded Transformer (RET), which integrates Dynamic Embedding Regularization (DER) into a Vision Transformer (ViT) core architecture. To enhance model robustness, we incorporate the proposed DER strategy, which equips the model with dual capabilities of noise resistance and cross-domain generalization.
arXiv Detail & Related papers (2025-03-27T16:39:40Z) - DriveTransformer: Unified Transformer for Scalable End-to-End Autonomous Driving [62.62464518137153]
DriveTransformer is a simplified E2E-AD framework for the ease of scaling up. It is composed of three unified operations: task self-attention, sensor cross-attention, temporal cross-attention. It achieves state-of-the-art performance in both the simulated closed-loop benchmark Bench2Drive and the real-world open-loop benchmark nuScenes with high FPS.
arXiv Detail & Related papers (2025-03-07T11:41:18Z) - V2X-DGPE: Addressing Domain Gaps and Pose Errors for Robust Collaborative 3D Object Detection [18.694510415777632]
V2X-DGPE is a high-accuracy and robust V2X feature-level collaborative perception framework. The proposed method outperforms existing approaches, achieving state-of-the-art detection performance.
arXiv Detail & Related papers (2025-01-04T19:28:55Z) - CAFuser: Condition-Aware Multimodal Fusion for Robust Semantic Perception of Driving Scenes [56.52618054240197]
We propose a novel, condition-aware multimodal fusion approach for robust semantic perception of driving scenes. Our method, CAFuser, uses an RGB camera input to classify environmental conditions and generate a Condition Token. Our model significantly improves robustness and accuracy, especially in adverse-condition scenarios.
arXiv Detail & Related papers (2024-10-14T17:56:20Z) - CoMamba: Real-time Cooperative Perception Unlocked with State Space Models [39.87600356189242]
CoMamba is a novel cooperative 3D detection framework designed to leverage state-space models for real-time onboard vehicle perception.
CoMamba achieves superior performance compared to existing methods while maintaining real-time processing capabilities.
arXiv Detail & Related papers (2024-09-16T20:02:19Z) - Semantic Communication for Cooperative Perception using HARQ [51.148203799109304]
We leverage an importance map to distill critical semantic information, introducing a cooperative perception semantic communication framework.
To counter the challenges posed by time-varying multipath fading, our approach incorporates orthogonal frequency-division multiplexing (OFDM) along with channel estimation and equalization strategies.
We introduce a novel semantic error detection method that is integrated with our semantic communication framework in the spirit of hybrid automatic repeat request (HARQ).
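The HARQ idea above (transmit, check, retransmit on failure) can be sketched as a simple loop. Everything here is a hypothetical illustration: the function name, the Bernoulli channel model, and the stand-in "semantic check" are assumptions, not the paper's actual protocol.

```python
import random

random.seed(1)

def send_with_harq(payload, channel_error_rate=0.3, max_rounds=4):
    """Hypothetical HARQ loop: transmit, run a (toy) semantic error
    check, and retransmit until the check passes or rounds run out."""
    for attempt in range(1, max_rounds + 1):
        corrupted = random.random() < channel_error_rate  # toy channel model
        if not corrupted:
            # semantic check passed: receiver sends ACK
            return payload, attempt
        # check failed: receiver sends NACK; a real HARQ receiver would
        # soft-combine this transmission with the next one
    return None, max_rounds

msg, rounds = send_with_harq("importance-weighted features")
```

In a real HARQ scheme retransmissions are combined at the receiver rather than discarded, which is what distinguishes it from plain ARQ.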
arXiv Detail & Related papers (2024-08-29T08:53:26Z) - Real-Time Motion Prediction via Heterogeneous Polyline Transformer with Relative Pose Encoding [121.08841110022607]
Existing agent-centric methods have demonstrated outstanding performance on public benchmarks.
We introduce the K-nearest neighbor attention with relative pose encoding (KNARPE), a novel attention mechanism allowing the pairwise-relative representation to be used by Transformers.
By sharing contexts among agents and reusing the unchanged contexts, our approach is as efficient as scene-centric methods, while performing on par with state-of-the-art agent-centric methods.
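The KNARPE mechanism described above restricts attention to each agent's K nearest neighbors and injects pairwise-relative information. The sketch below is a minimal single-head approximation under stated assumptions: the function name is hypothetical, and the relative pose enters only as a distance bias on the logits rather than the paper's full relative pose encoding.

```python
import numpy as np

def knn_relpose_attention(queries, keys, values, poses, k=2):
    """Toy sketch of K-nearest-neighbor attention with a relative-pose
    bias: each query attends only to its k nearest agents (by pose
    distance), and logits are offset by the negative pairwise distance."""
    # pairwise distances between agent poses, shape (n, n)
    dists = np.linalg.norm(poses[:, None, :] - poses[None, :, :], axis=-1)
    out = np.zeros_like(values)
    for i in range(len(queries)):
        nbrs = np.argsort(dists[i])[:k]      # k nearest agents (incl. self)
        logits = keys[nbrs] @ queries[i] - dists[i, nbrs]  # pose-biased logits
        w = np.exp(logits - logits.max())    # numerically stable softmax
        w /= w.sum()
        out[i] = w @ values[nbrs]
    return out

rng = np.random.default_rng(0)
n, d = 4, 3
q = rng.standard_normal((n, d))
kmat = rng.standard_normal((n, d))
v = rng.standard_normal((n, d))
poses = rng.standard_normal((n, 2))          # 2D agent positions
attended = knn_relpose_attention(q, kmat, v, poses)
```

Restricting each query to K neighbors is what makes the cost scale with K rather than with the total number of agents, which is the efficiency argument in the summary above.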
arXiv Detail & Related papers (2023-10-19T17:59:01Z) - Practical Collaborative Perception: A Framework for Asynchronous and Multi-Agent 3D Object Detection [9.967263440745432]
Occlusion is a major challenge for LiDAR-based object detection methods.
State-of-the-art V2X methods resolve the performance-bandwidth tradeoff using a mid-collaboration approach.
We devise a simple yet effective collaboration method that achieves a better bandwidth-performance tradeoff than prior methods.
arXiv Detail & Related papers (2023-07-04T03:49:42Z) - Learning to Communicate and Correct Pose Errors [75.03747122616605]
We study the setting proposed in V2VNet, where nearby self-driving vehicles jointly perform object detection and motion forecasting in a cooperative manner.
We propose a novel neural reasoning framework that learns to communicate, to estimate potential errors, and to reach a consensus about those errors.
arXiv Detail & Related papers (2020-11-10T18:19:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.