SparseCoop: Cooperative Perception with Kinematic-Grounded Queries
- URL: http://arxiv.org/abs/2512.06838v1
- Date: Sun, 07 Dec 2025 13:22:06 GMT
- Title: SparseCoop: Cooperative Perception with Kinematic-Grounded Queries
- Authors: Jiahao Wang, Zhongwei Jiang, Wenchao Sun, Jiaru Zhong, Haibao Yu, Yuner Zhang, Chenyang Lu, Chuang Zhang, Lei He, Shaobing Xu, Jianqiang Wang,
- Abstract summary: We propose SparseCoop, a fully sparse cooperative perception framework for 3D detection and tracking. Experiments on the V2X-Seq and Griffin datasets show SparseCoop achieves state-of-the-art performance.
- Score: 24.54324085409114
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cooperative perception is critical for autonomous driving, overcoming the inherent limitations of a single vehicle, such as occlusions and constrained fields-of-view. However, current approaches sharing dense Bird's-Eye-View (BEV) features are constrained by quadratically-scaling communication costs and the lack of flexibility and interpretability for precise alignment across asynchronous or disparate viewpoints. While emerging sparse query-based methods offer an alternative, they often suffer from inadequate geometric representations, suboptimal fusion strategies, and training instability. In this paper, we propose SparseCoop, a fully sparse cooperative perception framework for 3D detection and tracking that completely discards intermediate BEV representations. Our framework features a trio of innovations: a kinematic-grounded instance query that uses an explicit state vector with 3D geometry and velocity for precise spatio-temporal alignment; a coarse-to-fine aggregation module for robust fusion; and a cooperative instance denoising task to accelerate and stabilize training. Experiments on V2X-Seq and Griffin datasets show SparseCoop achieves state-of-the-art performance. Notably, it delivers this with superior computational efficiency, low transmission cost, and strong robustness to communication latency. Code is available at https://github.com/wang-jh18-SVM/SparseCoop.
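The abstract's central idea, a kinematic-grounded instance query carrying an explicit state vector with 3D geometry and velocity, can be illustrated with a minimal sketch. This is an assumption-laden toy, not the paper's implementation: the `InstanceQuery` class and its fields are hypothetical, the paper's actual state vector likely also encodes box size and orientation, and real alignment would operate on learned features rather than a bare constant-velocity roll-forward.

```python
from dataclasses import dataclass

@dataclass
class InstanceQuery:
    # Hypothetical kinematic state: 3D centre plus velocity.
    x: float; y: float; z: float
    vx: float; vy: float; vz: float

    def propagate(self, dt: float) -> "InstanceQuery":
        """Roll the state forward by dt seconds under a constant-velocity
        assumption, so a query received with communication latency can be
        aligned to the ego vehicle's current timestamp before fusion."""
        return InstanceQuery(
            self.x + self.vx * dt,
            self.y + self.vy * dt,
            self.z + self.vz * dt,
            self.vx, self.vy, self.vz,
        )

# A cooperative query that arrived 0.2 s late is propagated before fusion.
q = InstanceQuery(10.0, 5.0, 0.0, 2.0, -1.0, 0.0)
aligned = q.propagate(0.2)
print(aligned.x, aligned.y)  # 10.4 4.8
```

Because the state is explicit rather than buried in dense BEV features, this kind of alignment is cheap, interpretable, and applies per instance, which is consistent with the robustness to communication latency claimed in the abstract.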
Related papers
- Bandwidth-adaptive Cloud-Assisted 360-Degree 3D Perception for Autonomous Vehicles [0.7557499794873328]
A key challenge for autonomous driving is maintaining real-time situational awareness of surrounding obstacles under strict latency constraints. We propose leveraging Vehicle-to-Everything (V2X) communication to partially offload processing to the cloud. Our approach uses transformer-based models to fuse multi-camera sensor data into a comprehensive Bird's-Eye-View (BEV) representation.
arXiv Detail & Related papers (2026-02-27T10:12:02Z) - INSTINCT: Instance-Level Interaction Architecture for Query-Based Collaborative Perception [6.018757656052237]
Collaborative perception systems overcome single-vehicle limitations by integrating multi-agent sensory data, improving accuracy and safety. Previous work shows that query-based instance-level interaction reduces bandwidth demands and reliance on manual priors; however, LiDAR-focused implementations in collaborative perception remain underdeveloped. We propose INSTINCT, a novel collaborative perception framework with three core components: 1) a quality-aware filtering mechanism for high-quality instance feature selection; 2) a dual-branch detection routing scheme that decouples collaboration-irrelevant and collaboration-relevant instances; and 3) a Cross Agent Local Instance Fusion module that aggregates local hybrid instance features.
arXiv Detail & Related papers (2025-09-28T07:16:32Z) - SuperFlow++: Enhanced Spatiotemporal Consistency for Cross-Modal Data Pretraining [62.433137130087445]
SuperFlow++ is a novel framework that integrates pretraining and downstream tasks using consecutive camera pairs. We show that SuperFlow++ outperforms state-of-the-art methods across diverse tasks and driving conditions. With strong generalizability and computational efficiency, SuperFlow++ establishes a new benchmark for data-efficient LiDAR-based perception in autonomous driving.
arXiv Detail & Related papers (2025-03-25T17:59:57Z) - SparseAlign: A Fully Sparse Framework for Cooperative Object Detection [38.96043178218958]
We design a fully sparse framework, SparseAlign, with three key features: an enhanced sparse 3D backbone, a query-based temporal context learning module, and a robust detection head specially tailored for sparse features. Despite its sparsity, our framework outperforms the state of the art while requiring less communication bandwidth.
arXiv Detail & Related papers (2025-03-17T09:38:53Z) - Bench2Drive-R: Turning Real World Data into Reactive Closed-Loop Autonomous Driving Benchmark by Generative Model [63.336123527432136]
We introduce Bench2Drive-R, a generative framework that enables reactive closed-loop evaluation. Unlike existing video generative models for autonomous driving, the proposed designs are tailored for interactive simulation. We compare the generation quality of Bench2Drive-R with existing generative models and achieve state-of-the-art performance.
arXiv Detail & Related papers (2024-12-11T06:35:18Z) - OPUS: Occupancy Prediction Using a Sparse Set [64.60854562502523]
We present a framework to simultaneously predict occupied locations and classes using a set of learnable queries.
OPUS incorporates a suite of non-trivial strategies to enhance model performance.
Our lightest model achieves superior RayIoU on the Occ3D-nuScenes dataset at nearly 2x the FPS, while our heaviest model surpasses the previous best results by 6.1 RayIoU.
arXiv Detail & Related papers (2024-09-14T07:44:22Z) - StreamLTS: Query-based Temporal-Spatial LiDAR Fusion for Cooperative Object Detection [0.552480439325792]
We propose Time-Aligned COoperative Object Detection (TA-COOD), for which we adapt the widely used OPV2V and DAIR-V2X datasets.
Experiment results confirm the superior efficiency of our fully sparse framework compared to the state-of-the-art dense models.
arXiv Detail & Related papers (2024-07-04T10:56:10Z) - Asynchrony-Robust Collaborative Perception via Bird's Eye View Flow [45.670727141966545]
Collaborative perception can boost each agent's perception ability by facilitating communication among multiple agents.
However, temporal asynchrony among agents is inevitable in the real world due to communication delays, interruptions, and clock misalignments.
We propose CoBEVFlow, an asynchrony-robust collaborative perception system based on bird's eye view (BEV) flow.
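The asynchrony compensation described above can be sketched in miniature: shift each occupied BEV cell along its estimated motion to account for the delay between a collaborator's capture time and the ego's current time. This is a toy stand-in under stated assumptions; `warp_bev`, its dict-of-cells representation, and the nearest-cell rounding are all illustrative inventions, whereas CoBEVFlow learns the flow field and warps continuous features.

```python
def warp_bev(cells, flows, dt, cell_size=0.5):
    """Shift sparse BEV cells along their per-cell flow (m/s) to
    compensate a delay of dt seconds. Cells are keyed by integer
    grid indices; displacement is rounded to the nearest cell."""
    warped = {}
    for (i, j), feat in cells.items():
        vx, vy = flows[(i, j)]
        di = round(vx * dt / cell_size)   # displacement in grid cells
        dj = round(vy * dt / cell_size)
        warped[(i + di, j + dj)] = feat
    return warped

# A car-occupied cell moving at 5 m/s along +x, received 0.1 s late,
# is shifted one cell forward before fusion with the ego's BEV.
cells = {(4, 4): "car"}
flows = {(4, 4): (5.0, 0.0)}
print(warp_bev(cells, flows, dt=0.1))  # {(5, 4): 'car'}
```

Warping only occupied cells keeps the cost proportional to scene sparsity, which is why flow-based compensation pairs naturally with sparse representations.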
arXiv Detail & Related papers (2023-09-29T02:45:56Z) - DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets [95.84755169585492]
We present Dynamic Sparse Voxel Transformer (DSVT), a single-stride window-based voxel Transformer backbone for outdoor 3D perception.
Our model achieves state-of-the-art performance with a broad range of 3D perception tasks.
arXiv Detail & Related papers (2023-01-15T09:31:58Z) - Dynamic Convolution for 3D Point Cloud Instance Segmentation [146.7971476424351]
We propose an approach to instance segmentation from 3D point clouds based on dynamic convolution.
We gather homogeneous points that have identical semantic categories and close votes for the geometric centroids.
The proposed approach is proposal-free, and instead exploits a convolution process that adapts to the spatial and semantic characteristics of each instance.
arXiv Detail & Related papers (2021-07-18T09:05:16Z) - LiDAR-based Panoptic Segmentation via Dynamic Shifting Network [56.71765153629892]
LiDAR-based panoptic segmentation aims to parse both objects and scenes in a unified manner.
We propose the Dynamic Shifting Network (DS-Net), which serves as an effective panoptic segmentation framework in the point cloud realm.
Our proposed DS-Net achieves superior accuracies over current state-of-the-art methods.
arXiv Detail & Related papers (2020-11-24T08:44:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.