Related papers: CoDiff: Conditional Diffusion Model for Collaborative 3D Object Detection

CoDiff: Conditional Diffusion Model for Collaborative 3D Object Detection

URL: http://arxiv.org/abs/2502.14891v2
Date: Thu, 17 Apr 2025 01:47:10 GMT
Title: CoDiff: Conditional Diffusion Model for Collaborative 3D Object Detection
Authors: Zhe Huang, Shuo Wang, Yongcai Wang, Lei Wang,
Abstract summary: Collaborative 3D object detection holds significant importance in the field of autonomous driving.<n>Due to pose estimation errors and time delays, the fusion of information across agents often results in feature representations with spatial and temporal noise.<n>We propose CoDiff, a novel robust collaborative perception framework.
Score: 9.28605575548509
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Collaborative 3D object detection holds significant importance in the field of autonomous driving, as it greatly enhances the perception capabilities of each individual agent by facilitating information exchange among multiple agents. However, in practice, due to pose estimation errors and time delays, the fusion of information across agents often results in feature representations with spatial and temporal noise, leading to detection errors. Diffusion models naturally have the ability to denoise noisy samples to the ideal data, which motivates us to explore the use of diffusion models to address the noise problem between multi-agent systems. In this work, we propose CoDiff, a novel robust collaborative perception framework that leverages the potential of diffusion models to generate more comprehensive and clearer feature representations. To the best of our knowledge, this is the first work to apply diffusion models to multi-agent collaborative perception. Specifically, we project high-dimensional feature map into the latent space of a powerful pre-trained autoencoder. Within this space, individual agent information serves as a condition to guide the diffusion model's sampling. This process denoises coarse feature maps and progressively refines the fused features. Experimental study on both simulated and real-world datasets demonstrates that the proposed framework CoDiff consistently outperforms existing relevant methods in terms of the collaborative object detection performance, and exhibits highly desired robustness when the pose and delay information of agents is with high-level noise. The code is released at https://github.com/HuangZhe885/CoDiff

Related papers

FunDiff: Diffusion Models over Function Spaces for Physics-Informed Generative Modeling [3.6766942024793496]
We introduce FunDiff, a novel framework for generative modeling in function spaces.<n>FunDiff combines a latent diffusion process with a function autoencoder architecture to handle input functions.<n>We demonstrate the practical effectiveness of FunDiff across diverse applications in fluid dynamics and solid mechanics.
arXiv Detail & Related papers (2025-06-09T16:19:59Z)
Consistent World Models via Foresight Diffusion [56.45012929930605]
We argue that a key bottleneck in learning consistent diffusion-based world models lies in the suboptimal predictive ability.<n>We propose Foresight Diffusion (ForeDiff), a diffusion-based world modeling framework that enhances consistency by decoupling condition understanding from target denoising.
arXiv Detail & Related papers (2025-05-22T10:01:59Z)
Which2comm: An Efficient Collaborative Perception Framework for 3D Object Detection [5.195291754828701]
Collaborative perception allows real-time inter-agent information exchange. limited communication bandwidth in practical scenarios restricts the inter-agent data transmission volume. We propose Which2comm, a novel multi-agent 3D object detection framework leveraging object-level sparse features.
arXiv Detail & Related papers (2025-03-21T14:24:07Z)
Human-Object Interaction Detection Collaborated with Large Relation-driven Diffusion Models [65.82564074712836]
We introduce DIFfusionHOI, a new HOI detector shedding light on text-to-image diffusion models. We first devise an inversion-based strategy to learn the expression of relation patterns between humans and objects in embedding space. These learned relation embeddings then serve as textual prompts, to steer diffusion models generate images that depict specific interactions.
arXiv Detail & Related papers (2024-10-26T12:00:33Z)
Diff3DETR:Agent-based Diffusion Model for Semi-supervised 3D Object Detection [33.58208166717537]
3D object detection is essential for understanding 3D scenes. Recent developments in semi-supervised methods seek to mitigate this problem by employing a teacher-student framework to generate pseudo-labels for unlabeled point clouds. We introduce an Agent-based Diffusion Model for Semi-supervised 3D Object Detection (Diff3DETR)
arXiv Detail & Related papers (2024-08-01T05:04:22Z)
RS-DFM: A Remote Sensing Distributed Foundation Model for Diverse Downstream Tasks [11.681342476516267]
We propose a Remote Distributed Sensing Foundation Model (RS-DFM) based on generalized information mapping and interaction. This model can realize online collaborative perception across multiple platforms and various downstream tasks. We present a dual-branch information compression module to decouple high-frequency and low-frequency feature information.
arXiv Detail & Related papers (2024-06-11T07:46:47Z)
Collaborative Filtering Based on Diffusion Models: Unveiling the Potential of High-Order Connectivity [10.683635786183894]
CF-Diff is a new diffusion model-based collaborative filtering method. It is capable of making full use of collaborative signals along with multi-hop neighbors. It achieves remarkable gains up to 7.29% compared to the best competitor.
arXiv Detail & Related papers (2024-04-22T14:49:46Z)
Digging into contrastive learning for robust depth estimation with diffusion models [55.62276027922499]
We propose a novel robust depth estimation method called D4RD. It features a custom contrastive learning mode tailored for diffusion models to mitigate performance degradation in complex environments. In experiments, D4RD surpasses existing state-of-the-art solutions on synthetic corruption datasets and real-world weather conditions.
arXiv Detail & Related papers (2024-04-15T14:29:47Z)
DetDiffusion: Synergizing Generative and Perceptive Models for Enhanced Data Generation and Perception [78.26734070960886]
Current perceptive models heavily depend on resource-intensive datasets. We introduce perception-aware loss (P.A. loss) through segmentation, improving both quality and controllability. Our method customizes data augmentation by extracting and utilizing perception-aware attribute (P.A. Attr) during generation.
arXiv Detail & Related papers (2024-03-20T04:58:03Z)
Diffusion-Based Particle-DETR for BEV Perception [94.88305708174796]
Bird-Eye-View (BEV) is one of the most widely-used scene representations for visual perception in Autonomous Vehicles (AVs) Recent diffusion-based methods offer a promising approach to uncertainty modeling for visual perception but fail to effectively detect small objects in the large coverage of the BEV. Here, we address this problem by combining the diffusion paradigm with current state-of-the-art 3D object detectors in BEV.
arXiv Detail & Related papers (2023-12-18T09:52:14Z)
Improving Efficiency of Diffusion Models via Multi-Stage Framework and Tailored Multi-Decoder Architectures [12.703947839247693]
Diffusion models, emerging as powerful deep generative tools, excel in various applications. However, their remarkable generative performance is hindered by slow training and sampling. This is due to the necessity of tracking extensive forward and reverse diffusion trajectories. We present a multi-stage framework inspired by our empirical findings to tackle these challenges.
arXiv Detail & Related papers (2023-12-14T17:48:09Z)
Graph Signal Diffusion Model for Collaborative Filtering [22.727100820178414]
Collaborative filtering is a critical technique in recommender systems. Existing studies on diffusion model lack effective solutions for modeling implicit feedback. We propose a novel Graph Signal Diffusion Model for Collaborative Filtering (named GiffCF)
arXiv Detail & Related papers (2023-11-15T07:25:14Z)
Diffusion-based 3D Object Detection with Random Boxes [58.43022365393569]
Existing anchor-based 3D detection methods rely on empiricals setting of anchors, which makes the algorithms lack elegance. Our proposed Diff3Det migrates the diffusion model to proposal generation for 3D object detection by considering the detection boxes as generative targets. In the inference stage, the model progressively refines a set of random boxes to the prediction results.
arXiv Detail & Related papers (2023-09-05T08:49:53Z)
Spatial-Temporal Graph Enhanced DETR Towards Multi-Frame 3D Object Detection [54.041049052843604]
We present STEMD, a novel end-to-end framework that enhances the DETR-like paradigm for multi-frame 3D object detection. First, to model the inter-object spatial interaction and complex temporal dependencies, we introduce the spatial-temporal graph attention network. Finally, it poses a challenge for the network to distinguish between the positive query and other highly similar queries that are not the best match.
arXiv Detail & Related papers (2023-07-01T13:53:14Z)
CamoDiffusion: Camouflaged Object Detection via Conditional Diffusion Models [72.93652777646233]
Camouflaged Object Detection (COD) is a challenging task in computer vision due to the high similarity between camouflaged objects and their surroundings. We propose a new paradigm that treats COD as a conditional mask-generation task leveraging diffusion models. Our method, dubbed CamoDiffusion, employs the denoising process of diffusion models to iteratively reduce the noise of the mask.
arXiv Detail & Related papers (2023-05-29T07:49:44Z)
Boosting Human-Object Interaction Detection with Text-to-Image Diffusion Model [22.31860516617302]
We introduce DiffHOI, a novel HOI detection scheme grounded on a pre-trained text-image diffusion model. To fill in the gaps of HOI datasets, we propose SynHOI, a class-balance, large-scale, and high-diversity synthetic dataset. Experiments demonstrate that DiffHOI significantly outperforms the state-of-the-art in regular detection (i.e., 41.50 mAP) and zero-shot detection.
arXiv Detail & Related papers (2023-05-20T17:59:23Z)
Informative Data Selection with Uncertainty for Multi-modal Object Detection [25.602915381482468]
We propose a universal uncertainty-aware multi-modal fusion model. Our model reduces the randomness in fusion and generates reliable output. Our fusion model is proven to resist severe noise interference like Gaussian, motion blur, and frost, with only slight degradation.
arXiv Detail & Related papers (2023-04-23T16:36:13Z)
Diffusion Recommender Model [85.9640416600725]
We propose a novel Diffusion Recommender Model (named DiffRec) to learn the generative process in a denoising manner.<n>To retain personalized information in user interactions, DiffRec reduces the added noises and avoids corrupting users' interactions into pure noises like in image synthesis.
arXiv Detail & Related papers (2023-04-11T04:31:00Z)

This list is automatically generated from the titles and abstracts of the papers in this site.