3DifFusionDet: Diffusion Model for 3D Object Detection with Robust
  LiDAR-Camera Fusion
- URL: http://arxiv.org/abs/2311.03742v1
- Date: Tue, 7 Nov 2023 05:53:09 GMT
- Title: 3DifFusionDet: Diffusion Model for 3D Object Detection with Robust
  LiDAR-Camera Fusion
- Authors: Xinhao Xiang, Simon Dräger, Jiawei Zhang
- Abstract summary: 3DifFusionDet structures 3D object detection as a denoising diffusion process from noisy 3D boxes to target boxes.
With a feature alignment strategy, the progressive refinement method could contribute significantly to robust LiDAR-Camera fusion.
Experiments on KITTI, a benchmark for real-world traffic object identification, show that 3DifFusionDet performs favorably compared with earlier, well-established detectors.
- Score: 6.914463996768285
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract:   Good 3D object detection performance from LiDAR-Camera sensors demands
seamless feature alignment and fusion strategies. We propose the 3DifFusionDet
framework in this paper, which structures 3D object detection as a denoising
diffusion process from noisy 3D boxes to target boxes. In this framework,
ground truth boxes diffuse in a random distribution for training, and the model
learns to reverse the noising process. During inference, the model gradually
refines a set of boxes that were generated at random to the outcomes. Under the
feature align strategy, the progressive refinement method could make a
significant contribution to robust LiDAR-Camera fusion. The iterative
refinement process could also demonstrate great adaptability by applying the
framework to various detecting circumstances where varying levels of accuracy
and speed are required. Extensive experiments on KITTI, a benchmark for
real-world traffic object identification, revealed that 3DifFusionDet is able
to perform favorably in comparison to earlier, well-respected detectors.
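
The training and inference procedure described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the cosine noise schedule, the 7-value box layout (x, y, z, l, w, h, yaw), the deterministic DDIM-style update, and the names `cosine_alpha_bar`, `noise_boxes`, `refine_boxes`, and `predict_x0` are all assumptions introduced here. In the paper, the role of `predict_x0` would be played by a network conditioned on aligned LiDAR and camera features.

```python
import numpy as np

def cosine_alpha_bar(num_steps, s=0.008):
    """Cumulative signal rate alpha_bar[t] for t = 0..num_steps (cosine schedule, assumed)."""
    t = np.linspace(0.0, 1.0, num_steps + 1)
    f = np.cos((t + s) / (1.0 + s) * np.pi / 2.0) ** 2
    return f / f[0]  # alpha_bar[0] == 1, alpha_bar[num_steps] is close to 0

def noise_boxes(gt_boxes, t, alpha_bar, rng):
    """Forward process: diffuse ground-truth boxes x_0 to noisy boxes x_t."""
    eps = rng.standard_normal(gt_boxes.shape)
    x_t = np.sqrt(alpha_bar[t]) * gt_boxes + np.sqrt(1.0 - alpha_bar[t]) * eps
    return x_t, eps

def refine_boxes(predict_x0, num_boxes, box_dim, alpha_bar, steps, rng):
    """Reverse process: refine random boxes step by step.

    predict_x0(x, t) is a hypothetical denoiser returning an estimate of the
    clean boxes. steps is a decreasing list of timesteps that ends at 0.
    """
    x = rng.standard_normal((num_boxes, box_dim))  # boxes generated at random
    for t, t_next in zip(steps[:-1], steps[1:]):
        x0_hat = predict_x0(x, t)
        # Noise implied by the current boxes and the current clean estimate.
        eps_hat = (x - np.sqrt(alpha_bar[t]) * x0_hat) / np.sqrt(1.0 - alpha_bar[t])
        # Move to the less noisy timestep t_next (deterministic, DDIM-style).
        x = np.sqrt(alpha_bar[t_next]) * x0_hat + np.sqrt(1.0 - alpha_bar[t_next]) * eps_hat
    return x
```

Under these assumptions, the length of `steps` is the knob for the accuracy and speed trade-off mentioned in the abstract: a short list such as `[100, 0]` runs the denoiser once, while a longer list refines the boxes over more iterations.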
 
      
Related papers
- Efficient Multimodal 3D Object Detector via Instance-Level Contrastive Distillation [17.634678949648208]
 We introduce a fast yet effective multimodal 3D object detector, incorporating our proposed Instance-level Contrastive Distillation (ICD) framework and Cross Linear Attention Fusion Module (CLFM).
Our 3D object detector outperforms state-of-the-art (SOTA) methods while achieving superior efficiency.
 arXiv  Detail & Related papers  (2025-03-17T08:26:11Z)
- DriveGEN: Generalized and Robust 3D Detection in Driving via Controllable Text-to-Image Diffusion Generation [49.32104127246474]
 DriveGEN is a training-free, controllable text-to-image diffusion generation method.
It consistently preserves objects with precise 3D geometry across diverse Out-of-Distribution generations.
 arXiv  Detail & Related papers  (2025-03-14T06:35:38Z)
- DiffuBox: Refining 3D Object Detection with Point Diffusion [74.01759893280774]
 We introduce a novel diffusion-based box refinement approach to ensure robust 3D object detection and localization.
We evaluate this approach under various domain adaptation settings, and our results reveal significant improvements across different datasets.
 arXiv  Detail & Related papers  (2024-05-25T03:14:55Z)
- VFMM3D: Releasing the Potential of Image by Vision Foundation Model for Monocular 3D Object Detection [80.62052650370416]
 Monocular 3D object detection holds significant importance across various applications, including autonomous driving and robotics.
In this paper, we present VFMM3D, an innovative framework that leverages the capabilities of Vision Foundation Models (VFMs) to accurately transform single-view images into LiDAR point cloud representations.
 arXiv  Detail & Related papers  (2024-04-15T03:12:12Z)
- Diffusion-SS3D: Diffusion Model for Semi-supervised 3D Object Detection [77.23918785277404]
 We present Diffusion-SS3D, a new perspective of enhancing the quality of pseudo-labels via the diffusion model for semi-supervised 3D object detection.
Specifically, we include noises to produce corrupted 3D object size and class label distributions, and then utilize the diffusion model as a denoising process to obtain bounding box outputs.
We conduct experiments on the ScanNet and SUN RGB-D benchmark datasets to demonstrate that our approach achieves state-of-the-art performance against existing methods.
 arXiv  Detail & Related papers  (2023-12-05T18:54:03Z)
- DiffRef3D: A Diffusion-based Proposal Refinement Framework for 3D Object Detection [15.149782382638485]
 We introduce a novel framework named DiffRef3D which adopts the diffusion process on 3D object detection with point clouds for the first time.
During training, DiffRef3D gradually adds noise to the residuals between proposals and target objects, then applies the noisy residuals to proposals to generate hypotheses.
The refinement module utilizes these hypotheses to denoise the noisy residuals and generate accurate box predictions.
 arXiv  Detail & Related papers  (2023-10-25T04:17:13Z)
- Diffusion-based 3D Object Detection with Random Boxes [58.43022365393569]
 Existing anchor-based 3D detection methods rely on empirical setting of anchors, which makes the algorithms lack elegance.
Our proposed Diff3Det migrates the diffusion model to proposal generation for 3D object detection by considering the detection boxes as generative targets.
In the inference stage, the model progressively refines a set of random boxes to the prediction results.
 arXiv  Detail & Related papers  (2023-09-05T08:49:53Z)
- Benchmarking the Robustness of LiDAR-Camera Fusion for 3D Object Detection [58.81316192862618]
 Two critical sensors for 3D perception in autonomous driving are the camera and the LiDAR.
 Fusing these two modalities can significantly boost the performance of 3D perception models.
We benchmark the state-of-the-art fusion methods for the first time.
 arXiv  Detail & Related papers  (2022-05-30T09:35:37Z)
- Reinforced Axial Refinement Network for Monocular 3D Object Detection [160.34246529816085]
 Monocular 3D object detection aims to extract the 3D position and properties of objects from a 2D input image.
 Conventional approaches sample 3D bounding boxes from the space and infer the relationship between the target object and each of them; however, the probability of effective samples is relatively small in the 3D space.
We propose to start with an initial prediction and refine it gradually towards the ground truth, with only one 3D parameter changed in each step.
This requires designing a policy which gets a reward after several steps, and thus we adopt reinforcement learning to optimize it.
 arXiv  Detail & Related papers  (2020-08-31T17:10:48Z)
- Range Conditioned Dilated Convolutions for Scale Invariant 3D Object Detection [41.59388513615775]
 This paper presents a novel 3D object detection framework that processes LiDAR data directly on its native representation: range images.
Benefiting from the compactness of range images, 2D convolutions can efficiently process dense LiDAR data of a scene.
 arXiv  Detail & Related papers  (2020-05-20T09:24:43Z)
- 3D-CVF: Generating Joint Camera and LiDAR Features Using Cross-View Spatial Feature Fusion for 3D Object Detection [10.507404260449333]
 We propose a new architecture for fusing camera and LiDAR sensors for 3D object detection.
The proposed 3D-CVF achieves state-of-the-art performance in the KITTI benchmark.
 arXiv  Detail & Related papers  (2020-04-27T08:34:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
       
     