3DifFusionDet: Diffusion Model for 3D Object Detection with Robust
LiDAR-Camera Fusion
- URL: http://arxiv.org/abs/2311.03742v1
- Date: Tue, 7 Nov 2023 05:53:09 GMT
- Title: 3DifFusionDet: Diffusion Model for 3D Object Detection with Robust
LiDAR-Camera Fusion
- Authors: Xinhao Xiang, Simon Dräger, Jiawei Zhang
- Abstract summary: 3DifFusionDet structures 3D object detection as a denoising diffusion process from noisy 3D boxes to target boxes.
With the feature alignment strategy, the progressive refinement contributes substantially to robust LiDAR-Camera fusion.
Experiments on KITTI, a benchmark for real-world traffic object detection, show that 3DifFusionDet performs favorably against earlier, well-established detectors.
- Score: 6.914463996768285
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Good 3D object detection performance from LiDAR-Camera sensors demands
seamless feature alignment and fusion strategies. We propose the 3DifFusionDet
framework in this paper, which structures 3D object detection as a denoising
diffusion process from noisy 3D boxes to target boxes. In this framework, ground
truth boxes are diffused into a random distribution during training, and the
model learns to reverse the noising process. During inference, the model
gradually refines a set of randomly generated boxes into the final detections.
Combined with the feature alignment strategy, this progressive refinement
contributes substantially to robust LiDAR-Camera fusion. The iterative refinement
also makes the framework adaptable to detection scenarios with varying accuracy
and speed requirements. Extensive experiments on KITTI, a benchmark for
real-world traffic object detection, show that 3DifFusionDet performs favorably
against earlier, well-established detectors.
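The box-level diffusion described in the abstract can be illustrated with a short sketch: during training, ground-truth 3D boxes are corrupted with scheduled Gaussian noise and the network learns to recover them; during inference, randomly initialized boxes are iteratively denoised. This is a minimal sketch under common DDPM-style assumptions, not the authors' implementation; the `model` head, its fused LiDAR-camera feature input, and the linear noise schedule are illustrative stand-ins.

```python
# Minimal sketch (not the authors' code) of diffusion over 3D boxes, assuming a
# DDPM-style linear noise schedule; `model` stands in for the detection head that
# consumes fused LiDAR-camera features, with a hypothetical signature.
import torch

T = 1000                                      # number of diffusion steps (assumption)
betas = torch.linspace(1e-4, 0.02, T)         # linear noise schedule (assumption)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

def q_sample(gt_boxes, t, noise):
    """Forward process: corrupt ground-truth boxes (N, 7) with Gaussian noise."""
    a = alphas_cumprod[t].sqrt()
    s = (1.0 - alphas_cumprod[t]).sqrt()
    return a * gt_boxes + s * noise

def train_step(model, fused_feats, gt_boxes):
    """The model learns to reverse the noising process (recover clean boxes)."""
    t = torch.randint(0, T, (1,))
    noise = torch.randn_like(gt_boxes)
    noisy_boxes = q_sample(gt_boxes, t, noise)
    pred_boxes = model(fused_feats, noisy_boxes, t)       # hypothetical signature
    return torch.nn.functional.l1_loss(pred_boxes, gt_boxes)

@torch.no_grad()
def infer(model, fused_feats, num_boxes=300, steps=4):
    """Inference: start from random boxes and progressively refine them."""
    boxes = torch.randn(num_boxes, 7)                     # (x, y, z, l, w, h, yaw)
    for t in torch.linspace(T - 1, 0, steps).long():
        boxes = model(fused_feats, boxes, t)              # one refinement step;
        # a full sampler would re-inject noise between steps (e.g. a DDIM update)
    return boxes
```

Because the number of refinement steps is a free parameter at inference time, the same trained model can trade accuracy for speed, which is the adaptability the abstract refers to.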
Related papers
- DiffuBox: Refining 3D Object Detection with Point Diffusion [74.01759893280774]
We introduce a novel diffusion-based box refinement approach to ensure robust 3D object detection and localization.
We evaluate this approach under various domain adaptation settings, and our results reveal significant improvements across different datasets.
arXiv Detail & Related papers (2024-05-25T03:14:55Z) - VFMM3D: Releasing the Potential of Image by Vision Foundation Model for Monocular 3D Object Detection [80.62052650370416]
Monocular 3D object detection holds significant importance across various applications, including autonomous driving and robotics.
In this paper, we present VFMM3D, an innovative framework that leverages the capabilities of Vision Foundation Models (VFMs) to accurately transform single-view images into LiDAR point cloud representations.
arXiv Detail & Related papers (2024-04-15T03:12:12Z) - Diffusion-SS3D: Diffusion Model for Semi-supervised 3D Object Detection [77.23918785277404]
We present Diffusion-SS3D, a new perspective of enhancing the quality of pseudo-labels via the diffusion model for semi-supervised 3D object detection.
Specifically, we add noise to produce corrupted 3D object size and class label distributions, and then use the diffusion model as a denoising process to obtain bounding box outputs.
We conduct experiments on the ScanNet and SUN RGB-D benchmark datasets to demonstrate that our approach achieves state-of-the-art performance against existing methods.
arXiv Detail & Related papers (2023-12-05T18:54:03Z) - DiffRef3D: A Diffusion-based Proposal Refinement Framework for 3D Object
Detection [15.149782382638485]
We introduce a novel framework named DiffRef3D which adopts the diffusion process on 3D object detection with point clouds for the first time.
During training, DiffRef3D gradually adds noise to the residuals between proposals and target objects, then applies the noisy residuals to proposals to generate hypotheses.
The refinement module utilizes these hypotheses to denoise the noisy residuals and generate accurate box predictions (see the sketch after this list).
arXiv Detail & Related papers (2023-10-25T04:17:13Z) - Diffusion-based 3D Object Detection with Random Boxes [58.43022365393569]
Existing anchor-based 3D detection methods rely on empirically set anchors, which makes the algorithms inelegant.
Our proposed Diff3Det migrates the diffusion model to proposal generation for 3D object detection by considering the detection boxes as generative targets.
In the inference stage, the model progressively refines a set of random boxes to the prediction results.
arXiv Detail & Related papers (2023-09-05T08:49:53Z) - Benchmarking the Robustness of LiDAR-Camera Fusion for 3D Object
Detection [58.81316192862618]
Two critical sensors for 3D perception in autonomous driving are the camera and the LiDAR.
Fusing these two modalities can significantly boost the performance of 3D perception models.
We benchmark the state-of-the-art fusion methods for the first time.
arXiv Detail & Related papers (2022-05-30T09:35:37Z) - Reinforced Axial Refinement Network for Monocular 3D Object Detection [160.34246529816085]
Monocular 3D object detection aims to extract the 3D position and properties of objects from a 2D input image.
Conventional approaches sample 3D bounding boxes from the space and infer the relationship between the target object and each of them; however, the probability of effective samples is relatively small in 3D space.
We propose to start with an initial prediction and refine it gradually towards the ground truth, with only one 3D parameter changed in each step.
This requires designing a policy which gets a reward after several steps, and thus we adopt reinforcement learning to optimize it.
arXiv Detail & Related papers (2020-08-31T17:10:48Z) - Range Conditioned Dilated Convolutions for Scale Invariant 3D Object
Detection [41.59388513615775]
This paper presents a novel 3D object detection framework that processes LiDAR data directly on its native representation: range images.
Benefiting from the compactness of range images, 2D convolutions can efficiently process dense LiDAR data of a scene.
arXiv Detail & Related papers (2020-05-20T09:24:43Z) - 3D-CVF: Generating Joint Camera and LiDAR Features Using Cross-View
Spatial Feature Fusion for 3D Object Detection [10.507404260449333]
We propose a new architecture for fusing camera and LiDAR sensors for 3D object detection.
The proposed 3D-CVF achieves state-of-the-art performance in the KITTI benchmark.
arXiv Detail & Related papers (2020-04-27T08:34:46Z)
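Where 3DifFusionDet and Diff3Det denoise whole boxes, the DiffRef3D entry above denoises proposal-to-target residuals instead. The following is a minimal, hypothetical sketch of that residual scheme; the `refine` module and its signature are assumptions, and `alphas_cumprod` is the schedule defined in the earlier snippet.

```python
# Hypothetical sketch of proposal-residual diffusion (cf. the DiffRef3D summary);
# `refine` and its signature are assumptions, not that paper's actual module.
import torch

def make_hypotheses(proposals, targets, t, alphas_cumprod):
    """Noise the proposal-to-target residuals, then apply them back to the proposals."""
    residuals = targets - proposals                   # (N, 7) box residuals
    noise = torch.randn_like(residuals)
    a = alphas_cumprod[t].sqrt()
    s = (1.0 - alphas_cumprod[t]).sqrt()
    return proposals + (a * residuals + s * noise)    # noisy hypotheses for the refiner

def refinement_loss(refine, feats, proposals, targets, t, alphas_cumprod):
    """The refinement module denoises the residuals to produce accurate boxes."""
    hypotheses = make_hypotheses(proposals, targets, t, alphas_cumprod)
    pred_residuals = refine(feats, hypotheses, t)     # hypothetical signature
    return torch.nn.functional.l1_loss(proposals + pred_residuals, targets)
```

Here the hypotheses stay anchored to the proposals, in contrast to starting from purely random boxes as in the earlier sketch.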