Diffusion-based 3D Object Detection with Random Boxes
- URL: http://arxiv.org/abs/2309.02049v1
- Date: Tue, 5 Sep 2023 08:49:53 GMT
- Title: Diffusion-based 3D Object Detection with Random Boxes
- Authors: Xin Zhou, Jinghua Hou, Tingting Yao, Dingkang Liang, Zhe Liu, Zhikang
Zou, Xiaoqing Ye, Jianwei Cheng, Xiang Bai
- Abstract summary: Existing anchor-based 3D detection methods rely on empirical settings of anchors, which makes the algorithms lack elegance.
Our proposed Diff3Det migrates the diffusion model to proposal generation for 3D object detection by considering the detection boxes as generative targets.
In the inference stage, the model progressively refines a set of random boxes to the prediction results.
- Score: 58.43022365393569
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: 3D object detection is an essential task for achieving autonomous driving.
Existing anchor-based detection methods rely on empirical, heuristic settings of
anchors, which makes the algorithms lack elegance. In recent years, we have
witnessed the rise of several generative models, among which diffusion models
show great potential for learning the transformation of two distributions. Our
proposed Diff3Det migrates the diffusion model to proposal generation for 3D
object detection by considering the detection boxes as generative targets.
During training, the object boxes diffuse from the ground truth boxes to the
Gaussian distribution, and the decoder learns to reverse this noise process. In
the inference stage, the model progressively refines a set of random boxes to
the prediction results. We provide detailed experiments on the KITTI benchmark
and achieve promising performance compared to classical anchor-based 3D
detection methods.
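The training and inference procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the cosine noise schedule, the DDIM-style deterministic update, the 7-value normalized box layout, and the names `alpha_bar`, `diffuse_box`, `refine`, and `predict_clean` are all assumptions made for illustration. The decoder is replaced by a hypothetical stand-in that always returns the ground-truth box.

```python
import math
import random


def alpha_bar(t, num_steps, s=0.008):
    """Fraction of signal kept at step t under a cosine schedule (an assumed choice)."""
    def f(u):
        return math.cos((u / num_steps + s) / (1 + s) * math.pi / 2) ** 2
    return f(t) / f(0)


def diffuse_box(box, t, num_steps, rng):
    """Training-time forward process: mix a ground-truth box with Gaussian noise."""
    a = alpha_bar(t, num_steps)
    return [math.sqrt(a) * v + math.sqrt(1 - a) * rng.gauss(0, 1) for v in box]


def refine(box, predict_clean, num_steps, sample_steps):
    """Inference: refine one random box with a DDIM-style update (assumed sampler)."""
    # Evenly spaced time steps from num_steps down to 0.
    times = [round(num_steps * (1 - i / sample_steps)) for i in range(sample_steps + 1)]
    for t, t_prev in zip(times[:-1], times[1:]):
        a, a_prev = alpha_bar(t, num_steps), alpha_bar(t_prev, num_steps)
        x0 = predict_clean(box, t)  # decoder's estimate of the clean box
        # Noise implied by the current box and the clean-box estimate.
        eps = [(x - math.sqrt(a) * c) / math.sqrt(1 - a) for x, c in zip(box, x0)]
        # Re-noise the estimate to the lower noise level t_prev.
        box = [math.sqrt(a_prev) * c + math.sqrt(1 - a_prev) * e for c, e in zip(x0, eps)]
    return box


rng = random.Random(0)
# Hypothetical normalized box: (x, y, z, length, width, height, yaw).
gt_box = [0.1, -0.2, 0.05, 0.4, 0.2, 0.15, 0.3]
noisy = diffuse_box(gt_box, t=500, num_steps=1000, rng=rng)   # training input
start = [rng.gauss(0, 1) for _ in gt_box]                     # random box at inference
oracle = lambda box, t: gt_box                                # stand-in for a trained decoder
refined = refine(start, oracle, num_steps=1000, sample_steps=4)
```

With a real decoder, `predict_clean` would condition on point-cloud features, and the refined boxes would still need scoring and filtering. The sketch only shows how a box moves between the data distribution and Gaussian noise.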
Related papers
- DC3DO: Diffusion Classifier for 3D Objects [3.265023962374139]
Inspired by Geoffrey Hinton's emphasis on generative modeling, we explore the use of 3D diffusion models for object classification.
Our approach, the Diffusion Classifier for 3D Objects (DC3DO), enables zero-shot classification of 3D shapes without additional training.
arXiv Detail & Related papers (2024-08-13T07:35:56Z)
- Diffusion-Based Particle-DETR for BEV Perception [94.88305708174796]
Bird-Eye-View (BEV) is one of the most widely used scene representations for visual perception in Autonomous Vehicles (AVs).
Recent diffusion-based methods offer a promising approach to uncertainty modeling for visual perception but fail to effectively detect small objects in the large coverage of the BEV.
Here, we address this problem by combining the diffusion paradigm with current state-of-the-art 3D object detectors in BEV.
arXiv Detail & Related papers (2023-12-18T09:52:14Z)
- Diffusion-SS3D: Diffusion Model for Semi-supervised 3D Object Detection [77.23918785277404]
We present Diffusion-SS3D, a new perspective of enhancing the quality of pseudo-labels via the diffusion model for semi-supervised 3D object detection.
Specifically, we add noise to produce corrupted 3D object size and class label distributions, and then utilize the diffusion model as a denoising process to obtain bounding box outputs.
We conduct experiments on the ScanNet and SUN RGB-D benchmark datasets to demonstrate that our approach achieves state-of-the-art performance against existing methods.
arXiv Detail & Related papers (2023-12-05T18:54:03Z)
- 3DifFusionDet: Diffusion Model for 3D Object Detection with Robust LiDAR-Camera Fusion [6.914463996768285]
3DifFusionDet structures 3D object detection as a denoising diffusion process from noisy 3D boxes to target boxes.
Under the feature alignment strategy, the progressive refinement method could make a significant contribution to robust LiDAR-Camera fusion.
Experiments on KITTI, a benchmark for real-world traffic object identification, revealed that 3DifFusionDet is able to perform favorably in comparison to earlier, well-respected detectors.
arXiv Detail & Related papers (2023-11-07T05:53:09Z)
- DiffRef3D: A Diffusion-based Proposal Refinement Framework for 3D Object Detection [15.149782382638485]
We introduce a novel framework named DiffRef3D which adopts the diffusion process on 3D object detection with point clouds for the first time.
During training, DiffRef3D gradually adds noise to the residuals between proposals and target objects, then applies the noisy residuals to proposals to generate hypotheses.
The refinement module utilizes these hypotheses to denoise the noisy residuals and generate accurate box predictions.
arXiv Detail & Related papers (2023-10-25T04:17:13Z)
- CamoDiffusion: Camouflaged Object Detection via Conditional Diffusion Models [72.93652777646233]
Camouflaged Object Detection (COD) is a challenging task in computer vision due to the high similarity between camouflaged objects and their surroundings.
We propose a new paradigm that treats COD as a conditional mask-generation task leveraging diffusion models.
Our method, dubbed CamoDiffusion, employs the denoising process of diffusion models to iteratively reduce the noise of the mask.
arXiv Detail & Related papers (2023-05-29T07:49:44Z)
- Score Jacobian Chaining: Lifting Pretrained 2D Diffusion Models for 3D Generation [28.25023686484727]
A diffusion model learns to predict a vector field of gradients.
We propose a chain rule on the learned gradients, and back-propagate the score of a diffusion model through the Jacobian of a differentiable field.
We run our algorithm on several off-the-shelf diffusion image generative models, including the recently released Stable Diffusion trained on the large-scale LAION dataset.
arXiv Detail & Related papers (2022-12-01T18:56:37Z)
- Reinforced Axial Refinement Network for Monocular 3D Object Detection [160.34246529816085]
Monocular 3D object detection aims to extract the 3D position and properties of objects from a 2D input image.
Conventional approaches sample 3D bounding boxes from the space and infer the relationship between the target object and each of them; however, the probability of effective samples is relatively small in the 3D space.
We propose to start with an initial prediction and refine it gradually towards the ground truth, with only one 3D parameter changed in each step.
This requires designing a policy which gets a reward after several steps, and thus we adopt reinforcement learning to optimize it.
arXiv Detail & Related papers (2020-08-31T17:10:48Z)
- InfoFocus: 3D Object Detection for Autonomous Driving with Dynamic Information Modeling [65.47126868838836]
We propose a novel 3D object detection framework with dynamic information modeling.
Coarse predictions are generated in the first stage via a voxel-based region proposal network.
Experiments are conducted on the large-scale nuScenes 3D detection benchmark.
arXiv Detail & Related papers (2020-07-16T18:27:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.