DiffRef3D: A Diffusion-based Proposal Refinement Framework for 3D Object
Detection
- URL: http://arxiv.org/abs/2310.16349v1
- Date: Wed, 25 Oct 2023 04:17:13 GMT
- Title: DiffRef3D: A Diffusion-based Proposal Refinement Framework for 3D Object
Detection
- Authors: Se-Ho Kim, Inyong Koo, Inyoung Lee, Byeongjun Park, Changick Kim
- Abstract summary: We introduce a novel framework named DiffRef3D, which applies the diffusion process to 3D object detection on point clouds for the first time.
During training, DiffRef3D gradually adds noise to the residuals between proposals and target objects, then applies the noisy residuals to proposals to generate hypotheses.
The refinement module utilizes these hypotheses to denoise the noisy residuals and generate accurate box predictions.
- Score: 15.149782382638485
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Denoising diffusion models show remarkable performances in generative tasks,
and their potential applications in perception tasks are gaining interest. In
this paper, we introduce a novel framework named DiffRef3D, which applies the
diffusion process to 3D object detection on point clouds for the first time.
Specifically, we formulate the proposal refinement stage of two-stage 3D object
detectors as a conditional diffusion process. During training, DiffRef3D
gradually adds noise to the residuals between proposals and target objects,
then applies the noisy residuals to proposals to generate hypotheses. The
refinement module utilizes these hypotheses to denoise the noisy residuals and
generate accurate box predictions. In the inference phase, DiffRef3D generates
initial hypotheses by sampling noise from a Gaussian distribution as residuals
and refines the hypotheses through iterative steps. DiffRef3D is a versatile
proposal refinement framework that consistently improves the performance of
existing 3D object detection models. We demonstrate the significance of
DiffRef3D through extensive experiments on the KITTI benchmark. Code will be
available.
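The abstract above describes a conditional diffusion process over box residuals: during training, noise is gradually added to the proposal-to-target residuals and the noisy residuals are applied to the proposals to form hypotheses, while at inference residuals are sampled from a Gaussian and refined iteratively. The sketch below illustrates that scheme under stated assumptions; the function names (`add_noise`, `training_step`, `iterative_inference`, `refine_module`), the linear noise schedule, the plain-vector box parameterization, and the DDIM-style reverse update are placeholders for exposition, not DiffRef3D's actual implementation.

```python
# Minimal sketch of diffusion-based proposal refinement as described in the
# abstract. All names and schedules here are illustrative assumptions, not the
# DiffRef3D codebase; boxes and residuals are treated as plain 7-dim vectors.
import numpy as np

T = 1000                                    # number of diffusion steps (assumed)
betas = np.linspace(1e-4, 0.02, T)          # linear noise schedule (assumed)
alpha_bars = np.cumprod(1.0 - betas)        # cumulative signal coefficients

def add_noise(residual_0, t, rng):
    """Forward process: q(r_t | r_0) = N(sqrt(a_bar_t) * r_0, (1 - a_bar_t) * I)."""
    eps = rng.standard_normal(residual_0.shape)
    return np.sqrt(alpha_bars[t]) * residual_0 + np.sqrt(1.0 - alpha_bars[t]) * eps

def training_step(proposal, gt_box, refine_module, rng):
    """Noise the proposal-to-target residual, apply it to the proposal to build
    a hypothesis, and regress the clean residual from that hypothesis."""
    residual_0 = gt_box - proposal                    # residual between proposal and target
    t = int(rng.integers(0, T))                       # random diffusion step
    r_t = add_noise(residual_0, t, rng)
    hypothesis = proposal + r_t                       # noisy residual applied to the proposal
    pred_residual = refine_module(hypothesis, proposal, t)
    return float(np.mean((pred_residual - residual_0) ** 2))   # simple L2 surrogate loss

def iterative_inference(proposal, refine_module, steps, rng):
    """Start from Gaussian residuals and denoise over a few steps
    (a DDIM-style deterministic update is assumed for the reverse process)."""
    r_t = rng.standard_normal(proposal.shape)         # initial residual ~ N(0, I)
    ts = np.linspace(T - 1, 0, steps + 1, dtype=int)  # coarse timestep schedule
    r0_hat = np.zeros_like(proposal)
    for t, t_next in zip(ts[:-1], ts[1:]):
        hypothesis = proposal + r_t                   # current hypothesis box
        r0_hat = refine_module(hypothesis, proposal, t)   # predicted clean residual
        eps_hat = (r_t - np.sqrt(alpha_bars[t]) * r0_hat) / np.sqrt(1.0 - alpha_bars[t])
        r_t = np.sqrt(alpha_bars[t_next]) * r0_hat + np.sqrt(1.0 - alpha_bars[t_next]) * eps_hat
    return proposal + r0_hat                          # final refined box prediction
```

With a dummy `refine_module` such as `lambda hyp, prop, t: np.zeros_like(prop)`, both functions run end to end on 7-dimensional box vectors; in the actual framework the refinement module would presumably be the detector's second-stage head, conditioned on proposal features extracted from the point cloud.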
Related papers
- Diff3DETR: Agent-based Diffusion Model for Semi-supervised 3D Object Detection [33.58208166717537]
3D object detection is essential for understanding 3D scenes.
Recent developments in semi-supervised methods seek to mitigate the reliance on large-scale annotations by employing a teacher-student framework to generate pseudo-labels for unlabeled point clouds.
We introduce an Agent-based Diffusion Model for Semi-supervised 3D Object Detection (Diff3DETR).
arXiv Detail & Related papers (2024-08-01T05:04:22Z)
- 3D Object Detection from Point Cloud via Voting Step Diffusion [52.9966883689137]
Existing voting-based methods often receive votes from the partial surfaces of individual objects together with severe noise, leading to sub-optimal detection performance.
We propose a new method that moves random 3D points toward the high-density region of the distribution by estimating the score function of the distribution with a noise-conditioned score network.
Experiments on two large scale indoor 3D scene datasets, SUN RGB-D and ScanNet V2, demonstrate the superiority of our proposed method.
arXiv Detail & Related papers (2024-03-21T05:04:52Z)
- D3PRefiner: A Diffusion-based Denoise Method for 3D Human Pose Refinement [3.514184876338779]
A Diffusion-based 3D Pose Refiner is proposed to refine the output of any existing 3D pose estimator.
We leverage the architecture of current diffusion models to convert the distribution of noisy 3D poses into that of ground-truth 3D poses.
Experimental results demonstrate the proposed architecture can significantly improve the performance of current sequence-to-sequence 3D pose estimators.
arXiv Detail & Related papers (2024-01-08T14:21:02Z)
- 6D-Diff: A Keypoint Diffusion Framework for 6D Object Pose Estimation [16.242361975225066]
Estimating the 6D object pose from a single RGB image often involves noise and indeterminacy.
We propose a novel diffusion-based framework to handle the noise and indeterminacy in object pose estimation.
arXiv Detail & Related papers (2023-12-29T05:28:35Z)
- Learn to Optimize Denoising Scores for 3D Generation: A Unified and Improved Diffusion Prior on NeRF and 3D Gaussian Splatting [60.393072253444934]
We propose a unified framework aimed at enhancing the diffusion priors for 3D generation tasks.
We identify a divergence between the diffusion priors and the training procedures of diffusion models that substantially impairs the quality of 3D generation.
arXiv Detail & Related papers (2023-12-08T03:55:34Z)
- 3DifFusionDet: Diffusion Model for 3D Object Detection with Robust LiDAR-Camera Fusion [6.914463996768285]
3DifFusionDet structures 3D object detection as a denoising diffusion process from noisy 3D boxes to target boxes.
Under the feature alignment strategy, the progressive refinement method can make a significant contribution to robust LiDAR-camera fusion.
Experiments on KITTI, a benchmark for real-world traffic object identification, show that 3DifFusionDet performs favorably in comparison to earlier, well-respected detectors.
arXiv Detail & Related papers (2023-11-07T05:53:09Z)
- SE(3) Diffusion Model-based Point Cloud Registration for Robust 6D Object Pose Estimation [66.16525145765604]
We introduce an SE(3) diffusion model-based point cloud registration framework for 6D object pose estimation in real-world scenarios.
Our approach formulates the 3D registration task as a denoising diffusion process, which progressively refines the pose of the source point cloud.
Experiments demonstrate that our diffusion registration framework presents outstanding pose estimation performance on the real-world TUD-L, LINEMOD, and Occluded-LINEMOD datasets.
arXiv Detail & Related papers (2023-10-26T12:47:26Z)
- Diffusion-based 3D Object Detection with Random Boxes [58.43022365393569]
Existing anchor-based 3D detection methods rely on the empirical setting of anchors, which makes the algorithms lack elegance.
Our proposed Diff3Det migrates the diffusion model to proposal generation for 3D object detection by considering the detection boxes as generative targets.
In the inference stage, the model progressively refines a set of random boxes to the prediction results.
arXiv Detail & Related papers (2023-09-05T08:49:53Z)
- DiffTAD: Temporal Action Detection with Proposal Denoising Diffusion [137.8749239614528]
We propose a new formulation of temporal action detection (TAD) with denoising diffusion, DiffTAD.
Taking random temporal proposals as input, it can accurately yield action proposals from an untrimmed long video.
arXiv Detail & Related papers (2023-03-27T00:40:52Z)
- Modiff: Action-Conditioned 3D Motion Generation with Denoising Diffusion Probabilistic Models [58.357180353368896]
We propose a conditional paradigm that benefits from the denoising diffusion probabilistic model (DDPM) to tackle the problem of realistic and diverse action-conditioned 3D skeleton-based motion generation.
This is a pioneering attempt to use DDPM to synthesize a variable number of motion sequences conditioned on a categorical action.
arXiv Detail & Related papers (2023-01-10T13:15:42Z)