DDP: Diffusion Model for Dense Visual Prediction
- URL: http://arxiv.org/abs/2303.17559v2
- Date: Sat, 13 May 2023 11:38:59 GMT
- Title: DDP: Diffusion Model for Dense Visual Prediction
- Authors: Yuanfeng Ji, Zhe Chen, Enze Xie, Lanqing Hong, Xihui Liu, Zhaoqiang Liu, Tong Lu, Zhenguo Li, Ping Luo
- Abstract summary: We propose a simple, efficient, yet powerful framework for dense visual predictions based on the conditional diffusion pipeline.
The method, called DDP, efficiently extends the denoising diffusion process into the modern perception pipeline.
DDP shows attractive properties such as dynamic inference and uncertainty awareness, in contrast to previous single-step discriminative methods.
- Score: 71.55770562024782
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We propose a simple, efficient, yet powerful framework for dense visual
predictions based on the conditional diffusion pipeline. Our approach follows a
"noise-to-map" generative paradigm for prediction by progressively removing
noise from a random Gaussian distribution, guided by the image. The method,
called DDP, efficiently extends the denoising diffusion process into the modern
perception pipeline. Without task-specific design and architecture
customization, DDP is easy to generalize to most dense prediction tasks, e.g.,
semantic segmentation and depth estimation. In addition, DDP shows attractive
properties such as dynamic inference and uncertainty awareness, in contrast to
previous single-step discriminative methods. We show top results on three
representative tasks with six diverse benchmarks. Without tricks, DDP achieves
state-of-the-art or competitive performance on each task compared to the
specialist counterparts: for example, 83.9 mIoU on Cityscapes for semantic
segmentation, 70.6 mIoU on nuScenes for BEV map segmentation, and 0.05 REL on
KITTI for depth estimation. We hope that our approach will serve as a solid
baseline and facilitate future research.
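The "noise-to-map" idea in the abstract can be illustrated with a toy sketch. This is not the DDP implementation: `toy_decoder`, the cosine schedule in `alpha`, the deterministic DDIM-style update, and the variance-based uncertainty estimate are all assumptions chosen for illustration, and a 1-D list of pixel intensities stands in for an image.

```python
import math
import random
import statistics


def alpha(t):
    """Cosine signal level: 0 at t=1 (pure noise), 1 at t=0 (clean map)."""
    return math.cos(0.5 * math.pi * t) ** 2


def toy_decoder(noisy_map, image):
    """Hypothetical stand-in for a trained map decoder (NOT the DDP network).

    It guesses the clean map mostly from the image (pixel > 0.5 -> +1, else -1)
    and partly from the current noisy map, so different noise draws can differ.
    """
    return [
        0.8 * (1.0 if p > 0.5 else -1.0) + 0.2 * max(-1.0, min(1.0, m))
        for p, m in zip(image, noisy_map)
    ]


def predict(image, steps, rng):
    """Start from Gaussian noise and refine it for `steps` iterations."""
    x = [rng.gauss(0.0, 1.0) for _ in image]
    for k in range(steps, 0, -1):
        a_now, a_next = alpha(k / steps), alpha((k - 1) / steps)
        x0 = toy_decoder(x, image)  # current estimate of the clean map
        # Recover the implied noise, then re-noise the estimate to the next level.
        eps = [
            (xi - math.sqrt(a_now) * x0i) / math.sqrt(1.0 - a_now)
            for xi, x0i in zip(x, x0)
        ]
        x = [
            math.sqrt(a_next) * x0i + math.sqrt(1.0 - a_next) * e
            for x0i, e in zip(x0, eps)
        ]
    return x


def predict_with_uncertainty(image, steps=3, samples=8, seed=0):
    """Average several noise draws; per-pixel variance is a rough uncertainty."""
    rng = random.Random(seed)
    runs = [predict(image, steps, rng) for _ in range(samples)]
    mean = [statistics.fmean(col) for col in zip(*runs)]
    var = [statistics.pvariance(col) for col in zip(*runs)]
    return mean, var


if __name__ == "__main__":
    mean, var = predict_with_uncertainty([0.9, 0.1, 0.7, 0.2], steps=3)
    print(mean, var)
```

In this sketch, changing `steps` at call time trades compute for refinement without retraining anything, which is one way to read "dynamic inference"; the spread across noise draws gives a crude per-pixel uncertainty signal.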
Related papers
- PrimeDepth: Efficient Monocular Depth Estimation with a Stable Diffusion Preimage [19.02295657801464]
This work addresses the task of zero-shot monocular depth estimation.
A recent advance in this field has been the idea of utilising Text-to-Image foundation models, such as Stable Diffusion.
We present PrimeDepth, a method that is highly efficient at test time while keeping, or even enhancing, the positive aspects of diffusion-based approaches.
arXiv Detail & Related papers (2024-09-13T19:03:48Z)
- OccGen: Generative Multi-modal 3D Occupancy Prediction for Autonomous Driving [15.331332063879342]
OccGen is a simple yet powerful generative perception model for the task of 3D semantic occupancy prediction.
OccGen adopts a "noise-to-occupancy" generative paradigm, progressively inferring and refining the occupancy map.
A key insight of this generative pipeline is that the diffusion denoising process is naturally able to model the coarse-to-fine refinement of the dense 3D occupancy map.
arXiv Detail & Related papers (2024-04-23T13:20:09Z)
- RadOcc: Learning Cross-Modality Occupancy Knowledge through Rendering Assisted Distillation [50.35403070279804]
3D occupancy prediction is an emerging task that aims to estimate the occupancy states and semantics of 3D scenes using multi-view images.
We propose RadOcc, a Rendering assisted distillation paradigm for 3D Occupancy prediction.
arXiv Detail & Related papers (2023-12-19T03:39:56Z)
- Exploiting Diffusion Prior for Generalizable Dense Prediction [85.4563592053464]
Content generated by recent advanced Text-to-Image (T2I) diffusion models is sometimes too imaginative for existing off-the-shelf dense predictors to estimate.
We introduce DMP, a pipeline utilizing pre-trained T2I models as a prior for dense prediction tasks.
Despite limited-domain training data, the approach yields faithful estimations for arbitrary images, surpassing existing state-of-the-art algorithms.
arXiv Detail & Related papers (2023-11-30T18:59:44Z)
- Single Image Depth Prediction Made Better: A Multivariate Gaussian Take [163.14849753700682]
We introduce an approach that performs continuous modeling of per-pixel depth.
Our method (named MG) achieves accuracy among the top entries on the KITTI depth-prediction benchmark leaderboard.
arXiv Detail & Related papers (2023-03-31T16:01:03Z)
- Scaling Structured Inference with Randomization [64.18063627155128]
We propose a family of randomized dynamic programming (RDP) algorithms for scaling structured models to tens of thousands of latent states.
Our method is widely applicable to classical DP-based inference.
It is also compatible with automatic differentiation so can be integrated with neural networks seamlessly.
arXiv Detail & Related papers (2021-12-07T11:26:41Z)
- Parameter Decoupling Strategy for Semi-supervised 3D Left Atrium Segmentation [0.0]
We present a novel semi-supervised segmentation model based on a parameter decoupling strategy to encourage consistent predictions from diverse views.
Our method achieves a competitive result against state-of-the-art semi-supervised methods on the Atrial Challenge dataset.
arXiv Detail & Related papers (2021-09-20T14:51:42Z)
- Crowd Counting via Perspective-Guided Fractional-Dilation Convolution [75.36662947203192]
This paper proposes a novel convolutional neural network-based crowd counting method, termed Perspective-guided Fractional-Dilation Network (PFDNet).
By modeling the continuous scale variations, the proposed PFDNet is able to select the proper fractional dilation kernels for adapting to different spatial locations.
It is significantly more flexible than state-of-the-art methods that only consider discrete representative scales.
arXiv Detail & Related papers (2021-07-08T07:57:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.