DPBridge: Latent Diffusion Bridge for Dense Prediction
- URL: http://arxiv.org/abs/2412.20506v3
- Date: Mon, 19 May 2025 13:15:25 GMT
- Title: DPBridge: Latent Diffusion Bridge for Dense Prediction
- Authors: Haorui Ji, Taojun Lin, Hongdong Li,
- Abstract summary: We introduce DPBridge, the first latent diffusion bridge framework for dense prediction tasks.<n>Our method consistently achieves superior performance, demonstrating its effectiveness and capability generalization under different scenarios.
- Score: 49.1574468325115
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion models demonstrate remarkable capabilities in capturing complex data distributions and have achieved compelling results in many generative tasks. While they have recently been extended to dense prediction tasks such as depth estimation and surface normal prediction, their full potential in this area remains under-explored. In dense prediction settings, target signal maps and input images are pixel-wise aligned. This makes conventional noise-to-data generation paradigm inefficient, as input images can serve as more informative prior compared to pure noise. Diffusion bridge models, which support data-to-data generation between two general data distributions, offer a promising alternative, but they typically fail to exploit the rich visual priors embedded in large pretrained foundation models. To address these limitations, we integrate diffusion bridge formulation with structured visual priors and introduce DPBridge, the first latent diffusion bridge framework for dense prediction tasks. Our method presents three key contributions: (1) a tractable reverse transition kernel for diffusion bridge process, enabling maximum likelihood training scheme for better compatibility with pretrained backbones; (2) a distribution-aligned normalization technique to mitigate the discrepancies between the bridge and standard diffusion processes; and (3) an auxiliary image consistency loss to preserve fine-grained details. Experiments across extensive benchmarks validate that our method consistently achieves superior performance, demonstrating its effectiveness and generalization capability under different scenarios.
Related papers
- DIVE: Inverting Conditional Diffusion Models for Discriminative Tasks [79.50756148780928]
This paper studies the problem of leveraging pretrained diffusion models for performing discriminative tasks.<n>We extend the discriminative capability of pretrained frozen generative diffusion models from the classification task to the more complex object detection task, by "inverting" a pretrained layout-to-image diffusion model.
arXiv Detail & Related papers (2025-04-24T05:13:27Z) - An Ordinary Differential Equation Sampler with Stochastic Start for Diffusion Bridge Models [13.00429687431982]
Diffusion bridge models initialize the generative process from corrupted images instead of pure Gaussian noise.
Existing diffusion bridge models often rely on Differential Equation samplers, which result in slower inference speed.
We propose a high-order ODE sampler with a start for diffusion bridge models.
Our method is fully compatible with pretrained diffusion bridge models and requires no additional training.
arXiv Detail & Related papers (2024-12-28T03:32:26Z) - Arbitrary-steps Image Super-resolution via Diffusion Inversion [68.78628844966019]
This study presents a new image super-resolution (SR) technique based on diffusion inversion, aiming at harnessing the rich image priors encapsulated in large pre-trained diffusion models to improve SR performance.<n>We design a Partial noise Prediction strategy to construct an intermediate state of the diffusion model, which serves as the starting sampling point.<n>Once trained, this noise predictor can be used to initialize the sampling process partially along the diffusion trajectory, generating the desirable high-resolution result.
arXiv Detail & Related papers (2024-12-12T07:24:13Z) - Exploring the Design Space of Diffusion Bridge Models via Stochasticity Control [17.464174698465918]
Diffusion bridge models facilitate image-to-image (I2I) translation by connecting two distributions.
Existing methods overlook the impact of noise in SDEs, transition kernel, and the base distribution on sampling efficiency, image quality and diversity.
We propose a novel theoretical framework that extends the design space of diffusion bridges, and provides strategies to mitigate singularities during both training and sampling.
arXiv Detail & Related papers (2024-10-28T21:30:59Z) - A Wavelet Diffusion GAN for Image Super-Resolution [7.986370916847687]
Diffusion models have emerged as a superior alternative to generative adversarial networks (GANs) for high-fidelity image generation.
However, their real-time feasibility is hindered by slow training and inference speeds.
This study proposes a wavelet-based conditional Diffusion GAN scheme for Single-Image Super-Resolution.
arXiv Detail & Related papers (2024-10-23T15:34:06Z) - High-Precision Dichotomous Image Segmentation via Probing Diffusion Capacity [69.32473738284374]
Diffusion models have revolutionized text-to-image synthesis by delivering exceptional quality, fine detail resolution, and strong contextual awareness.
We propose DiffDIS, a diffusion-driven segmentation model that taps into the potential of the pre-trained U-Net within diffusion models.
Experiments on the DIS5K dataset demonstrate the superiority of DiffDIS, achieving state-of-the-art results through a streamlined inference process.
arXiv Detail & Related papers (2024-10-14T02:49:23Z) - Generative Edge Detection with Stable Diffusion [52.870631376660924]
Edge detection is typically viewed as a pixel-level classification problem mainly addressed by discriminative methods.
We propose a novel approach, named Generative Edge Detector (GED), by fully utilizing the potential of the pre-trained stable diffusion model.
We conduct extensive experiments on multiple datasets and achieve competitive performance.
arXiv Detail & Related papers (2024-10-04T01:52:23Z) - Taming Diffusion Prior for Image Super-Resolution with Domain Shift SDEs [36.65594293655289]
DoSSR is a Domain Shift diffusion-based SR model that capitalizes on the generative powers of pretrained diffusion models.<n>At the core of our approach is a domain shift equation that integrates seamlessly with existing diffusion models.<n>Our proposed method achieves state-of-the-art performance on synthetic and real-world datasets, while notably requiring only 5 sampling steps.
arXiv Detail & Related papers (2024-09-26T12:16:11Z) - PrimeDepth: Efficient Monocular Depth Estimation with a Stable Diffusion Preimage [19.02295657801464]
This work addresses the task of zero-shot monocular depth estimation.
A recent advance in this field has been the idea of utilising Text-to-Image foundation models, such as Stable Diffusion.
We present PrimeDepth, a method that is highly efficient at test time while keeping, or even enhancing, the positive aspects of diffusion-based approaches.
arXiv Detail & Related papers (2024-09-13T19:03:48Z) - Sub-graph Based Diffusion Model for Link Prediction [43.15741675617231]
Denoising Diffusion Probabilistic Models (DDPMs) represent a contemporary class of generative models with exceptional qualities.
We build a novel generative model for link prediction using a dedicated design to decompose the likelihood estimation process via the Bayesian formula.
Our proposed method presents numerous advantages: (1) transferability across datasets without retraining, (2) promising generalization on limited training data, and (3) robustness against graph adversarial attacks.
arXiv Detail & Related papers (2024-09-13T02:23:55Z) - Distilling Diffusion Models into Conditional GANs [90.76040478677609]
We distill a complex multistep diffusion model into a single-step conditional GAN student model.
For efficient regression loss, we propose E-LatentLPIPS, a perceptual loss operating directly in diffusion model's latent space.
We demonstrate that our one-step generator outperforms cutting-edge one-step diffusion distillation models.
arXiv Detail & Related papers (2024-05-09T17:59:40Z) - Provably Robust Score-Based Diffusion Posterior Sampling for Plug-and-Play Image Reconstruction [31.503662384666274]
In science and engineering, the goal is to infer an unknown image from a small number of measurements collected from a known forward model describing certain imaging modality.
Motivated Score-based diffusion models, due to its empirical success, have emerged as an impressive candidate of an exemplary prior in image reconstruction.
arXiv Detail & Related papers (2024-03-25T15:58:26Z) - DepthFM: Fast Monocular Depth Estimation with Flow Matching [22.206355073676082]
Current discriminative depth estimation methods often produce blurry artifacts, while generative approaches suffer from slow sampling due to curvatures in the noise-to-depth transport.
Our method addresses these challenges by framing depth estimation as a direct transport between image and depth distributions.
Our approach achieves competitive zero-shot performance on standard benchmarks of complex natural scenes while improving sampling efficiency and only requiring minimal synthetic data for training.
arXiv Detail & Related papers (2024-03-20T17:51:53Z) - Exploiting Diffusion Prior for Generalizable Dense Prediction [85.4563592053464]
Recent advanced Text-to-Image (T2I) diffusion models are sometimes too imaginative for existing off-the-shelf dense predictors to estimate.
We introduce DMP, a pipeline utilizing pre-trained T2I models as a prior for dense prediction tasks.
Despite limited-domain training data, the approach yields faithful estimations for arbitrary images, surpassing existing state-of-the-art algorithms.
arXiv Detail & Related papers (2023-11-30T18:59:44Z) - Denoising Diffusion Bridge Models [54.87947768074036]
Diffusion models are powerful generative models that map noise to data using processes.
For many applications such as image editing, the model input comes from a distribution that is not random noise.
In our work, we propose Denoising Diffusion Bridge Models (DDBMs)
arXiv Detail & Related papers (2023-09-29T03:24:24Z) - ACDMSR: Accelerated Conditional Diffusion Models for Single Image
Super-Resolution [84.73658185158222]
We propose a diffusion model-based super-resolution method called ACDMSR.
Our method adapts the standard diffusion model to perform super-resolution through a deterministic iterative denoising process.
Our approach generates more visually realistic counterparts for low-resolution images, emphasizing its effectiveness in practical scenarios.
arXiv Detail & Related papers (2023-07-03T06:49:04Z) - Low-Light Image Enhancement with Wavelet-based Diffusion Models [50.632343822790006]
Diffusion models have achieved promising results in image restoration tasks, yet suffer from time-consuming, excessive computational resource consumption, and unstable restoration.
We propose a robust and efficient Diffusion-based Low-Light image enhancement approach, dubbed DiffLL.
arXiv Detail & Related papers (2023-06-01T03:08:28Z) - Diffusion Model for Dense Matching [34.13580888014]
The objective for establishing dense correspondence between paired images consists of two terms: a data term and a prior term.
We propose DiffMatch, a novel conditional diffusion-based framework designed to explicitly model both the data and prior terms.
Our experimental results demonstrate significant performance improvements of our method over existing approaches.
arXiv Detail & Related papers (2023-05-30T14:58:24Z) - Hierarchical Integration Diffusion Model for Realistic Image Deblurring [71.76410266003917]
Diffusion models (DMs) have been introduced in image deblurring and exhibited promising performance.
We propose the Hierarchical Integration Diffusion Model (HI-Diff), for realistic image deblurring.
Experiments on synthetic and real-world blur datasets demonstrate that our HI-Diff outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-05-22T12:18:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.